6_4_plotFig_FuncClasses module
Plot Figure 5 from the paper and related plots from Supplementary Figures, i.e., plots related to the distribution of slow-evolving regions across functional classes.
All annotations for human (hg38) were obtained from the UCSC Table Browser:
Coding sequence coordinates were taken from the GENCODE-V44 and CCDS tracks;
Repeats from the RepeatMasker track;
Regulatory elements from the RefSeq track.
In the generated plots, rows indicate different functional classes and columns indicate different pairwise alignments between human and another vertebrate (left: closest species to human; right: farthest species from human). For each pairwise alignment, the set of slow-evolving windows was defined by selecting the windows with the lowest evolutionary times, summing up to 308 Mb (around 10% of the human genome). Other thresholds (around 1% and 5%) were also tested, and the corresponding plots are generated as Supplementary Figures.
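The selection rule described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `select_slow_windows`, the `(window_id, size_bp, evol_time)` tuple layout, and the choice to stop just before the size threshold would be exceeded are all assumptions made for the example.

```python
def select_slow_windows(windows, threshold_bp):
    """Select the windows with the lowest evolutionary times.

    windows: iterable of (window_id, size_bp, evol_time) tuples
             (hypothetical layout, assumed for this sketch).
    threshold_bp: maximum total size, in base pairs, of the selected set.
    Returns (list of selected window ids, total selected size in bp).
    """
    selected = []
    total_bp = 0
    # Visit windows from the slowest-evolving (lowest time) upwards.
    for window_id, size_bp, _evol_time in sorted(windows, key=lambda w: w[2]):
        # Assumption: stop before the cumulative size exceeds the threshold.
        if total_bp + size_bp > threshold_bp:
            break
        selected.append(window_id)
        total_bp += size_bp
    return selected, total_bp


# Toy data: four 100 bp windows with made-up evolutionary times.
toy_windows = [("w1", 100, 0.5), ("w2", 100, 0.1), ("w3", 100, 0.3), ("w4", 100, 0.9)]
slow_ids, slow_bp = select_slow_windows(toy_windows, threshold_bp=250)
```

With the value used for Figure 5, `threshold_bp` would be 308824475, the number that appears in the output file name.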
For each functional class, the y-axis shows the percentage of windows identified as slow-evolving that fall under that class, compared to the baseline (solid black line in the middle of each graph), which denotes the percentage of windows in the entire genome under the same functional class.
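The comparison against the baseline can be illustrated with a small sketch. It counts windows, as in the description above; the script itself may work with base-pair totals instead (the `y_totbps` and `y_selbps` parameters of `plotFuncClassDistrib` suggest so). The function name `class_percentages` and the dictionary layout are hypothetical and not taken from the source code.

```python
def class_percentages(window_classes, slow_ids):
    """Compare each functional class in slow-evolving windows to the baseline.

    window_classes: dict mapping window id -> set of functional class labels
                    (hypothetical layout, assumed for this sketch).
    slow_ids: ids of the windows identified as slow-evolving.
    Returns {label: (pct_in_slow_windows, pct_in_all_windows)}.
    """
    slow_ids = set(slow_ids) & set(window_classes)
    labels = set().union(*window_classes.values())
    result = {}
    for label in sorted(labels):
        # Baseline: share of all windows in the genome under this class.
        in_all = sum(1 for classes in window_classes.values() if label in classes)
        pct_base = 100.0 * in_all / len(window_classes)
        # Share of slow-evolving windows under this class.
        in_slow = sum(1 for wid in slow_ids if label in window_classes[wid])
        pct_slow = 100.0 * in_slow / len(slow_ids) if slow_ids else 0.0
        result[label] = (pct_slow, pct_base)
    return result


# Toy data: class labels are placeholders, not the annotation names of the paper.
toy_classes = {"w1": {"CDS"}, "w2": {"CDS", "repeat"}, "w3": {"repeat"}, "w4": set()}
percentages = class_percentages(toy_classes, slow_ids={"w1", "w2"})
```

A class whose first value exceeds the second is over-represented among slow-evolving windows relative to the genome-wide baseline.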
Use:
python3 6_4_plotFig_FuncClasses.py
Example of Usage:
python3 ~/code/6_4_plotFig_FuncClasses.py
Input Parameter:
To ensure the graphs match those used in the paper, the parameters are hard-coded in the script and cannot be modified via command line.
Pre-requisites
Before using this script, make sure all the required files were pre-computed:
a) Files with sampled evolutionary times
Make sure to run 5_sampleEvolTimes.py for α=1.1.
b) Annotated windows
Make sure to run annotateWindows.py to annotate all windows used in the analysis.
Time, Memory & Disk space
Running the script on a single core takes about 16 minutes (939 s in total) and requires a small amount of memory. In total, the output files (SVG and PDF plots) require 3.9 MB of disk space.
| Step | Time (s) |
|---|---|
| Figure 5 | 248.59 |
| Figure 5 - SI (conserv. thresh.: ~1%) | 214.66 |
| Figure 5 - SI (conserv. thresh.: ~5%) | 231.99 |
| Figure Supp S8 | 243.96 |
| Total time | 939.20 |
Output files:
An SVG version of each file listed below is also saved in the same directory:
- funcClasses-conservDistrib.conservedThresh308824475.pdf: The PDF output file containing the plots for Figure 5;
- funcClasses-conservDistrib.conservedThresh30882447.pdf: The PDF output file for the Supp. Figure with plots similar to Figure 5, but using a different conservation threshold (1%);
- funcClasses-conservDistrib.conservedThresh154412238.pdf: The PDF output file for the Supp. Figure with plots similar to Figure 5, but using a different conservation threshold (5%);
- funcClasses-stats.overlapThresh0.0.pdf: The PDF output file for Supp. Figure S8.
Function details
Only relevant functions have been documented below. For more details on any function, check the comments in the source code.
- 6_4_plotFig_FuncClasses.getAnnotOrder(annotGrid_paper, rowsLabels)
- 6_4_plotFig_FuncClasses.getAnnotTitles(annotGrid_paper, rowsLabels, my_dataset)
- 6_4_plotFig_FuncClasses.getColors(annotations)
- 6_4_plotFig_FuncClasses.get_aspect(ax)
- 6_4_plotFig_FuncClasses.groupWindowsByAnnotType(my_dataset, annotatedWindows)
- 6_4_plotFig_FuncClasses.loadAnnotations(my_dataset, threshold_annotation)
- 6_4_plotFig_FuncClasses.loadEvolEstimates(my_dataset, UCSCname, alpha, threshold_conserved=-1.0)
- 6_4_plotFig_FuncClasses.makeAnnotationMatrix(alpha, my_dataset, unit, threshold_annotation, threshold_conserved=-1.0)
- 6_4_plotFig_FuncClasses.makeFigure5(alpha, my_dataset, threshold_conserved=250000000.0)
- 6_4_plotFig_FuncClasses.makeFigureSuppS8(alpha, my_dataset)
- 6_4_plotFig_FuncClasses.plotFuncClassDistrib(my_dataset, ax, label, desc, color, x, y_totbps, y_selbps)
Plot the distribution of a given annotation in conserved windows for different species.
- 6_4_plotFig_FuncClasses.sortAnnotations(annotations)
Sort annotations. Each item in the list is a different annotation. An annotation is a tuple of three numbers: (Main category: X, Sub-category: Y, Sub-sub-category: Z). These numbers specify the hierarchical order (X.Y.Z) used in the plots.
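The hierarchical ordering described for `sortAnnotations` can be illustrated with plain tuple sorting. This is only a sketch of the idea under the stated (X, Y, Z) convention, not the function's actual implementation; the name `sort_annotations_sketch` and the sample tuples are made up for the example.

```python
def sort_annotations_sketch(annotations):
    """Order (X, Y, Z) annotation tuples hierarchically.

    Python compares tuples element by element, so sorting them directly
    orders by main category, then sub-category, then sub-sub-category.
    """
    return sorted(annotations)


# Hypothetical annotation identifiers, chosen to show numeric ordering.
toy_annotations = [(2, 1, 0), (1, 2, 1), (1, 10, 0), (1, 2, 0)]
ordered = sort_annotations_sketch(toy_annotations)
# Render each tuple in the X.Y.Z form used to describe the plot order.
ordered_labels = [".".join(str(n) for n in annot) for annot in ordered]
```

Because the comparison is numeric, sub-category 10 sorts after sub-category 2, which would not hold if the X.Y.Z labels were sorted as strings.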