6_1_plotFig_PcsDistribComp module
Plot Figure 2 and Supplementary Fig. S1 from the paper, i.e., a comparison between observed and estimated PCS size distributions.
40 pairwise alignments between human and other vertebrates were used to compute the perfectly conserved sequence (PCS) size distribution across a wide divergence time range.
Observed data are shown in blue, and predictions from our model are shown in orange. In the main plots (PCS size distribution), the upper limits for both the x-axis and y-axis vary depending on the maximum PCS size and count found in each species, respectively.
To enhance clarity and prevent points from obscuring one another, the x-axis values in the main plot were divided into 20 logarithmic bins, with the y-axis showing the mean PCS count and its standard deviation for each bin.
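The logarithmic binning described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `log_bin_edges` and the example size range (5 to 5000) are assumptions made for this sketch; only the number of bins (20) comes from the text.

```python
import math

def log_bin_edges(min_size, max_size, nb_bins=20):
    """Return nb_bins + 1 bin edges evenly spaced on a log10 scale.

    Hypothetical helper for illustration; the script's own binning
    code may differ.
    """
    log_min = math.log10(min_size)
    log_max = math.log10(max_size)
    step = (log_max - log_min) / nb_bins
    return [10 ** (log_min + i * step) for i in range(nb_bins + 1)]

# Assumed example range; real limits depend on each species' PCS sizes.
edges = log_bin_edges(5, 5000)
```

Consecutive edges differ by a constant ratio, so each bin covers the same width on a log-scaled x-axis.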
The inset of each plot shows the distribution of evolutionary times. All insets share the same x-axis (log scale) and y-axis, both ranging between 0 and 1.
Use:
python3 6_1_plotFig_PcsDistribComp.py
Example of Usage:
python3 ~/code/6_1_plotFig_PcsDistribComp.py
Input Parameter:
To ensure the graphs match those used in the paper, the parameters are hard-coded in the script and cannot be modified via command line.
Pre-requisites
Before using this script, make sure all the required files have been pre-computed:
a) Files with sampled evolutionary times
Make sure to run 5_sampleEvolTimes.py for α=1.1.
Time, Memory & Disk space
Running the script on a single core takes 187.39 seconds and requires a small amount of memory.
- Output files:
The SVG output file for Fig. 2, pcsDistrib-comp.alpha1.1.svg, is 2.3 MB.
The PDF output file for Fig. 2, pcsDistrib-comp.alpha1.1.pdf, is 362.2 KB.
The SVG output file for Supplementary Fig. S1, pcsDistrib-comp-supp.alpha1.1.svg, is 9.1 MB.
The PDF output file for Supplementary Fig. S1, pcsDistrib-comp-supp.alpha1.1.pdf, is 1.5 MB.
Function details
Only relevant functions are documented below. For more details on any function, check the comments in the source code.
- 6_1_plotFig_PcsDistribComp.findBin(val, bins)
Given a list of bins and a value, find in which bin the value should go.
- Parameters:
bins (list of floats) – a sorted list of values. A bin is defined as (bins[i-1], bins[i]].
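A lookup over half-open bins of the form (bins[i-1], bins[i]] can be sketched with the standard-library `bisect` module. This is an illustrative sketch only: the name `find_bin`, the returned index convention, and returning `None` for values outside all bins are assumptions, not the documented behavior of `findBin`.

```python
from bisect import bisect_left

def find_bin(val, bins):
    """Return i such that bins[i-1] < val <= bins[i], or None if no bin fits.

    Hypothetical stand-in for findBin; `bins` must be sorted in
    ascending order.
    """
    # bisect_left gives the first index i with bins[i] >= val,
    # which also guarantees bins[i-1] < val when i > 0.
    i = bisect_left(bins, val)
    if i == 0 or i == len(bins):
        return None  # val <= bins[0] or val > bins[-1] (assumed handling)
    return i

example_bins = [1.0, 10.0, 100.0]
```

With `example_bins`, a value equal to an upper edge (such as 10.0) falls in the bin that ends at that edge, which matches the closed right end of the interval.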
- 6_1_plotFig_PcsDistribComp.makeFigure2(alpha, my_dataset, allspecies=False)
Create the plots for Figure 2, containing the estimated and observed PCS size distributions for 10 representative species. The same function also creates the supplementary version of Figure 2, containing the plots for all 40 species. Depending on the case (main text or supplementary material), some adjustments to the positioning and size of elements (insets, icons, etc.) are necessary.
- 6_1_plotFig_PcsDistribComp.plotPCSsizeDistribBig(ax, PCSdist_obs_onespecies, PCSdist_est_onespecies, min_PCSsize, PCSsize_nbbins)
Plot the observed and estimated distribution of PCS sizes for a pair of species. Due to the limited space, in order to improve clarity and avoid overlapping dots, the x-axis values are organized into bins.
- 6_1_plotFig_PcsDistribComp.plotTauDistribSampled_sum(ax_fg, ax_bg, taudistrib_est_onespecies, tau_bins, min_tau, max_tau, colors, cmap)
Plot the estimated distribution of evolutionary times for a pair of species.
- 6_1_plotFig_PcsDistribComp.prepareDataForPlotting(dictVals)
Input is provided as a dictionary containing binned data, with keys representing bin values (such as PCS sizes) and values indicating the counts of PCS sizes within those bins. The function computes the average and standard deviation of the observed counts for each bin.
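The per-bin summary can be sketched as below. This is a hedged illustration rather than the module's implementation: the name `summarize_bins`, the assumption that each dictionary value is a list of counts, the returned tuple layout, and the use of the population standard deviation are all choices made for this example.

```python
from statistics import mean, pstdev

def summarize_bins(dict_vals):
    """Return (bin values, mean counts, standard deviations), sorted by bin.

    Hypothetical stand-in for prepareDataForPlotting; assumes
    dict_vals maps each bin value to a non-empty list of counts.
    """
    xs = sorted(dict_vals)
    means = [mean(dict_vals[x]) for x in xs]
    # Population standard deviation (assumption); a bin with a single
    # count therefore gets a deviation of 0.
    stds = [pstdev(dict_vals[x]) for x in xs]
    return xs, means, stds

xs, means, stds = summarize_bins({10: [2, 4], 5: [3]})
```

The three lists line up index by index, so they can be passed directly to an error-bar plotting call such as matplotlib's `Axes.errorbar(xs, means, yerr=stds)`.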