6_3_plotFig_MutRatesComp module

Plots Figure 4 from the paper: a comparison of the newly estimated indel rates along the lineages relating human to 40 other vertebrates. For reference, the figure also includes direct and indirect estimates from previous studies (listed below).

Direct estimates

Direct estimates come from methods that count mutations between generations in present-day individuals. The following studies were used as reference:

| Authors | Year | Indel sizes | Indel rate estimation (original) | Indel rate estimation (CI) | Generation Time | Generation Time Interval (CI) | Indel rate PPPY (Per Position Per Year) | Indel rate PPPY (Per Position Per Year) (CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kloosterman et al. | 2015 | 1-20 | 0.68e-9 | | 29.27 | (24.385, 34.155) | 2.3231886903593173e-11 | (1.990918179344908e-11, 2.788584149604477e-11) |
| Besenbacher et al. | 2016 | 1-35 | 0.929e-9 | | 30.26 | | 3.07e-11 | (2.91e-11, 3.25e-11) |
| Maretty et al. | 2017 | 1-10 | 1.3e-9 | | 27.7 | | 4.70e-11 | |
| Besenbacher et al. | 2015 | 1-50 | 1.5e-9 | (1.2e-9, 1.9e-9) | 28.4 | | 5.28169014084507e-11 | (4.225352112676056e-11, 6.690140845070423e-11) |
| Kondrashov (del.) | 2002 | 1- | 0.526e-9 | (0.216e-9, 0.836e-9) | 20 | | 2.63e-11 | (1.58e-11, 4.18e-11) |
| Kondrashov (ins.) | 2002 | 1- | 0.182e-9 | (0.072e-9, 0.292e-9) | 20 | | 0.91e-11 | (0.36e-11, 1.46e-11) |
| Palamara et al. | 2015 | 1-20 | 1.26e-9 | (1.2e-09, 1.32e-09) | 29 | | 4.3448275862068967e-11 | (4.137931034482759e-11, 4.5517241379310344e-11) |

Note

The indel sizes considered differ between studies (see the “Indel sizes” column). The lower the maximum indel size, the higher the mutation rate estimate.

Indirect estimates

Indirect estimates are based on the evolutionary distance separating two species divided by twice their divergence time. The following studies were used as reference:

| Authors | Year | Species | Indel sizes | Indel rate estimation (original) | Indel rate estimation (CI) | Generation Time | Generation Time Interval (CI) | Indel rate PPPY (Per Position Per Year) | Indel rate PPPY (Per Position Per Year) (CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Nachman and Crowell | 2000 | Chimp | 1-4 | 2.3e-9 | | 20 | | 4.95049504950495e-11 | (3.712871287128713e-11, 6.188118811881188e-11) |
| Lunter | 2007 | Mouse | 1- | 0.053 | | 2*87e6 | (2*81.3e6, 2*91e6) | 30.46e-11 | (29.12087912087912e-11, 32.595325953259533e-11) |
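As a worked example of the indirect definition, the Lunter (2007) mouse row is consistent with dividing the evolutionary distance (0.053) by twice the divergence time (87 million years, with an interval of 81.3 to 91 million years). The sketch below assumes that reading of the row; the function name `indirect_rate_pppy` is hypothetical and does not come from the module.

```python
def indirect_rate_pppy(distance, divergence_time_years):
    """Indirect estimate: evolutionary distance / (2 * divergence time)."""
    return distance / (2 * divergence_time_years)


distance = 0.053  # Lunter (2007), human-mouse

pppy = indirect_rate_pppy(distance, 87e6)
# An older divergence gives a lower yearly rate, so the bounds swap.
pppy_ci = (
    indirect_rate_pppy(distance, 91e6),
    indirect_rate_pppy(distance, 81.3e6),
)

print(pppy)
print(pppy_ci)
```

The factor of two reflects that the distance accumulates along both lineages since the split.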

Plots

In the generated plot, the new estimates are shown in orange, with the rectangle borders representing the standard deviation and the middle point showing the mean indel rate. Indirect and direct estimates from previous studies are shown in green and blue, respectively. All indel rates were converted to “per position per year” (PPPY) to make them comparable. The orange dashed line indicates the average indel rate across all species. If evolution were uniform across all lineages, the values would be concentrated around this line.
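The quantities drawn for each new estimate (mean, standard-deviation rectangle, and the cross-species average) can be sketched as follows. This is an illustration only: the species names and sampled rates are made-up placeholders, the helper `summarize` is hypothetical, and the choice of the sample (n-1) standard deviation is an assumption, since the source does not say which variant the script uses.

```python
from statistics import mean, stdev

# Hypothetical sampled indel rates (PPPY) per species; placeholder values only.
sampled_rates = {
    "speciesA": [4.0e-11, 5.0e-11, 6.0e-11],
    "speciesB": [9.0e-11, 10.0e-11, 11.0e-11],
}


def summarize(samples):
    """Return (mean, lower, upper) with bounds at mean -/+ one standard deviation."""
    centre = mean(samples)
    spread = stdev(samples)  # assumption: sample (n-1) standard deviation
    return centre, centre - spread, centre + spread


summaries = {name: summarize(vals) for name, vals in sampled_rates.items()}

# Position of the dashed line: average of the per-species means.
overall_mean = mean(centre for centre, _, _ in summaries.values())

for name, (centre, lower, upper) in summaries.items():
    print(f"{name}: mean={centre:.3e}, rectangle=({lower:.3e}, {upper:.3e})")
print(f"dashed line at {overall_mean:.3e}")
```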

  • Use:

    python3 6_3_plotFig_MutRatesComp.py
    
  • Example of Usage:

    python3 ~/code/6_3_plotFig_MutRatesComp.py
    
  • Input Parameter:

To ensure the graphs match those used in the paper, the parameters are hard-coded in the script and cannot be modified via the command line.

Pre-requisites

Before using this script, make sure all the required files have been pre-computed:

a) Files with sampled evolutionary times

Make sure to run 5_sampleEvolTimes.py for α=1.1.

b) Logs from evolutionary time estimates

Make sure to keep the logs from 4_estimateEvolTimes.py for α=1.1. They contain the information about windows without estimates.

Time, Memory & Disk space

Running the script on a single core takes about 2.4 minutes (143.07 seconds) and requires a small amount of memory. The output file requires 2.4 MB of disk space.

Output files:

The file “mutRates-comp.highEvolTimeQuant0.99.svg” contains the plot for Figure 4.

Function details

Only the relevant functions are documented below. For more details on any function, check the comments in the source code.

class 6_3_plotFig_MutRatesComp.MutRateStudy(author, publYear, methodType, ucscName, mutRatePPPY, mutRatePPPY_lb, mutRatePPPY_ub)

Bases: tuple

author

Alias for field number 0

methodType

Alias for field number 2

mutRatePPPY

Alias for field number 4

mutRatePPPY_lb

Alias for field number 5

mutRatePPPY_ub

Alias for field number 6

publYear

Alias for field number 1

ucscName

Alias for field number 3
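The field aliases above match the layout produced by `collections.namedtuple`. The sketch below defines an equivalent tuple type locally and fills it with the Kloosterman et al. (2015) values from the direct estimates. The `methodType` and `ucscName` values shown ("direct", "hg38") are hypothetical placeholders; the source does not list the strings the script actually uses.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented class.
MutRateStudy = namedtuple(
    "MutRateStudy",
    [
        "author",
        "publYear",
        "methodType",
        "ucscName",
        "mutRatePPPY",
        "mutRatePPPY_lb",
        "mutRatePPPY_ub",
    ],
)

study = MutRateStudy(
    author="Kloosterman et al.",
    publYear=2015,
    methodType="direct",  # hypothetical label
    ucscName="hg38",      # hypothetical UCSC name
    mutRatePPPY=2.3231886903593173e-11,
    mutRatePPPY_lb=1.990918179344908e-11,
    mutRatePPPY_ub=2.788584149604477e-11,
)

# Fields can be read by name or by position (author is field 0, and so on).
print(study.author, study[0])
print(study.mutRatePPPY_lb <= study.mutRatePPPY <= study.mutRatePPPY_ub)
```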

6_3_plotFig_MutRatesComp.computeDistribQuantile(taudistrib_est_onespecies, quantile)
6_3_plotFig_MutRatesComp.computeEvolTimeEmpty(my_dataset, UCSCname, alpha, empty_evoltime_quantile)
6_3_plotFig_MutRatesComp.createLines(ax, my_dataset, rows, corr=0)

Creates horizontal lines separating direct and indirect estimates.

6_3_plotFig_MutRatesComp.getDirectEstimates()

Returns a list in which each entry corresponds to a previous study. All studies returned by this method are direct estimates, i.e., they count mutations that occur between generations in present-day individuals. WARNING: The indel sizes considered differ between studies (see the “Indel sizes” entry in each tuple). The lower the maximum indel size, the higher the mutation rate estimate.

6_3_plotFig_MutRatesComp.getEmptyWindows(alpha, my_dataset)
6_3_plotFig_MutRatesComp.getExtrapolatedEstimates()

Returns a list in which each entry corresponds to a previous study. All studies returned by this method are extrapolated estimates, i.e., they estimate the indel rate from the substitution rate. The generation time is unclear for these studies, so they were left out of the analysis.

6_3_plotFig_MutRatesComp.getIndirectEstimates()

It returns a list where each entry corresponds to a previous study. All studies returned by this method are Indirect Estimates, i.e., they compute their estimate based on the evolutionary distance and the divergence time separating two species.

6_3_plotFig_MutRatesComp.getNewEstimates(my_dataset, alpha, empty_evoltime_quantile=-1)
6_3_plotFig_MutRatesComp.loadOurData(alpha, my_dataset, empty_windows=None, empty_evoltime_quantile=-1)
6_3_plotFig_MutRatesComp.makeFigure4(alpha, my_dataset)
6_3_plotFig_MutRatesComp.meanEvolTimes(taudistrib_est_onespecies, empty_win_info=(-1, -1), empty_win_tau=-1, bootstrap=False)
6_3_plotFig_MutRatesComp.plotErrorRect(ax, study, ycoord, color)
6_3_plotFig_MutRatesComp.plotIcon(ax, icon_width, iconFilename, xval, yval, rect)
6_3_plotFig_MutRatesComp.plotMutationRateComparison(my_dataset, new_ests, direct_ests, indirect_ests, empty_evoltime_quantile)
6_3_plotFig_MutRatesComp.plotMutationRatePerType(ax, mutRateEstsAll, mutRateEsts, markerStyle, icon_width)
6_3_plotFig_MutRatesComp.yLabels(ax, my_dataset, rows, corr=0)

Creates the y-labels. Each y-label consists of the name of the species that was compared with human (or “Human” for a direct estimate) and the study the estimate comes from (author + publication year).
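A minimal sketch of how such a label could be assembled is shown below. The helper `make_ylabel` and the exact label layout are hypothetical; the real formatting is defined in the script's `yLabels` function and may differ.

```python
def make_ylabel(species, author, publ_year, is_direct):
    """Build a label from the species name and the study (author + year).

    Direct estimates are measured within humans, so they are labelled "Human".
    The "name (author, year)" layout is an assumption made for illustration.
    """
    name = "Human" if is_direct else species
    return f"{name} ({author}, {publ_year})"


print(make_ylabel("Mouse", "Lunter", 2007, is_direct=False))
print(make_ylabel("Human", "Palamara et al.", 2015, is_direct=True))
```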