Saturation of the PCR Survey
Saturation Curves
The completeness of the sample can be estimated using a first-order
saturation model,

$$N_{new}(n) = N_{total} \left[ 1 - \exp(-K n) \right]$$

where Nnew(n) is the number of distinct Hox genes encountered after
sequencing n clones, Ntotal is the expected total number of distinct
genes, and K is the fitted rate constant. Saturation curves are
shown below:
Figure. Saturation curve for the three PCR series.
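As a rough illustration of how the saturation model can be evaluated, the sketch below computes Nnew(n) for given parameters and inverts the model to ask how many clones are needed to reach a chosen fraction of Ntotal. The function names n_new and clones_needed are hypothetical helpers introduced here, not part of the original analysis; the numerical inputs are the fitted values reported for the 5E5/3F series in the table in the next section.

```python
import math


def n_new(n, n_total, k):
    """Expected number of distinct genes after sequencing n clones,
    using the first-order model N_total * (1 - exp(-K * n))."""
    return n_total * (1.0 - math.exp(-k * n))


def clones_needed(fraction, k):
    """Number of clones at which the model reaches the given fraction
    of N_total, obtained by solving 1 - exp(-K * n) = fraction for n."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction) / k


# Fitted values reported for the 5E5/3F series: n = 29, Ntotal = 8.02, K = 0.1575.
expected = n_new(29, 8.02, 0.1575)
print(f"model value at n = 29: {expected:.2f}")
print(f"clones for 95% of Ntotal: {clones_needed(0.95, 0.1575):.1f}")
```

With these inputs the model value at n = 29 is roughly 7.9, close to the 8 distinct genes actually found in that series.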
Probability of Missing a Gene
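The probability of missing a particular gene, given the expected number
Ntotal of distinct genes and a sample of n sequenced PCR products
(all of which are Hox genes), is

$$\mathrm{Prob}[\mathrm{miss}] = \left( 1 - \frac{1}{N_{total}} \right)^{n}$$

This assumes that each sequenced clone is equally likely to represent
any of the Ntotal genes.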
PCR series | n   | Ntotal | Nfound | K      | corr. | chi2   | Prob[miss]
5E5/3F     | 29  | 8.02   | 8      | 0.1575 | 0.988 | 2.55   | 0.021
5E5-2/3F   | 64  | 14.52  | 14     | 0.080  | 0.991 | 18.15  | 0.010
5E/3F      | 144 | 22.88  | 24     | 0.027  | 0.972 | 283.82 | 0.002
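The Prob[miss] column can be recomputed from the n and Ntotal columns. The following is a minimal sketch of that arithmetic; prob_miss is a hypothetical helper name, and the inputs are the tabulated values for the three PCR series.

```python
def prob_miss(n, n_total):
    """Probability that one particular gene is absent from n sequenced
    clones, assuming each clone is equally likely to be any of n_total genes."""
    return (1.0 - 1.0 / n_total) ** n


# (n, Ntotal) for each PCR series, taken from the table.
series = {
    "5E5/3F": (29, 8.02),
    "5E5-2/3F": (64, 14.52),
    "5E/3F": (144, 22.88),
}

results = {name: prob_miss(n, n_total) for name, (n, n_total) in series.items()}

for name, p in results.items():
    print(f"{name}: Prob[miss] = {p:.3f}")
```

Rounded to three decimals, the computed values agree with the tabulated 0.021, 0.010, and 0.002.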
Note that the 5E/3F series does not fit the saturation curve well
(chi2 = 283.82). This was noted earlier; see also [Misof, M.Y. and
Wagner, G.P., Evidence for Four Hox Clusters in the Killifish Fundulus
heteroclitus (Teleostei), Mol. Phyl. Evol. 5: 309-322 (1996)].
Summary
The survey is fairly well saturated in the sense that it is unlikely
that:
(1) any further sequences would be found in 5E5/3F,
(2) more than one more sequence would be found in 5E5-2/3F,
(3) any further sequences would be found in 5E/3F.