Saturation of the PCR Survey

Saturation Curves

The completeness of the sample can be estimated with a first-order saturation model:

N_new(n) = N_total [ 1 - exp(-K n) ]

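where N_new(n) is the number of distinct Hox genes encountered after sequencing n clones, N_total is the expected total number of distinct genes (the value N_new approaches as n grows), and K is a rate constant per sequenced clone. Saturation curves are shown below: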

Figure. Saturation curves for the three PCR series.
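
The model is nonlinear in K but, for a fixed K, linear in N_total. The Python sketch below uses that to fit both parameters by least squares: for each trial K the best N_total has a closed form, and K itself is located by golden-section search. This is an assumed fitting procedure, not necessarily the one used to produce the table. The function names are hypothetical, and the demonstration data are synthetic values generated from the model with the 5E5/3F parameters (N_total = 8.02, K = 0.1575), not the survey counts.

```python
import math


def saturation(n, n_total, k):
    """First-order saturation model: N_new(n) = N_total * (1 - exp(-K n))."""
    return n_total * (1.0 - math.exp(-k * n))


def best_n_total(ns, ys, k):
    """For a fixed K the model is linear in N_total, so the least-squares
    N_total is sum(y*f) / sum(f*f), where f = 1 - exp(-K n)."""
    fs = [1.0 - math.exp(-k * n) for n in ns]
    return sum(y * f for y, f in zip(ys, fs)) / sum(f * f for f in fs)


def sse(ns, ys, k):
    """Residual sum of squares at rate K, with N_total set to its best value."""
    n_total = best_n_total(ns, ys, k)
    return sum((y - saturation(n, n_total, k)) ** 2 for n, y in zip(ns, ys))


def fit_saturation(ns, ys, k_lo=1e-4, k_hi=2.0, tol=1e-10):
    """Least-squares fit of (N_total, K). K is located by golden-section
    search, which assumes the SSE has a single minimum in [k_lo, k_hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = k_lo, k_hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if sse(ns, ys, c) < sse(ns, ys, d):
            b, d = d, c  # minimum lies in [a, d]; old c becomes new d
            c = b - phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]; old d becomes new c
            d = a + phi * (b - a)
    k = (a + b) / 2.0
    return best_n_total(ns, ys, k), k


# Hypothetical, noise-free accumulation curve generated from the model itself
# with the 5E5/3F parameters; these are NOT the survey counts.
ns = list(range(1, 30))
ys = [saturation(n, 8.02, 0.1575) for n in ns]
n_total_hat, k_hat = fit_saturation(ns, ys)
print(f"N_total = {n_total_hat:.2f}, K = {k_hat:.4f}")
```

With noise-free input the fit returns the generating values. Real accumulation counts are step-like and noisy, so the SSE profile over K is worth inspecting before trusting the golden-section result.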

Probability of Missing a Gene

Given the expected number N_total of distinct genes and a sample of n sequenced PCR products that are Hox genes, the probability of missing a particular gene is

Prob[miss] = ( 1-1/N_total )ⁿ
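
The formula corresponds to treating each of the N_total genes as equally likely to be amplified in any one clone and the n clones as independent draws: a given gene is absent from one clone with probability 1 - 1/N_total, hence from all n clones with probability (1 - 1/N_total)ⁿ. A minimal Python check (the function name is hypothetical) applied to the n and N_total values reported in the table:

```python
def prob_miss(n, n_total):
    """Probability that one particular gene is absent from n clones,
    assuming every clone is an independent draw from n_total equally
    likely genes."""
    return (1.0 - 1.0 / n_total) ** n


# (PCR series, n clones sequenced, estimated N_total) taken from the table
series = [("5E5/3F", 29, 8.02), ("5E5-2/3F", 64, 14.52), ("5E/3F", 144, 22.88)]

for name, n, n_total in series:
    print(f"{name}: Prob[miss] = {prob_miss(n, n_total):.3f}")
```

This prints 0.021, 0.010 and 0.002 for the three series, matching the Prob[miss] column.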

PCR series    n     N_total   N_found   K        corr.   chi²     Prob[miss]
5E5/3F         29    8.02      8        0.1575   0.988     2.55   0.021
5E5-2/3F       64   14.52     14        0.080    0.991    18.15   0.010
5E/3F         144   22.88     24        0.027    0.972   283.82   0.002

Note that the 5E/3F series does not fit the saturation curve well: its chi² is 283.82, and the 24 genes found exceed the fitted N_total of 22.88. This was noted earlier; see also [Misof, M.Y. and Wagner, G.P., Evidence for Four Hox Clusters in the Killifish Fundulus Heteroclitus (Teleostei), Mol. Phyl. Evol. 5: 309-322 (1996)].

Summary

The survey is fairly well saturated, in the sense that it is unlikely that further sequencing would yield:
(1) any additional distinct sequences in 5E5/3F,
(2) more than one additional distinct sequence in 5E5-2/3F,
(3) any additional distinct sequences in 5E/3F.
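
The summary can be made quantitative from the tabulated values under two readings. The saturation fit predicts N_total - N_found genes still unseen; and under the equal-probability assumption behind Prob[miss], linearity of expectation gives N_total · Prob[miss] as the expected number of genes missed entirely. A short Python sketch (the function name is hypothetical):

```python
def expected_unseen(n, n_total):
    """Expected number of the n_total genes absent from n clones,
    assuming equally likely genes and independent clones:
    n_total * (1 - 1/n_total)**n, by linearity of expectation."""
    return n_total * (1.0 - 1.0 / n_total) ** n


# (PCR series, n, estimated N_total, observed N_found) taken from the table
series = [
    ("5E5/3F", 29, 8.02, 8),
    ("5E5-2/3F", 64, 14.52, 14),
    ("5E/3F", 144, 22.88, 24),
]

for name, n, n_total, n_found in series:
    gap = n_total - n_found  # genes the fitted curve still expects to find
    print(f"{name}: N_total - N_found = {gap:+.2f}, "
          f"expected unseen = {expected_unseen(n, n_total):.2f}")
```

This prints gaps of +0.02, +0.52 and -1.12 and expected unseen counts of 0.17, 0.15 and 0.04; all are below one. The negative gap for 5E/3F reflects that more genes were found than the fit predicts, consistent with the poor fit of that series.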
