In common with other approaches to estimating the parameters of positive selection, we have made several unrealistic assumptions. Since BGS seems to have relatively little effect on the γ and p estimates (Table 1), the main question is the effect of demographic factors on the SSW estimates. Inclusion of these factors in methods for estimating selection parameters is a difficult problem. However, we note that the spread to a high frequency of a favorable mutation in a population distributed over a two-dimensional habitat is much slower than in a panmictic population, which suggests that there is more opportunity for recombination to dilute the effects of SSWs than with panmixia (47). This process would therefore cause our γ estimates to be smaller than the true values, and the p estimates to be larger.
Materials and Methods
Second, we have assumed “hard” sweeps, based on unique mutations, rather than “soft” sweeps based on recurrent mutations or mutations arising from standing variation (48). If soft sweeps are prevalent in Drosophila, as has recently been argued (49), then the same pattern of bias as from a subdivided population would arise (50, 51). (Note, however, that gene conversion of a favored mutation onto an ancestral haplotype could generate the appearance of a soft sweep.) The opposite would apply to incomplete sweeps (52), if their incidence in a gene is correlated with its KA value. These were omitted from our models because they do not affect KA. However, the lack of evidence for intermediate-frequency NS and synonymous variants in pooled site frequency spectra for the Rwandan population of D. melanogaster, as seen in figure 5 of ref. 33, suggests that incomplete sweeps are relatively infrequent in this population. If favorable mutations do not arise as single events, the proportions of favorable mutations are likely to be overestimated as well.
These considerations mean that the estimates of the parameters of positive selection obtained in this and previous studies need to be treated with caution, and will no doubt be revised with future improvements in inference procedures. It seems clear, however, that hitchhiking effects greatly reduce neutral or nearly neutral sequence diversity in genes in normally recombining regions of the Drosophila genome. There is increasing evidence that this is also true for many other organisms (1, 3). Such processes have important implications for attempts to estimate demographic parameters, which usually ignore these complications, as has been pointed out before (53–56). This is especially important when selection at linked sites distorts gene genealogies and hence site frequency spectra, because these are the main basis for inferring demographic parameters. There is evidence from our unbinned data for mel-yak that KA is weakly positively correlated with the proportion of singletons at synonymous sites (Spearman partial rank correlation, ρ = 0.044, P = 0.002), consistent with increased distortions of the frequency spectra caused by hitchhiking in genes with large KA, as was previously found by Andolfatto (15). The problem of relating the magnitude of these effects to the BGS and SSW models remains to be explored.
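As an illustration of the kind of statistic quoted above, the following is a minimal sketch of a first-order Spearman partial rank correlation. It assumes a single control variable `z`; the text does not state which covariate was held constant, so `z` and all function names here are hypothetical and are not taken from the authors' analysis. Each variable is converted to ranks (ties receive their average rank), and the standard partial correlation formula is applied to the pairwise rank correlations.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after controlling for z.

    z is an assumed (hypothetical) covariate; the formula is undefined
    when x or y is perfectly rank-correlated with z.
    """
    r_xy = spearman(x, y)
    r_xz = spearman(x, z)
    r_yz = spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Toy per-gene values, invented purely for illustration.
    ka = [1, 2, 3, 4, 5]
    singleton_proportion = [2, 1, 4, 3, 5]
    covariate = [5, 3, 1, 4, 2]
    print(partial_spearman(ka, singleton_proportion, covariate))
```

With thousands of genes, a weak coefficient such as the one reported can still be statistically significant; the sketch computes only the coefficient and does not attempt the associated P value.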
Basic Data Analyses.
We used polymorphism data for coding sequences of 7,099 autosomal genes, using 17 haploid genomes from the Gikongoro (Rwanda) population of Drosophila melanogaster provided by the Drosophila Population Genomics Project 2 (57), with Drosophila yakuba as an outgroup. The coding sequence data were filtered and analyzed as described in materials and methods in ref. 19. We excluded 225 genes located in the autosomal heterochromatic regions and on chromosome 4, where crossing over is absent (19, 58). We obtained diversity and divergence statistics for synonymous and NS sites, as well as for 5′- and 3′-UTRs for D. melanogaster genes with UTR annotations. For the analyses of UTRs, we followed the annotations of Flybase, version 5.33, masking any UTRs included in coding sequences and excluding UTRs with no available sequence in the outgroup, leaving a dataset of 5,992 genes with 3′- and/or 5′-UTRs. After applying a Kimura two-parameter correction (59), the mean level of divergence of UTR sequences between species, KU, was 0.10, which is intermediate between the mean values for NS sites (KA = 0.038) and synonymous sites (KS = 0.262).
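The Kimura two-parameter correction mentioned above can be sketched as follows. This is a minimal illustration rather than the pipeline used for the analyses. It assumes two pre-aligned sequences of equal length and skips any column containing a gap or ambiguous base; the function name `k2p_distance` is hypothetical. With P the proportion of sites that differ by a transition and Q the proportion that differ by a transversion, the corrected divergence is K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns where either sequence has a character other than A, C, G, T
    (for example a gap or N) are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition;
        # any other mismatch is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are so diverged that
    # either argument is non-positive (the correction is then undefined).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Toy alignment with one transition (A/G) in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For small divergences the corrected value is only slightly larger than the raw proportion of differing sites; the correction matters more as multiple substitutions at the same site become likely.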