Common Sequence Polymorphisms Shaping Genetic Diversity in Arabidopsis thaliana

Richard M. Clark, Gabriele Schweikert,* Christopher Toomajian,* Stephan Ossowski,* Georg Zeller,* Paul Shinn, Norman Warthmann, Tina T. Hu, Glenn Fu, David A. Hinds, Huaming Chen, Kelly A. Frazer, Daniel H. Huson, Bernhard Schölkopf, Magnus Nordborg, Gunnar Rätsch, Joseph R. Ecker, Detlef Weigel


The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms described here in larger sets of accessions will enable a detailed understanding of the forces shaping population-wide sequence variation in A. thaliana.