Production of massive DNA sequence data sets is transforming phylogenetic inference, but best practices for analyzing such data sets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90 taxon, 2.2 megabase, 4,800 locus sequence matrix of landfowl using target capture of ultraconserved elements. We then compared phylogenies estimated with concatenated maximum likelihood, quartet-based methods executed on concatenated matrices, and gene tree reconciliation methods, across five thresholds of missing data. Results of maximum likelihood and quartet analyses were similar, well resolved, and demonstrated increasing support with increasing matrix size and sparseness. Conversely, gene tree reconciliation produced unexpected relationships when we included all informative loci, with certain taxa placed toward the root compared with other approaches. Inspection of these taxa identified a prevalence of short average contigs, which potentially biased gene tree inference and caused erroneous results in gene tree reconciliation. This suggests that the more problematic missing data in gene tree–based analyses are partial sequences rather than entire missing sequences from locus alignments. Limiting gene tree reconciliation to the most informative loci solved this problem, producing well-supported topologies congruent with concatenation and quartet methods. Collectively, our analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera (Lerwa, Melanoperdix) not sampled in previous molecular phylogenetic studies.
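
The idea of assembling matrices at different missing-data thresholds can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' pipeline: the function name `filter_loci_by_completeness`, the input layout (a dictionary from locus name to the set of taxa with a sequence for that locus), and the toy locus names and thresholds are all hypothetical. It also counts only entirely missing sequences, whereas the abstract suggests that partial sequences are the more problematic kind of missing data for gene tree methods.

```python
from typing import Dict, List, Set


def filter_loci_by_completeness(
    locus_taxa: Dict[str, Set[str]], n_taxa: int, min_fraction: float
) -> List[str]:
    """Return the sorted names of loci sequenced in at least min_fraction of taxa.

    locus_taxa maps a locus name to the set of taxa that have a sequence
    for that locus (hypothetical layout, chosen for this sketch).
    """
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    if not 0.0 <= min_fraction <= 1.0:
        raise ValueError("min_fraction must lie in [0, 1]")
    return sorted(
        locus
        for locus, taxa in locus_taxa.items()
        if len(taxa) / n_taxa >= min_fraction
    )


# Toy data (synthetic, not from the study): four taxa, three loci.
toy_loci = {
    "locus-1": {"A", "B", "C", "D"},  # complete
    "locus-2": {"A", "B", "C"},       # 75% of taxa present
    "locus-3": {"A"},                 # 25% of taxa present
}

# A stricter threshold keeps fewer loci but yields a less sparse matrix.
complete_only = filter_loci_by_completeness(toy_loci, n_taxa=4, min_fraction=1.0)
at_least_75 = filter_loci_by_completeness(toy_loci, n_taxa=4, min_fraction=0.75)
```

Lowering `min_fraction` trades matrix completeness for more loci, which is the trade-off the abstract describes.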

Resolving the short phylogenetic branches that result from rapid evolutionary diversification often requires large numbers of loci. We collected targeted sequence capture data from 585 nuclear loci (541 ultraconserved elements and 44 protein-coding genes) to estimate the phylogenetic relationships among iguanian lizards in the North American genus Sceloporus. We tested for diversification rate shifts to determine if rapid radiation in the genus is correlated with chromosomal evolution.

Premise of research. Studies of complete plastomes have proven informative for our understanding of the molecular evolution and phylogenomics of grasses, but subfamily Chloridoideae has not been included in this research. In previous multilocus studies, specific deep branches, as in the large clade corresponding to Cynodonteae, are not uniformly well supported. Methodology. In this study, a plastome phylogenomic analysis sampled 14 species representing 4 tribes and 10 genera of Chloridoideae. One species was Sanger sequenced, and 14 other species, including outgroups, were sequenced with next-generation sequencing-by-synthesis methods. Plastomes from next-generation sequences were assembled by de novo methods, and the unambiguously aligned coding and noncoding sequences of the entire plastomes were analyzed phylogenetically. Pivotal results. Complete plastomes showed rare genomic changes in Distichlis, Centropodia, and Eragrostis tef that were of potential phylogenomic significance. Phylogenomic analyses showed uniformly strong support for all ingroup relationships except one node in Cynodonteae in which a short internal branch connected long terminal branches. Resolution within this clade was found to be taxon dependent and possibly subject to long-branch attraction artifacts. Conclusions. Our study indicates that the increase in phylogenetic information in sequences of entire plastomes well resolves and strongly supports relationships among tribes and genera of chloridoid grasses. Sampling more species, especially in the Centropodia + Ellisochloa clade and Cynodonteae, will further address relationships in these groups and clarify the evolutionary origins of the subfamily.

Summary. Among the fossils of hitherto unknown mammals that Darwin collected in South America between 1832 and 1833 during the Beagle expedition [1] were examples of the large, heavily armored herbivores later known as glyptodonts. Ever since, glyptodonts have fascinated evolutionary biologists because of their remarkable skeletal adaptations and seemingly isolated phylogenetic position even within their natural group, the cingulate xenarthrans (armadillos and their allies [2]). In possessing a carapace composed of fused osteoderms, the glyptodonts were clearly related to other cingulates, but their precise phylogenetic position as suggested by morphology remains unresolved [3,4]. To provide a molecular perspective on this issue, we designed sequence-capture baits using in silico reconstructed ancestral sequences and successfully assembled the complete mitochondrial genome of Doedicurus sp., one of the largest glyptodonts. Our phylogenetic reconstructions establish that glyptodonts are in fact deeply nested within the armadillo crown-group, representing a distinct subfamily (Glyptodontinae) within family Chlamyphoridae [5]. Molecular dating suggests that glyptodonts diverged no earlier than around 35 million years ago, in good agreement with their fossil record. Our results highlight the derived nature of the glyptodont morphotype, one aspect of which is a spectacular increase in body size until their extinction at the end of the last ice age.

Massively parallel sequencing has revolutionized many areas of biology, but sequencing large amounts of DNA in many individuals is cost-prohibitive and unnecessary for many studies. Genomic complexity reduction techniques such as sequence capture and restriction enzyme-based methods enable the analysis of many more individuals per unit cost. Despite their utility, current complexity reduction methods have limitations, especially when large numbers of individuals are analyzed. Here we develop a much improved restriction site-associated DNA (RAD) sequencing protocol and a new method called Rapture (RAD capture). The new RAD protocol improves versatility by separating RAD tag isolation and sequencing library preparation into two distinct steps. This protocol also recovers more unique (nonclonal) RAD fragments, which improves both standard RAD and Rapture analysis. Rapture then uses an in-solution capture of chosen RAD tags to target sequencing reads to desired loci. Rapture combines the benefits of both RAD and sequence capture, i.e., very inexpensive and rapid library preparation for many individuals as well as high specificity in the number and location of genomic loci analyzed. Our results demonstrate that Rapture is a rapid and flexible technology capable of analyzing a very large number of individuals with minimal sequencing and library preparation cost. The methods presented here should improve the efficiency of genetic analysis for many aspects of agricultural, environmental, and biomedical science.

Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids.

Metazoan genomes are spatially organized at multiple scales, from packaging of DNA around individual nucleosomes to segregation of whole chromosomes into distinct territories. At the intermediate scale of kilobases to megabases, which encompasses the sizes of genes, gene clusters and regulatory domains, the three-dimensional (3D) organization of DNA is implicated in multiple gene regulatory mechanisms, but understanding this organization remains a challenge. At this scale, the genome is partitioned into domains of different epigenetic states that are essential for regulating gene expression. Here we investigate the 3D organization of chromatin in different epigenetic states using super-resolution imaging. We classified genomic domains in Drosophila cells into transcriptionally active, inactive or Polycomb-repressed states, and observed distinct chromatin organizations for each state. All three types of chromatin domains exhibit power-law scaling between their physical sizes in 3D and their domain lengths, but each type has a distinct scaling exponent. Polycomb-repressed domains show the densest packing and most intriguing chromatin folding behaviour, in which chromatin packing density increases with domain length. Distinct from the self-similar organization displayed by transcriptionally active and inactive chromatin, the Polycomb-repressed domains are characterized by a high degree of chromatin intermixing within the domain. Moreover, compared to inactive domains, Polycomb-repressed domains spatially exclude neighbouring active chromatin to a much stronger degree. Computational modelling and knockdown experiments suggest that reversible chromatin interactions mediated by Polycomb-group proteins play an important role in these unique packaging properties of the repressed chromatin. Taken together, our super-resolution images reveal distinct chromatin packaging for different epigenetic states at the kilobase-to-megabase scale, a length scale that is directly relevant to genome regulation.
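
The power-law scaling between physical size and domain length described above is commonly estimated as the slope of a straight-line fit in log-log space. The following is a minimal sketch of that general technique, not the authors' analysis: the function name `fit_power_law` is hypothetical, and the toy values (exponent 0.3, prefactor 2.0) are synthetic numbers chosen only to check the arithmetic, not measurements from the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(
    lengths: Sequence[float], sizes: Sequence[float]
) -> Tuple[float, float]:
    """Fit sizes ~ prefactor * lengths**exponent by least squares on logs.

    Returns (exponent, prefactor). All inputs must be strictly positive.
    """
    if len(lengths) != len(sizes) or len(lengths) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("lengths must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                        # slope in log-log space
    prefactor = math.exp(mean_y - exponent * mean_x)  # exp(intercept)
    return exponent, prefactor


# Synthetic example: sizes generated exactly from a known power law.
toy_lengths = [10.0, 100.0, 1000.0]
toy_sizes = [2.0 * length ** 0.3 for length in toy_lengths]
toy_exponent, toy_prefactor = fit_power_law(toy_lengths, toy_sizes)
```

Under such a fit, an exponent of 1/3 corresponds to constant packing density in 3D, so an exponent below 1/3 would mean that density increases with domain length, as reported for the Polycomb-repressed domains.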

With the increasing availability of high-throughput sequencing, phylogenetic analyses are no longer constrained by the limited availability of a few loci. Here, we describe a sequence capture methodology, which we used to collect data for analyses of diversification within Sabal (Arecaceae), a palm genus native to the south-eastern USA, Caribbean, Bermuda and Central America. RNA probes were developed and used to enrich DNA samples for putatively low copy nuclear genes and the plastomes for all Sabal species and two outgroup species. Sequence data were generated on an Illumina MiSeq sequencer and target sequences were assembled using custom workflows. Both coalescence and supermatrix analyses of 133 nuclear genes were used to estimate species tree relationships. Plastid genomes were also analysed, yielding generally poor resolution with regard to species relationships. Species relationships described in both nuclear gene and plastome sequences largely reflect the biogeography of the group and, to a lesser extent, previous morphology-based hypotheses. Beyond the biological implications, this research validates a high-throughput methodology for generating a large number of genes for coalescence-based phylogenetic analyses in plant lineages.

In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphism (SNP) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlights their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations.

After evolving in Africa at the close of the Miocene, mammoths (Mammuthus sp.) spread through much of the northern hemisphere, diversifying morphologically as they entered various habitats. Paleontologically, these morphs are conventionally recognized as species. In Pleistocene North America alone, several mammoth species have been recognized, inhabiting environments as different as cold tundra-steppe in the north and the arid grasslands or temperate savanna-parklands of the south. Yet mammoth phylogeographic studies have overwhelmingly focused on permafrost-preserved remains of only one of these species, Mammuthus primigenius (woolly mammoth). Here we challenge this bias by performing a geographically and taxonomically wide survey of mammoth genetic diversity across North America. Using a targeted enrichment technique, we sequenced 67 complete mitochondrial genomes from non-primigenius specimens representing M. columbi (Columbian mammoth), M. jeffersonii (Jeffersonian mammoth), and M. exilis (pygmy mammoth), including specimens from contexts not generally associated with good DNA preservation. While we uncovered clear phylogeographic structure in mammoth matrilines, their phylogeny as recovered from mitochondrial DNA is not compatible with existing systematic interpretations of their paleontological record. Instead, our results strongly suggest that various nominal mammoth species interbred, perhaps extensively. We hypothesize that at least two distinct stages of interbreeding between conventional paleontological species are likely responsible for this pattern – one between Siberian woolly mammoths and resident American populations that introduced woolly mammoth phenotypes to the continent, and another between ecomorphologically distinct populations of woolly and Columbian mammoths in North America south of the ice.