Production of massive DNA sequence data sets is transforming phylogenetic inference, but best practices for analyzing such data sets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90 taxon, 2.2 megabase, 4,800 locus sequence matrix of landfowl using target capture of ultraconserved elements. We then compared phylogenies estimated with concatenated maximum likelihood, quartet-based methods executed on concatenated matrices and gene tree reconciliation methods, across five thresholds of missing data. Results of maximum likelihood and quartet analyses were similar, well resolved, and demonstrated increasing support with increasing matrix size and sparseness. Conversely, gene tree reconciliation produced unexpected relationships when we included all informative loci, with certain taxa placed toward the root compared with other approaches. Inspection of these taxa identified a prevalence of short average contigs, which potentially biased gene tree inference and caused erroneous results in gene tree reconciliation. This suggests that the more problematic missing data in gene tree–based analyses are partial sequences rather than entire missing sequences from locus alignments. Limiting gene tree reconciliation to the most informative loci solved this problem, producing well-supported topologies congruent with concatenation and quartet methods. Collectively, our analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera (Lerwa, Melanoperdix) not sampled in previous molecular phylogenetic studies.
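
The idea of assembling matrices at several missing-data thresholds can be illustrated with a minimal sketch. It assumes that a threshold is defined by taxon occupancy per locus (the fraction of taxa with any sequence for that locus); the abstract does not state how the five thresholds were defined, and the function name, locus names, and threshold values below are hypothetical. Note that this kind of filter only addresses entirely missing sequences, not the partial sequences the abstract identifies as more problematic for gene tree methods.

```python
from typing import Dict, List, Set


def filter_loci_by_completeness(
    loci: Dict[str, Set[str]], n_taxa: int, min_fraction: float
) -> List[str]:
    """Return the names of loci present in at least min_fraction of all taxa.

    `loci` maps a locus name to the set of taxa that have a sequence for it
    (a hypothetical input structure, chosen for illustration only).
    """
    return sorted(
        name for name, taxa in loci.items() if len(taxa) / n_taxa >= min_fraction
    )


# Toy data: four taxa (A-D) and three loci with decreasing occupancy.
loci = {
    "uce-1": {"A", "B", "C", "D"},
    "uce-2": {"A", "B", "C"},
    "uce-3": {"A"},
}

# One locus list per (hypothetical) completeness threshold.
matrices = {t: filter_loci_by_completeness(loci, 4, t) for t in (1.0, 0.75, 0.25)}
```

Lowering the threshold admits more loci into the matrix at the cost of more missing data, which is the trade-off the study examines.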

Resolving the short phylogenetic branches that result from rapid evolutionary diversification often requires large numbers of loci. We collected targeted sequence capture data from 585 nuclear loci (541 ultraconserved elements and 44 protein-coding genes) to estimate the phylogenetic relationships among iguanian lizards in the North American genus Sceloporus. We tested for diversification rate shifts to determine if rapid radiation in the genus is correlated with chromosomal evolution.

Premise of research. Studies of complete plastomes have proven informative for our understanding of the molecular evolution and phylogenomics of grasses, but subfamily Chloridoideae has not been included in this research. In previous multilocus studies, specific deep branches, as in the large clade corresponding to Cynodonteae, are not uniformly well supported. Methodology. In this study, a plastome phylogenomic analysis sampled 14 species representing 4 tribes and 10 genera of Chloridoideae. One species was Sanger sequenced, and 14 other species, including outgroups, were sequenced with next-generation sequencing-by-synthesis methods. Plastomes from next-generation sequences were assembled by de novo methods, and the unambiguously aligned coding and noncoding sequences of the entire plastomes were analyzed phylogenetically. Pivotal results. Complete plastomes showed rare genomic changes in Distichlis, Centropodia, and Eragrostis tef that were of potential phylogenomic significance. Phylogenomic analyses showed uniformly strong support for all ingroup relationships except one node in Cynodonteae in which a short internal branch connected long terminal branches. Resolution within this clade was found to be taxon dependent and possibly subject to long-branch attraction artifacts. Conclusions. Our study indicates that the increase in phylogenetic information in sequences of entire plastomes well resolves and strongly supports relationships among tribes and genera of chloridoid grasses. Sampling more species, especially in the Centropodia + Ellisochloa clade and Cynodonteae, will further address relationships in these groups and clarify the evolutionary origins of the subfamily.

Summary. Among the fossils of hitherto unknown mammals that Darwin collected in South America between 1832 and 1833 during the Beagle expedition [1] were examples of the large, heavily armored herbivores later known as glyptodonts. Ever since, glyptodonts have fascinated evolutionary biologists because of their remarkable skeletal adaptations and seemingly isolated phylogenetic position even within their natural group, the cingulate xenarthrans (armadillos and their allies [2]). In possessing a carapace composed of fused osteoderms, the glyptodonts were clearly related to other cingulates, but their precise phylogenetic position as suggested by morphology remains unresolved [3,4]. To provide a molecular perspective on this issue, we designed sequence-capture baits using in silico reconstructed ancestral sequences and successfully assembled the complete mitochondrial genome of Doedicurus sp., one of the largest glyptodonts. Our phylogenetic reconstructions establish that glyptodonts are in fact deeply nested within the armadillo crown-group, representing a distinct subfamily (Glyptodontinae) within family Chlamyphoridae [5]. Molecular dating suggests that glyptodonts diverged no earlier than around 35 million years ago, in good agreement with their fossil record. Our results highlight the derived nature of the glyptodont morphotype, one aspect of which is a spectacular increase in body size until their extinction at the end of the last ice age.

Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids.

Massively parallel sequencing has revolutionized many areas of biology, but sequencing large amounts of DNA in many individuals is cost-prohibitive and unnecessary for many studies. Genomic complexity reduction techniques such as sequence capture and restriction enzyme-based methods enable the analysis of many more individuals per unit cost. Despite their utility, current complexity reduction methods have limitations, especially when large numbers of individuals are analyzed. Here we develop a much improved restriction site-associated DNA (RAD) sequencing protocol and a new method called Rapture (RAD capture). The new RAD protocol improves versatility by separating RAD tag isolation and sequencing library preparation into two distinct steps. This protocol also recovers more unique (nonclonal) RAD fragments, which improves both standard RAD and Rapture analysis. Rapture then uses an in-solution capture of chosen RAD tags to target sequencing reads to desired loci. Rapture combines the benefits of both RAD and sequence capture, i.e., very inexpensive and rapid library preparation for many individuals as well as high specificity in the number and location of genomic loci analyzed. Our results demonstrate that Rapture is a rapid and flexible technology capable of analyzing a very large number of individuals with minimal sequencing and library preparation cost. The methods presented here should improve the efficiency of genetic analysis for many aspects of agricultural, environmental, and biomedical science.

Exon-capture studies have typically been restricted to relatively shallow phylogenetic scales due primarily to hybridization constraints. Here, we present an exon-capture system for an entire class of marine invertebrates, the Ophiuroidea, built upon a phylogenetically diverse transcriptome foundation. The system captures approximately 90% of the 1,552 exon target, across all major lineages of the quarter-billion-year-old extant crown group. Key features of our system are 1) basing the target on an alignment of orthologous genes determined from 52 transcriptomes spanning the phylogenetic diversity and trimmed to remove anything difficult to capture, map, or align; 2) use of multiple artificial representatives based on ancestral state reconstructions rather than exemplars to improve capture and mapping of the target; 3) mapping reads to a multi-reference alignment; and 4) using patterns of site polymorphism to distinguish among paralogy, polyploidy, allelic differences, and sample contamination. The resulting data give a well-resolved tree (currently standing at 417 samples, 275,352 sites, 91% data-complete) that will transform our understanding of ophiuroid evolution and biogeography.

The qualification of orthology is a significant challenge when developing large, multiloci phylogenetic data sets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts and mis-indexing, which pose problems to automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete phylogenetic data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon capture targeting 490 of the 500 genes (those with at least one exon >120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts resulting in lower data matrix completeness when considering the original 500 genes. This could potentially explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and ‘Agalma-equivalent’ data set (sharing 458 genes). Overall, our study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.

By combining high-throughput sequencing with target enrichment (‘hybridization capture’), researchers are able to obtain molecular data from genomic regions of interest for projects that are otherwise constrained by sample quality (e.g. degraded and contamination-rich samples) or a lack of a priori sequence information (e.g. studies on nonmodel species). Despite the use of hybridization capture in various fields of research for many years, the impact of enrichment conditions on capture success is not yet thoroughly understood. We evaluated the impact of a key parameter – hybridization temperature – on the capture success of mitochondrial genomes across the carnivoran family Felidae. Capture was carried out for a range of sample types (fresh, archival, ancient) with varying levels of sequence divergence between bait and target (i.e. across a range of species) using pools of individually indexed libraries on Agilent SureSelect™ arrays. Our results suggest that hybridization capture protocols require specific optimization for the sample type that is being investigated. Hybridization temperature affected the proportion of on-target sequences following capture: for degraded samples, we obtained the best results with a hybridization temperature of 65 °C, while a touchdown approach (65 °C down to 50 °C) yielded the best results for fresh samples. Evaluation of capture performance at a regional scale (sliding window approach) revealed no significant improvement in the recovery of DNA fragments with high sequence divergence from the bait at any of the tested hybridization temperatures, suggesting that hybridization temperature may not be the critical parameter for the enrichment of divergent fragments.
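
A sliding window evaluation of capture performance along a reference sequence can be sketched as follows. This is an illustration only: the abstract does not report the window size, step, or the statistic computed per window, so the function name, the use of mean read depth, and the toy values are all assumptions.

```python
from typing import List, Tuple


def sliding_window_means(
    depth: List[float], window: int, step: int
) -> List[Tuple[int, float]]:
    """Return (start position, mean depth) for each full window along `depth`.

    `depth` holds one read-depth value per reference position; windows of
    `window` positions advance by `step` positions (both hypothetical choices).
    """
    means = []
    for start in range(0, len(depth) - window + 1, step):
        chunk = depth[start:start + window]
        means.append((start, sum(chunk) / window))
    return means


# Toy per-position depth: well covered at the start, poorly covered at the end.
depth = [10, 10, 10, 10, 0, 0, 2, 2]
result = sliding_window_means(depth, window=4, step=2)
```

Comparing such per-window values against the bait-to-target divergence in the same windows is one way to ask whether divergent regions are recovered less well under a given hybridization temperature.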

After evolving in Africa at the close of the Miocene, mammoths (Mammuthus sp.) spread through much of the northern hemisphere, diversifying morphologically as they entered various habitats. Paleontologically, these morphs are conventionally recognized as species. In Pleistocene North America alone, several mammoth species have been recognized, inhabiting environments as different as cold tundra-steppe in the north and the arid grasslands or temperate savanna-parklands of the south. Yet mammoth phylogeographic studies have overwhelmingly focused on permafrost-preserved remains of only one of these species, Mammuthus primigenius (woolly mammoth). Here we challenge this bias by performing a geographically and taxonomically wide survey of mammoth genetic diversity across North America. Using a targeted enrichment technique, we sequenced 67 complete mitochondrial genomes from non-primigenius specimens representing M. columbi (Columbian mammoth), M. jeffersonii (Jeffersonian mammoth), and M. exilis (pygmy mammoth), including specimens from contexts not generally associated with good DNA preservation. While we uncovered clear phylogeographic structure in mammoth matrilines, their phylogeny as recovered from mitochondrial DNA is not compatible with existing systematic interpretations of their paleontological record. Instead, our results strongly suggest that various nominal mammoth species interbred, perhaps extensively. We hypothesize that at least two distinct stages of interbreeding between conventional paleontological species are likely responsible for this pattern – one between Siberian woolly mammoths and resident American populations that introduced woolly mammoth phenotypes to the continent, and another between ecomorphologically distinct populations of woolly and Columbian mammoths in North America south of the ice.