The qualification of orthology is a significant challenge when developing large, multilocus phylogenetic data sets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts and mis-indexing, which pose problems to automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete phylogenetic data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon capture targeting 490 of the 500 genes (those with at least one exon >120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts, resulting in lower data matrix completeness when considering the original 500 genes. This could potentially explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and ‘Agalma-equivalent’ data sets (sharing 458 genes). Overall, our study confirms the utility of the 500-gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.
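Taxon and character completeness, the two measures behind the data-matrix claim in the eupulmonate study, can be made concrete with a minimal sketch. It assumes the matrix is held as a nested dictionary (taxon → gene → aligned nucleotide sequence, or None when a gene is missing) and treats '-', '?' and 'N' as missing data. The function names are illustrative only and are not part of Agalma or of the authors' workflow.

```python
from typing import Dict, List, Optional

# Assumed layout: taxon -> gene -> aligned nucleotide sequence (None if absent).
Matrix = Dict[str, Dict[str, Optional[str]]]

MISSING = set("-?N")  # gap, unknown, ambiguous nucleotide


def _informative(seq: Optional[str]) -> int:
    """Number of sites in `seq` that are not gaps or missing data."""
    if not seq:
        return 0
    return sum(1 for c in seq.upper() if c not in MISSING)


def matrix_occupancy(matrix: Matrix, genes: List[str]) -> float:
    """Fraction of taxon x gene cells holding at least one informative site."""
    total = len(matrix) * len(genes)
    filled = sum(
        1 for row in matrix.values() for g in genes if _informative(row.get(g)) > 0
    )
    return filled / total if total else 0.0


def character_completeness(matrix: Matrix, gene_lengths: Dict[str, int]) -> float:
    """Fraction of all matrix sites (taxa x aligned columns) that are informative."""
    total = len(matrix) * sum(gene_lengths.values())
    present = sum(
        _informative(row.get(g)) for row in matrix.values() for g in gene_lengths
    )
    return present / total if total else 0.0
```

A fragmented transcript that fills a cell with only a short stretch of sequence raises occupancy but barely moves character completeness, which is why the two measures are worth reporting separately.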

By combining high-throughput sequencing with target enrichment (‘hybridization capture’), researchers are able to obtain molecular data from genomic regions of interest for projects that are otherwise constrained by sample quality (e.g. degraded and contamination-rich samples) or a lack of a priori sequence information (e.g. studies on nonmodel species). Despite the use of hybridization capture in various fields of research for many years, the impact of enrichment conditions on capture success is not yet thoroughly understood. We evaluated the impact of a key parameter – hybridization temperature – on the capture success of mitochondrial genomes across the carnivoran family Felidae. Capture was carried out for a range of sample types (fresh, archival, ancient) with varying levels of sequence divergence between bait and target (i.e. across a range of species) using pools of individually indexed libraries on Agilent SureSelect™ arrays. Our results suggest that hybridization capture protocols require specific optimization for the sample type that is being investigated. Hybridization temperature affected the proportion of on-target sequences following capture: for degraded samples, we obtained the best results with a hybridization temperature of 65 °C, while a touchdown approach (65 °C down to 50 °C) yielded the best results for fresh samples. Evaluation of capture performance at a regional scale (sliding window approach) revealed no significant improvement in the recovery of DNA fragments with high sequence divergence from the bait at any of the tested hybridization temperatures, suggesting that hybridization temperature may not be the critical parameter for the enrichment of divergent fragments.
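The regional (sliding-window) evaluation of capture performance can be sketched as follows. This is an illustrative sketch, not the study's pipeline. It assumes the bait and the captured consensus are already aligned to the same length, that per-base read depth is available for that alignment, and that divergence is measured as an uncorrected p-distance over unambiguous sites. The window and step sizes are arbitrary defaults.

```python
from typing import List, Sequence, Tuple

UNAMBIGUOUS = set("ACGT")


def window_stats(
    bait: str,
    target: str,
    depth: Sequence[float],
    window: int = 100,
    step: int = 50,
) -> List[Tuple[int, float, float]]:
    """Per-window (start, p-distance bait vs target, mean read depth).

    Assumes bait, target and depth are aligned column-for-column.
    Windows with no comparable (unambiguous, ungapped) sites are skipped.
    """
    if not (len(bait) == len(target) == len(depth)):
        raise ValueError("bait, target and depth must have equal length")
    results = []
    last_start = max(len(bait) - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        compared = diffs = 0
        for b, t in zip(bait[start:end].upper(), target[start:end].upper()):
            if b in UNAMBIGUOUS and t in UNAMBIGUOUS:
                compared += 1
                diffs += b != t  # True counts as 1
        if compared == 0:
            continue
        span = depth[start:end]
        results.append((start, diffs / compared, sum(span) / len(span)))
    return results
```

Binning the windows by divergence and comparing mean depth across hybridization temperatures would then show whether highly divergent regions are recovered better at any of the tested temperatures.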

After evolving in Africa at the close of the Miocene, mammoths (Mammuthus spp.) spread through much of the northern hemisphere, diversifying morphologically as they entered various habitats. Paleontologically, these morphs are conventionally recognized as species. In Pleistocene North America alone, several mammoth species have been recognized, inhabiting environments as different as cold tundra-steppe in the north and the arid grasslands or temperate savanna-parklands of the south. Yet mammoth phylogeographic studies have overwhelmingly focused on permafrost-preserved remains of only one of these species, Mammuthus primigenius (woolly mammoth). Here we challenge this bias by performing a geographically and taxonomically wide survey of mammoth genetic diversity across North America. Using a targeted enrichment technique, we sequenced 67 complete mitochondrial genomes from non-primigenius specimens representing M. columbi (Columbian mammoth), M. jeffersonii (Jeffersonian mammoth), and M. exilis (pygmy mammoth), including specimens from contexts not generally associated with good DNA preservation. While we uncovered clear phylogeographic structure in mammoth matrilines, their phylogeny as recovered from mitochondrial DNA is not compatible with existing systematic interpretations of their paleontological record. Instead, our results strongly suggest that various nominal mammoth species interbred, perhaps extensively. We hypothesize that at least two distinct stages of interbreeding between conventional paleontological species are likely responsible for this pattern – one between Siberian woolly mammoths and resident American populations that introduced woolly mammoth phenotypes to the continent, and another between ecomorphologically distinct populations of woolly and Columbian mammoths in North America south of the ice.

With the increasing availability of high-throughput sequencing, phylogenetic analyses are no longer constrained by the limited availability of a few loci. Here, we describe a sequence capture methodology, which we used to collect data for analyses of diversification within Sabal (Arecaceae), a palm genus native to the south-eastern USA, Caribbean, Bermuda and Central America. RNA probes were developed and used to enrich DNA samples for putatively low-copy nuclear genes and the plastomes for all Sabal species and two outgroup species. Sequence data were generated on an Illumina MiSeq sequencer and target sequences were assembled using custom workflows. Both coalescent and supermatrix analyses of 133 nuclear genes were used to estimate species-tree relationships. Plastid genomes were also analysed, yielding generally poor resolution of species relationships. Species relationships inferred from both nuclear gene and plastome sequences largely reflect the biogeography of the group and, to a lesser extent, previous morphology-based hypotheses. Beyond the biological implications, this research validates a high-throughput methodology for generating a large number of genes for coalescent-based phylogenetic analyses in plant lineages.

In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphism (SNP) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found that 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in SweeD and Bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins and that are highly correlated with precipitation seasonality and vegetation. We find that Arctic and High Arctic wolf ecotypes have more genes under selection, which highlights their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations.

Museums hold most of the world’s most valuable biological specimens and tissues, including type material that is often decades or even centuries old. Unfortunately, traditional museum collection and storage methods were not designed to preserve the nucleic acids held within the material, often reducing its potential viability and value for many genetic applications. High-throughput sequencing technologies and associated applications offer new opportunities for obtaining sequence data from museum samples. In particular, target sequence capture offers a promising approach for recovering large numbers of orthologous loci from relatively small amounts of starting material. In the present study, we test the utility of target sequence capture for obtaining data from museum-held material from a speciose mammalian genus: the horseshoe bats (Rhinolophidae: Chiroptera). We designed a bait set for capturing >3600 genes and applied it to 10 species of horseshoe bat that had been collected between 7 and 93 years ago and preserved using a range of methods. On average, approximately 89% of target genes per species were recovered with at least partial sequence coverage, ranging from 3024 to 3186 genes per species. A mean of 1206 genes per species were recovered with ≥90% sequence coverage. Our findings provide good support for the application of large-scale bait capture across congeneric species spanning approximately 15 Myr of evolution. We observed no clear association between capture success and phylogenetic distance from the bait model, although sample sizes precluded a formal test.
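The two recovery figures for the horseshoe bats (genes with at least partial coverage, and genes with ≥90% coverage) are threshold counts over per-gene coverage. A minimal sketch, assuming per-base read depth is available for each target gene and that a base counts as covered at depth ≥1. The function names and the depth threshold are assumptions, not the study's settings.

```python
from typing import Dict, Sequence


def covered_fraction(depth: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of target bases with read depth >= min_depth."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


def recovery_summary(
    depth_by_gene: Dict[str, Sequence[int]], high: float = 0.9, min_depth: int = 1
) -> Dict[str, float]:
    """Count targets recovered at all (>0 covered) and at >= `high` coverage."""
    fractions = [covered_fraction(d, min_depth) for d in depth_by_gene.values()]
    n = len(fractions)
    recovered = sum(1 for f in fractions if f > 0)
    high_cov = sum(1 for f in fractions if f >= high)
    return {
        "targets": n,
        "recovered": recovered,
        "recovered_fraction": recovered / n if n else 0.0,
        "high_coverage": high_cov,
    }
```

Raising `min_depth` makes both counts more conservative. Old or poorly preserved specimens with low library complexity are where that choice matters most.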

The complete mitochondrial genome of the extinct musk ox Bootherium bombifrons is presented for the first time. Phylogenetic analysis supports placement of Bootherium as sister to the living musk ox, Ovibos moschatus, in agreement with morphological taxonomy. SNPs identified in the COI-5p region provide a tool for identifying Bootherium among material that is not morphologically diagnosable (for example, postcrania, coprolites and archaeological specimens) and/or lacks precise stratigraphic control, like many specimens from glacial alluvium and placer mines.
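Identification from diagnostic SNPs amounts to checking a query sequence at a handful of known alignment positions. The sketch below assumes the query is already aligned to the COI-5p reference and takes the diagnostic sites as input. No real site positions or alleles are hard-coded, and `assign_taxon` is a hypothetical helper rather than a published tool.

```python
from typing import Dict, Optional

# Diagnostic sites: 0-based alignment position -> {taxon: expected allele}.
Sites = Dict[int, Dict[str, str]]


def assign_taxon(query: str, sites: Sites, min_sites: int = 2) -> Optional[str]:
    """Return the single taxon matching every scorable diagnostic site, else None.

    A site is scorable if it lies within the query and the query base is A/C/G/T.
    Returns None if fewer than `min_sites` sites are scorable, or if zero or
    several taxa match all scorable sites.
    """
    scored = 0
    matches: Dict[str, int] = {}
    for pos, alleles in sites.items():
        if pos >= len(query):
            continue
        base = query[pos].upper()
        if base not in "ACGT":
            continue  # gap, N or other damage: site cannot be scored
        scored += 1
        for taxon, allele in alleles.items():
            if base == allele.upper():
                matches[taxon] = matches.get(taxon, 0) + 1
    if scored < min_sites:
        return None
    perfect = [t for t, n in matches.items() if n == scored]
    return perfect[0] if len(perfect) == 1 else None
```

The function returns `None` whenever the scored sites are too few or fit more than one taxon. Ambiguous or damaged reads, common in coprolites and placer-mine material, are therefore left unassigned rather than forced into a taxon.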

The genus Cucurbita (squashes, pumpkins, gourds) contains numerous domesticated lineages with ancient New World origins. It was broadly distributed in the past but has declined to the point that several of the crops’ progenitor species are scarce or unknown in the wild. We hypothesize that Holocene ecological shifts and megafaunal extinctions severely impacted wild Cucurbita, whereas their domestic counterparts adapted to changing conditions via symbiosis with human cultivators. First, we used high-throughput sequencing to analyze complete plastid genomes of 91 Cucurbita samples in total, comprising ancient (n = 19), modern wild (n = 30), and modern domestic (n = 42) taxa. This analysis demonstrates independent domestication in eastern North America, evidence of a previously unknown pathway to domestication in northeastern Mexico, and broad archaeological distributions of taxa currently unknown in the wild. Further, sequence similarity between distant wild populations suggests recent fragmentation. Collectively, these results point to wild-type declines coinciding with widespread domestication. Second, we hypothesize that the disappearance of large herbivores struck a critical ecological blow against wild Cucurbita, and we take initial steps to consider this hypothesis through cross-mammal analyses of bitter taste receptor gene repertoires. Directly, megafauna consumed Cucurbita fruits and dispersed their seeds; wild Cucurbita were likely left without mutualistic dispersal partners in the Holocene because they are unpalatable to smaller surviving mammals with more bitter taste receptor genes. Indirectly, megafauna maintained mosaic-like landscapes ideal for Cucurbita, and vegetative changes following the megafaunal extinctions likely crowded out their disturbed-ground niche. Thus, anthropogenic landscapes provided favorable growth habitats and willing dispersal partners in the wake of ecological upheaval.

The spread of farming out of the Balkans and into the rest of Europe followed two distinct routes: an initial expansion represented by the Impressa and Cardial traditions, which followed the Northern Mediterranean coastline; and another expansion represented by the LBK (Linearbandkeramik) tradition, which followed the Danube River into Central Europe. Although genomic data now exist from samples representing the second migration, such data have yet to be successfully generated from the initial Mediterranean migration. To address this, we generated the complete genome of a 7,400-year-old Cardial individual (CB13) from Cova Bonica in Vallirana (Barcelona), as well as partial nuclear data from five others excavated from different sites in Spain and Portugal. CB13 clusters with all previously sequenced early European farmers and modern-day Sardinians. Furthermore, our analyses suggest that both Cardial and LBK peoples derived from a common ancient population located in or around the Balkan Peninsula. The Iberian Cardial genome also carries a discernible hunter–gatherer genetic signature that likely was not acquired by admixture with local Iberian foragers. Our results indicate that retrieving ancient genomes from similarly warm Mediterranean environments such as the Near East is technically feasible.