Publications Archives

* Targeted enrichment of conserved genomic regions (e.g. ultraconserved elements or UCEs) has emerged as a promising tool for inferring evolutionary history in many organismal groups. Because the UCE approach is still relatively new, much remains to be learned about how best to identify UCE loci and design baits to enrich them.
* We test an updated UCE identification and bait design workflow for the insect order Hymenoptera, with a particular focus on ants. The new strategy augments a previous bait design for Hymenoptera by (i) changing the parameters by which conserved genomic regions are identified and retained, and (ii) increasing the number of genomes used for locus identification and bait design. We perform in vitro validation of the approach in ants by synthesizing an ant-specific bait set that targets UCE loci and a set of ‘legacy’ phylogenetic markers. Using this bait set, we generate new data for 84 taxa (16/17 ant subfamilies) and extract loci from an additional 17 genome-enabled taxa. We then use these data to examine UCE capture success and phylogenetic performance across ants. We also test the workability of extracting legacy markers from enriched samples and combining the data with published datasets.
* The updated bait design (hym-v2) targeted a total of 2590 UCE loci for Hymenoptera, significantly increasing the number of loci relative to the original bait set (hym-v1; 1510 loci). Across 38 genome-enabled Hymenoptera and 84 enriched samples, experiments demonstrated a high and unbiased capture success rate, with a mean enrichment of 2214 loci per sample. Phylogenomic analyses of ants produced a robust tree that included strong support for previously uncertain relationships. Complementing the UCE results, we successfully enriched legacy markers, combined the data with published Sanger datasets and generated a comprehensive ant phylogeny containing 1060 terminals.
* Overall, the new UCE bait design strategy resulted in an enhanced bait set for genome-scale phylogenetics in ants and other Hymenoptera. Our in vitro tests demonstrate the utility of the updated design workflow, providing evidence that this approach could be applied to any organismal group with available genomic information.

The sinipercids are freshwater fishes endemic to East Asia, mainly in China. Phylogenetic studies on the sinipercids have made great progress in the last decades, but interspecific relationships and evolutionary history of the sinipercids remain unresolved. Lack of distinctive morphological characters leads to problems in validating some species, such as Siniperca loona. Moreover, genetic data are needed to delimit species pairs with explicit hypothesis testing, such as in S. chuatsi vs. S. kneri and Coreoperca whiteheadi vs. C. liui. Here we reconstructed the phylogeny of the sinipercids with an unprecedented scale of data, 16,943 loci of single-copy coding sequence data from nine sinipercid species, eight putative sister taxa and two outgroups. Targeted sequences were collected using gene enrichment and Illumina sequencing, yielding thousands of protein coding sequences and single nucleotide polymorphism (SNP) data. Maximum likelihood and coalescent species tree analyses resulted in identical and highly supported trees. We confirmed that the centrarchids are sister to the sinipercids. A monophyletic Sinipercidae with two genera, Siniperca and Coreoperca, was also supported. Different from most previous studies, S. scherzeri was found to be the most basal taxon relative to the other species of Siniperca, which form two clades: a clade having S. roulei sister to S. chuatsi and S. kneri, and a clade consisting of S. loona sister to S. obscura and S. undulata. We found that both S. loona and C. liui are valid species using Bayes factor delimitation (BFD∗) based on SNP data. Species delimitation also provided decisive support for S. chuatsi and S. kneri being two distinct species. We calibrated a chronogram of the sinipercids based on 100 loci and three fossil calibration points using BEAST, and reconstructed ancestral ranges of the sinipercids using Lagrange Analysis (DEC model) and Statistical Dispersal-Vicariance Analysis (S-DIVA) implemented in RASP. Divergence time estimates and ancestral habitat reconstruction suggested a wide-ranging distribution of the common ancestor of the sinipercids in southern China at 53.1 million years ago (CI: 30.4–85.8 Ma). The calibrated time tree is consistent with historical climate changes and geological events that might have shaped the current distribution of the sinipercids.
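
As a rough illustration of how Bayes factor delimitation compares two species models, the sketch below computes 2 ln(BF) from two marginal log-likelihood estimates and labels the strength of support. The function names, the example likelihood values, and the cut-offs (the commonly cited Kass and Raftery style scale, where values above 10 are often described as decisive) are assumptions for illustration and are not taken from the study.

```python
def two_ln_bayes_factor(mle_model_a: float, mle_model_b: float) -> float:
    """Return 2*ln(BF) for model A over model B.

    Inputs are marginal log-likelihood estimates (e.g. from path sampling).
    A positive result favours model A.
    """
    return 2.0 * (mle_model_a - mle_model_b)


def support_label(two_ln_bf: float) -> str:
    """Map |2 ln BF| onto a conventional verbal scale (assumed cut-offs)."""
    magnitude = abs(two_ln_bf)
    if magnitude > 10:
        return "decisive"
    if magnitude > 6:
        return "strong"
    if magnitude > 2:
        return "positive"
    return "weak"


if __name__ == "__main__":
    # Hypothetical marginal log-likelihoods, invented for this example only.
    split_model = -10250.0   # two taxa treated as separate species
    lumped_model = -10300.0  # the same taxa lumped into one species
    bf = two_ln_bayes_factor(split_model, lumped_model)
    print(bf, support_label(bf))
```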

While hybridization has recently received a resurgence of attention from systematists and evolutionary biologists, there remains a dearth of case studies on ancient, diversified hybrid lineages, that is, clades of organisms that originated through reticulation. Studies on these groups are valuable in that they would speak to the long-term phylogenetic success of lineages following gene flow between species. We present a phylogenomic view of Heuchera, long known for frequent hybridization, incorporating all three independent genomes: targeted nuclear (~400,000 bp), plastid (~160,000 bp), and mitochondrial (~470,000 bp) data. We analyze these data using multiple concatenation and coalescence strategies. The nuclear phylogeny is consistent with previous work and with morphology, confidently suggesting a monophyletic Heuchera. By contrast, analyses of both organellar genomes recover a grossly polyphyletic Heuchera, consisting of three primary clades with relationships extensively rearranged within these as well. A minority of nuclear loci also exhibit phylogenetic discord; yet these topologies remarkably never resemble the pattern of organellar loci and largely present low levels of discord inter alia. Two independent estimates of the coalescent branch length of the ancestor of Heuchera using nuclear data suggest rare or nonexistent incomplete lineage sorting with related clades, inconsistent with the observed gross polyphyly of organellar genomes (confirmed by simulation of gene trees under the coalescent). These observations, in combination with previous work, strongly suggest hybridization as the cause of this phylogenetic discord.

High-throughput sequencing has dramatically fostered ancient DNA research in recent years. Shotgun sequencing, however, is not necessarily the best-suited approach because samples are extensively contaminated with exogenous environmental microbial DNA. DNA capture-enrichment methods represent cost-effective alternatives that increase the sequencing focus on the endogenous fraction, whether it is from mitochondrial or nuclear genomes, or parts thereof. Here, we explored experimental parameters that could impact the efficacy of MYbaits in-solution capture assays of ~5000 nuclear loci or the whole genome. We found that varying quantities of the starting probes had only a moderate effect on capture outcomes. Starting DNA, probe tiling, the hybridization temperature and the proportion of endogenous DNA all affected the assay, however. Additionally, probe features such as their GC content, number of CpG dinucleotides, sequence complexity and entropy and self-annealing properties need to be carefully addressed during the design stage of the capture assay. The experimental conditions and probe molecular features identified in this study will improve the recovery of genetic information extracted from degraded and ancient remains.
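
Several of the probe features named above can be computed directly from a probe sequence. The following is a minimal sketch, assuming probes are plain DNA strings; the function name `probe_features` and the use of single-base Shannon entropy as the complexity measure are illustrative choices, not the metrics defined in the study.

```python
import math
from collections import Counter


def probe_features(seq: str) -> dict:
    """Compute simple per-probe summary statistics.

    Returns GC fraction, number of CpG dinucleotides, and Shannon entropy
    (in bits) of the single-base composition.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty probe sequence")

    counts = Counter(seq)
    gc_content = (counts["G"] + counts["C"]) / n
    # Count every position where C is immediately followed by G.
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    # Shannon entropy of base frequencies; abs() avoids returning -0.0.
    entropy = abs(-sum((c / n) * math.log2(c / n) for c in counts.values()))

    return {"gc_content": gc_content, "cpg_count": cpg_count, "entropy_bits": entropy}


if __name__ == "__main__":
    # Made-up example probe, for illustration only.
    print(probe_features("ACGTACGT"))
```

A design pipeline could filter candidate probes on thresholds for these values, though suitable thresholds would have to come from the experimental results rather than from this sketch.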

Archived specimens are highly valuable sources of DNA for retrospective genetic/genomic analysis. However, often limited effort has been made to evaluate and optimize extraction methods, which may be crucial for downstream applications. Here, we assessed and optimized the usefulness of abundant archived skeletal material from sharks as a source of DNA for temporal genomic studies. Six different methods for DNA extraction, encompassing two different commercial kits and three different protocols, were applied to material, so-called bio-swarf, from contemporary and archived jaws and vertebrae of tiger sharks (Galeocerdo cuvier). Protocols were compared for DNA yield and quality using a qPCR approach. For jaw swarf, all methods provided relatively high DNA yield and quality, while large differences in yield between protocols were observed for vertebrae. Similar results were obtained from samples of white shark (Carcharodon carcharias). Application of the optimized methods to 38 museum and private angler trophy specimens dating back to 1912 yielded sufficient DNA for downstream genomic analysis for 68% of the samples. No clear relationships between age of samples, DNA quality and quantity were observed, likely reflecting different preparation and storage methods for the trophies. Trial sequencing of DNA capture genomic libraries using 20 000 baits revealed that a significant proportion of captured sequences were derived from tiger sharks. This study demonstrates that archived shark jaws and vertebrae are potential high-yield sources of DNA for genomic-scale analysis. It also highlights that even for similar tissue types, a careful evaluation of extraction protocols can vastly improve DNA yield.

Single nucleotide polymorphisms (SNPs) are replacing microsatellites for population genetic analyses, but it is not apparent how many SNPs are needed or how well SNPs correlate with microsatellites. We used data from the gopher tortoise, Gopherus polyphemus, a species with small populations, to compare estimates of population genetic parameters from SNPs and microsatellites. Specifically, we compared one SNP data set (16 tortoises from four populations sequenced at 17 901 SNPs) to two microsatellite data sets, a full data set of 101 tortoises and a partial data set of 16 tortoises previously genotyped at 10 microsatellites. For the full microsatellite data set, observed heterozygosity, expected heterozygosity and FST were correlated between SNPs and microsatellites; however, allelic richness was not. The same was true for the partial microsatellite data set, except that allelic richness, but not observed heterozygosity, was correlated. The number of clusters estimated by STRUCTURE differed for each data set (SNPs = 2; partial microsatellite = 3; full microsatellite = 4). Principal component analyses (PCA) showed four clusters for all data sets. More than 800 SNPs were needed to correlate with allelic richness, observed heterozygosity and expected heterozygosity, but only 100 were needed for FST. The number of SNPs typically obtained from next-generation sequencing (NGS) far exceeds the number needed to correlate with microsatellite parameter estimates. Our study illustrates that diversity, FST and PCA results from microsatellites can mirror those obtained with SNPs. These results may be generally applicable to small populations, a defining feature of endangered and threatened species, because theory predicts that genetic drift will tend to outweigh selection in small populations.
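
For readers unfamiliar with the statistics being compared, the sketch below shows how expected heterozygosity and a simple FST can be computed from allele frequencies. It uses the textbook heterozygosity-based definition, FST = (HT − HS) / HT, for a single biallelic locus with equally weighted populations. This is an assumption for illustration; the study's own estimators and software are not specified in the abstract.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)


def fst_biallelic(pop_freqs):
    """Heterozygosity-based FST for one biallelic SNP.

    pop_freqs holds the frequency of one allele in each population;
    populations are weighted equally (a simplifying assumption).
    """
    if not pop_freqs:
        raise ValueError("need at least one population")
    k = len(pop_freqs)
    # Mean within-population expected heterozygosity (HS).
    hs = sum(2.0 * p * (1.0 - p) for p in pop_freqs) / k
    # Expected heterozygosity of the pooled population (HT).
    p_bar = sum(pop_freqs) / k
    ht = 2.0 * p_bar * (1.0 - p_bar)
    if ht == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies in four populations.
    print(fst_biallelic([0.1, 0.3, 0.6, 0.9]))
```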

Growing evidence supports the idea that species can diverge in the presence of gene flow. However, most methods of phylogeny estimation do not consider this process, despite the fact that ignoring gene flow is known to bias phylogenetic inference. Furthermore, studies that do consider divergence-with-gene-flow typically do so by estimating rates of gene flow using an isolation-with-migration model (IM), rather than evaluating scenarios of gene flow (such as divergence-with-gene-flow or secondary contact) that represent very different types of diversification. In this investigation, we aim to infer the recent phylogenetic history of a clade of western long-eared bats while evaluating a number of different models that parameterize gene flow in a variety of ways. We utilize PHRAPL, a new tool for phylogeographic model selection, to compare the fit of a broad set of demographic models that include divergence, migration, or both among Myotis evotis, M. thysanodes and M. keenii. A genomic data set consisting of 808 loci of ultraconserved elements was used to explore such models in three steps using an incremental design where each successive set was informed by, and thus more focused than, the previous set of models. Specifically, the three steps were to (i) assess whether gene flow should be modeled and identify the best topologies, (ii) infer directionality of migration using the best topologies, and (iii) estimate the timing of gene flow. The best model (AIC model weight ~0.98) included two divergence events ((M. evotis, M. thysanodes), M. keenii) accompanied by gene flow at the initial stages of divergence. These results provide a striking example of speciation-with-gene-flow in an evolutionary lineage.
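
The AIC model weight reported above is a standard quantity: each model's relative likelihood, exp(−ΔAIC/2), normalized across the candidate set. The sketch below shows that calculation in general form. The function name and the example AIC values are hypothetical, and this is not PHRAPL's own code.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC scores into Akaike model weights that sum to 1."""
    if not aic_values:
        raise ValueError("need at least one AIC value")
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    rel_likelihoods = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]


if __name__ == "__main__":
    # Hypothetical AIC scores for three competing demographic models.
    for aic, weight in zip([1000.0, 1008.0, 1015.0], akaike_weights([1000.0, 1008.0, 1015.0])):
        print(f"AIC={aic:.1f}  weight={weight:.3f}")
```

A weight near 1 for one model, as in the abstract, means the remaining candidates together carry very little of the relative support.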

Recent genomic studies of both ancient and modern indigenous people of the Americas have shed light on the demographic processes involved during the first peopling. The Pacific Northwest Coast proves an intriguing focus for these studies because of its association with coastal migration models and genetic ancestral patterns that are difficult to reconcile with modern DNA alone. Here, we report the low-coverage genome sequence of an ancient individual known as “Shuká Káa” (“Man Ahead of Us”) recovered from the On Your Knees Cave (OYKC) in southeastern Alaska (archaeological site 49-PET-408). The human remains date to ∼10,300 calendar (cal) y B.P. We also analyze low-coverage genomes of three more recent individuals from the nearby coast of British Columbia dating from ∼6,075 to 1,750 cal y B.P. From the resulting time series of genetic data, we show that the Pacific Northwest Coast exhibits genetic continuity for at least the past 10,300 cal y B.P. We also infer that population structure existed in the late Pleistocene of North America with Shuká Káa on a different ancestral line compared with other North American individuals from the late Pleistocene or early Holocene (i.e., Anzick-1 and Kennewick Man). Despite regional shifts in mtDNA haplogroups, we conclude from individuals sampled through time that people of the northern Northwest Coast belong to an early genetic lineage that may stem from a late Pleistocene coastal migration into the Americas.

Genetic diversity within and among populations lies at the heart of evolution. Unraveling the extent to which each intrinsic or extrinsic factor determines levels of diversity among genes, populations, and species is challenging, given the difficulty of isolating any single potentially important variable from all others. Allopolyploid species provide an opportunity to disentangle external and intrinsic factors, as the two (or more) homoeologous genomes co-occur in the same nucleus, often exhibiting high collinearity along homoeologous chromosomes. Here we evaluate the pace of molecular evolution and intraspecific, intragenomic diversity in two species of allopolyploid Gossypium, G. hirsutum and G. barbadense, using several hundred genes sequenced from multiple accessions of each species. Genic diversity in both species is low, having been influenced both by the polyploid bottleneck and a domestication bottleneck (for cultivated accessions), but with a directional bias in homoeolog diversity favoring the same genome in both allopolyploids. Total diversity is remarkably similar for the two homoeologous genomes overall, but the two copies of many gene pairs have accumulated statistically different diversity levels, and in a biased fashion with respect to genome. Domesticated accessions show reduced diversity in both genomes, as expected, but with a much greater reduction in one of the two homoeologous genomes. Furthermore, this biased reduction affects opposite homoeologous genomes in the two species. Interspecific introgression has played a role in shaping diversity within each species. Introgression was only detected for certain accessions, and only from G. barbadense into G. hirsutum in one of the two co-resident genomes.
