Publications Archives

Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the “portullugo” (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75–218 loci across 74 taxa, with ~50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.

Today, next generation sequencing (NGS) is extensively used in the research setting. However, high costs of NGS testing still prevent its routine use in clinical practice. One of the factors affecting the cost of sequencing is the number of reads per site, i.e. the number of times each nucleotide gets sequenced. On the one hand, lower coverage makes the whole process much faster. On the other hand, it results in poor data quality. No unanimous opinion has been reached yet as to what minimum depth of coverage can produce reliable results. The aim of this study was to determine the minimum number of reads sufficient for accurate base calling of heterozygous and single nucleotide variants (SNV). Using bioinformatics methods, we demonstrate that accuracy can be achieved at a minimum depth of 12X.
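To illustrate why depth of coverage matters for heterozygous sites, the sketch below uses a simple binomial model. It is not the method used in the study and does not reproduce its 12X result. It assumes each read at a heterozygous site carries the alternate allele with probability 0.5, ignores sequencing error, and treats a site as detectable when at least `min_alt_reads` reads show the alternate allele. The function name, the threshold of 2 reads, and the listed depths are hypothetical choices made for illustration only.

```python
from math import comb


def prob_het_detected(depth: int, min_alt_reads: int = 2,
                      alt_fraction: float = 0.5) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads carry the
    alternate allele, assuming reads are independent Bernoulli draws with
    success probability `alt_fraction` (an idealised model, no sequencing error).
    """
    # Probability of seeing fewer than min_alt_reads alternate-allele reads.
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# Illustrative depths only; the threshold of 2 reads is an assumption.
for depth in (4, 8, 12, 20):
    print(f"{depth}X: {prob_het_detected(depth):.4f}")
```

Under these assumptions the detection probability rises quickly with depth. Real variant callers also account for base quality, mapping quality and error rates, so the numbers here are only a rough guide.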

Smallpox holds a unique position in the history of medicine. It was the first disease for which a vaccine was developed and remains the only human disease eradicated by vaccination. Although there have been claims of smallpox in Egypt, India, and China dating back millennia [1–4], the timescale of emergence of the causative agent, variola virus (VARV), and how it evolved in the context of increasingly widespread immunization, have proven controversial [4–9]. In particular, some molecular-clock-based studies have suggested that key events in VARV evolution only occurred during the last two centuries [4–6] and hence in apparent conflict with anecdotal historical reports, although it is difficult to distinguish smallpox from other pustular rashes by description alone. To address these issues, we captured, sequenced, and reconstructed a draft genome of an ancient strain of VARV, sampled from a Lithuanian child mummy dating between 1643 and 1665 and close to the time of several documented European epidemics [1, 2, 10]. When compared to vaccinia virus, this archival strain contained the same pattern of gene degradation as 20th century VARVs, indicating that such loss of gene function had occurred before ca. 1650. Strikingly, the mummy sequence fell basal to all currently sequenced strains of VARV on phylogenetic trees. Molecular-clock analyses revealed a strong clock-like structure and that the timescale of smallpox evolution is more recent than often supposed, with the diversification of major viral lineages only occurring within the 18th and 19th centuries, concomitant with the development of modern vaccination.

The three surviving ‘brush-tailed’ bettong species, Bettongia gaimardi (Tasmania), B. tropica (Queensland) and B. penicillata (Western Australia), are all classified as threatened or endangered. These macropodids are prolific diggers and are recognised as important ‘ecosystem engineers’ that improve soil quality and increase seed germination success. However, a combination of introduced predators, habitat loss and disease has seen populations become increasingly fragmented and census numbers decline. Robust phylogenies are vital to conservation management, but the extent of extirpation and fragmentation in brush-tailed bettongs is such that a phylogeny based upon modern samples alone may provide a misleading picture of former connectivity, genetic diversity and species boundaries. Using ancient DNA isolated from fossil bones and museum skins, we genotyped two mitochondrial DNA (mtDNA) genes: cytochrome b (266 bp) and control region (356 bp). These ancient DNA data were combined with a pre-existing modern DNA data set on the historically broadly distributed brush-tailed bettongs (~300 samples total), to investigate their phylogenetic relationships. Molecular dating estimates the most recent common ancestor of these bettongs occurred c. 2.5 Ma (million years ago), which suggests that increasing aridity likely shaped their modern-day distribution. Analyses of the concatenated mtDNA sequences of all brush-tailed bettongs generated five distinct and well-supported clades including: a highly divergent Nullarbor form (Clade I), B. tropica (Clade II), B. penicillata (Clades III and V), and B. gaimardi (Clade IV). The generated phylogeny does not reflect current taxonomy, and it remains an open question whether the brush-tailed bettongs consisted of several species or a single widespread species.
The use of nuclear DNA markers (single nucleotide polymorphisms and/or short tandem repeats) will be needed to better inform decisions about historical connectivity and the appropriateness of ongoing conservation measures such as translocations and captive breeding.

Crassulacean acid metabolism (CAM) is a modified form of photosynthesis that has arisen independently at least 35 times in flowering plants. The occurrence of CAM is often correlated with shifts to arid, semiarid, or epiphytic habits, as well as transitions in leaf morphology (e.g. increased leaf thickness) and anatomy (e.g. increased cell size and packing). We assess shifts between C3 and CAM photosynthesis in the subfamily Agavoideae (Asparagaceae) through phylogenetic analysis of targeted loci captured from the nuclear and chloroplast genomes of over 60 species. Carbon isotope data were used as a proxy for mode of photosynthesis in extant species, and ancestral states were estimated on the phylogeny. Ancestral character state mapping suggests three independent origins of CAM in the Agavoideae. CAM species differ from C3 species in climate space and are found to have thicker leaves with densely packed cells. C3 ancestors of CAM species show a predisposition toward CAM-like morphology. Leaf characteristics in the ancestral C3 species may have enabled the repeated evolution of CAM in the Agavoideae subfamily. Anatomical changes, including a tendency toward 3D venation, may have initially arisen in C3 ancestors in response to aridity as a way to increase leaf succulence for water storage.

Chloridoideae (chloridoid grasses) are a subfamily of ca. 1700 species with high diversity in arid habitats. Until now, their evolutionary relationships have primarily been studied with DNA sequences from the chloroplast, a maternally inherited organelle. Next-generation sequencing is able to efficiently recover large numbers of nuclear loci that can then be used to estimate the species phylogeny based upon bi-parentally inherited data. We sought to test our chloroplast-based hypotheses of relationships among chloridoid species with 122 nuclear loci generated through targeted-enrichment next-generation sequencing, sometimes referred to as hyb-seq. We targeted putative single-copy housekeeping genes, as well as genes that have been implicated in traits characteristic of, or particularly labile in, chloridoids: e.g., drought and salt tolerance. We recovered ca. 70% of the targeted loci (122 of 177 loci) in all 47 species sequenced using hyb-seq. We then analyzed the nuclear loci with Bayesian and coalescent methods and the resulting phylogeny resolves relationships between the four chloridoid tribes. Several novel findings with this data were: the sister lineage to Chloridoideae is unresolved; Centropodia + Ellisochloa are excluded from Chloridoideae in phylogenetic estimates using a coalescent model; Sporobolus subtilis is more closely related to Eragrostis than to other species of Sporobolus; and Tragus is more closely related to Chloris and relatives than to a lineage of mainly New World species. Relationships in Cynodonteae in the nuclear phylogeny are quite different from chloroplast estimates, but were not robust to changes in the method of phylogenetic analysis. We tested the data signal with several partition schemes, a concatenation analysis, and tests of alternative hypotheses to assess our confidence in this new, nuclear estimate of evolutionary relationships. 
Our work provides markers and a framework for additional phylogenetic studies that sample more densely within chloridoid tribes. These results represent progress towards a robust classification of this important subfamily of grasses, as well as proof-of-concept for hyb-seq next-generation sequencing as a method to generate sequences for phylogenetic analyses in grasses and other plant families.

Host–pathogen interactions may result in either directional selection or in pressure for the maintenance of polymorphism at the molecular level. Hence signatures of both positive and balancing selection are expected in immune genes. Because both overall selective pressure and specific targets may differ between species, large-scale population genomic studies are useful in detecting functionally important immune genes and comparing selective landscapes between taxa. Such studies are of particular interest in amphibians, a group threatened worldwide by emerging infectious diseases. Here, we present an analysis of polymorphism and divergence of 634 immune genes in two lineages of Lissotriton newts: L. montandoni and L. vulgaris graecus. Variation in newt immune genes has been shaped predominantly by widespread purifying selection and strong evolutionary constraint, implying long-term importance of these genes for functioning of the immune system. The two evolutionary lineages differ in the overall strength of purifying selection which can partially be explained by demographic history but may also signal differences in long-term pathogen pressure. The prevalent constraint notwithstanding, 23 putative targets of positive selection and 11 putative targets of balancing selection were identified. The latter were detected by composite tests involving the demographic model and further validated in independent population samples. Putative targets of balancing selection encode proteins which may interact closely with pathogens but include also regulators of immune response. The identified candidates will be useful for testing whether genes affected by balancing selection are more prone to interspecific introgression than other genes in the genome.

Herbaria are unparalleled collections of biodiversity information representing the world’s flora. However, this treasure has remained largely inaccessible to genetic studies, frequently limited by the low yields of poor-quality DNA. Next-generation sequencing (NGS) has transformed every field of biological research. The different strategies for accessing genetic data using NGS are changing the direction of biodiversity research—we are no longer constrained by a relatively small number of markers for non-model organisms, by time and cost limited sample sizes, or by incomplete datasets due to recalcitrant DNA extractions or PCR amplification failure. Here we show that targeted enrichment through hybrid capture can be used to generate hundreds of kilobases of nuclear sequence data of the Neotropical genus Inga, from herbarium specimens as old as 180 years and using as little as 16 ng of degraded DNA.
