Gene-editing technologies, including the widely used CRISPR endonucleases, have potential for the clinical treatment of various human diseases. Given the rapid mutation of SARS-CoV-2, specific and effective CRISPR-based prevention and treatment of coronavirus disease 2019 (COVID-19) are urgently needed to control the spread of the current pandemic. Here, we designed Type III CRISPR endonuclease antivirals for coronaviruses (TEAR-CoV) as a therapeutic to combat SARS-CoV-2 infection. We provide a proof-of-principle demonstration that a TEAR-CoV-based RNA engineering approach leads to RNA-guided transcript degradation both in vitro and in eukaryotic cells, and could be used to broadly target RNA viruses. We report that TEAR-CoV not only cleaves the SARS-CoV-2 genome and mRNA transcripts, but also degrades live influenza A virus (IAV), impeding viral replication in cells and in mice. Moreover, bioinformatics screening of gRNAs along RNA sequences reveals that a group of five gRNAs (hCoV-gRNAs) could potentially target 99.98% of human coronaviruses. TEAR-CoV also exerted specific targeting and cleavage of common human coronaviruses. The fast design and broad targeting of TEAR-CoV may represent a versatile antiviral approach for SARS-CoV-2 and potentially other emerging human coronaviruses.
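
The idea of screening candidate gRNAs along viral RNA sequences for breadth of coverage can be illustrated with a small sketch. This is not the authors' screening procedure, which the abstract does not describe. It assumes exact-match targeting, a hypothetical spacer length `k`, and a greedy selection rule; the function names `kmer_index` and `greedy_guides` and the toy sequences are invented for illustration.

```python
from collections import defaultdict


def kmer_index(genomes, k):
    """Map each k-mer to the set of genome names that contain it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def greedy_guides(genomes, k, n_guides):
    """Greedily pick up to n_guides k-mers that together hit the most genomes.

    Assumption: a guide "targets" a genome only if its k-mer occurs exactly.
    Returns the chosen k-mers and the fraction of genomes covered.
    """
    index = kmer_index(genomes, k)
    covered, chosen = set(), []
    for _ in range(n_guides):
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(index),
                   key=lambda kmer: len(index[kmer] - covered),
                   default=None)
        if best is None or not (index[best] - covered):
            break  # no remaining k-mer adds coverage
        chosen.append(best)
        covered |= index[best]
    return chosen, len(covered) / len(genomes)


if __name__ == "__main__":
    # Toy RNA sequences, not real viral genomes
    toy = {"g1": "AUGGCUAGC", "g2": "CCAUGGCUA", "g3": "UUUUGGGGA"}
    guides, coverage = greedy_guides(toy, k=5, n_guides=2)
    print(guides, coverage)
```

A real screen would also need to tolerate mismatches and exclude off-target hits in the host transcriptome, neither of which this sketch attempts.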

Phormium tenax (harakeke, New Zealand flax: Asphodelaceae) has long been considered indigenous to New Zealand (including the Chatham Islands) and Norfolk Island. However, the indigeneity of P. tenax on Norfolk Island, in particular, has been challenged by an alternative hypothesis that it was introduced by East Polynesians prior to European colonisation. We tested this alternative hypothesis using a dated phylogenetic tree. We also tested whether dated phylogenetic trees of Phormium could be reconciled with vicariance explanations of its distribution. We examined near-complete plastid genome sequences of P. tenax, P. cookianum and related plants. We then undertook Bayesian and likelihood estimation of the age of the divergence between Norfolk Island and New Zealand accessions using fossil calibration, and separately using biogeographic calibration assuming vicariance explanations of trans-oceanic distributions. DNA sequences of Norfolk Island plants were invariant and nested well within the wider diversity of P. tenax. Estimates of divergence times using fossil calibration did not exclude a common ancestor as recent as the second millennium CE. DNA sequences of Chatham Islands P. tenax were also nested within the diversity of P. tenax from New Zealand, but age estimates were older for their divergence from New Zealand plants (around 20,000–400,000 years). Biogeographic calibrations resulted in extremely ancient ages (tens of billions of years) of deeper nodes within the tree or several orders of magnitude variation in rates among lineages. Our results are consistent with translocation of harakeke by East Polynesian people, but our analyses cannot exclude a Late Quaternary natural dispersal event, which might result in similar genetic patterns. Biogeographic calibrations based on the break-up of Gondwana imply major departures from contemporary ideas of Earth’s history, or orders of magnitude rate variation among lineages.

Abstract The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this article are to (i) document our methods, (ii) describe our first data release, and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic data set for angiosperms to date, comprising 3099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2333 genera (17%). A “first pass” angiosperm tree of life was inferred from the data, which totaled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated data set, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone toward a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardized nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world’s natural history collections. [Angiosperms; Angiosperms353; genomics; herbariomics; museomics; nuclear phylogenomics; open access; target sequence capture; tree of life.]

Abstract Target enrichment (such as Hyb-Seq) is a well-established high-throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for the analysis of target enrichment data do not have a rigorous procedure to remove paralogs. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs from enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria in both sequence similarity and heterozygous sites at each locus in the identification of paralogs. Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperms353 kit for two woody genera, Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of data sets showed that PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed evident differences between data from HybPiper and data from our new PPD pipeline. We further evaluated the accuracy and error rates of PPD by BLAST mapping of putative paralogous and orthologous sequences to a reference genome sequence of Castanea mollissima. Compared to HybPiper alone, PPD identified substantially more paralogous gene sequences that mapped to multiple regions of the reference genome (31 genes for PPD compared with 4 genes for HybPiper alone).
In conjunction with HybPiper, paralogous genes identified by both pipelines can be removed, resulting in the construction of more robust orthologous gene data sets for phylogenomic and divergence time analyses. Our study demonstrates the value of Hyb-Seq with data derived from the Angiosperms353 probe set for elucidating species relationships within a genus, and argues for the importance of additional steps to filter paralogous genes and poorly aligned regions (e.g., as occur through assembly errors), such as our new PPD pipeline described in this study. [Angiosperms353; Castanea; divergence time; Hamamelis; Hyb-Seq; paralogs; phylogenomics.]
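
The two user-adjustable criteria named in this abstract, sequence identity and heterozygous sites, can be sketched as a simple per-locus check. This is not PPD's actual code. It assumes that heterozygous sites appear as two-base IUPAC ambiguity codes in a consensus sequence, that the consensus is already aligned to a reference, and that a locus is flagged when it has too many heterozygous sites or too little identity. The function names and default thresholds are hypothetical.

```python
HET_CODES = set("RYSWKM")  # IUPAC two-base ambiguity codes


def het_fraction(seq):
    """Fraction of called sites (not gap, not N) that are ambiguity codes."""
    sites = [b for b in seq.upper() if b not in "-N"]
    if not sites:
        return 0.0
    return sum(b in HET_CODES for b in sites) / len(sites)


def identity(a, b):
    """Fraction of matching sites between two pre-aligned sequences.

    Columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def flag_putative_paralog(consensus, reference,
                          max_het=0.05, min_identity=0.90):
    """Flag a locus if either (hypothetical) threshold is violated."""
    return (het_fraction(consensus) > max_het
            or identity(consensus, reference) < min_identity)
```

Keeping both thresholds as parameters mirrors the abstract's point that appropriate cut-offs depend on how divergent the study group is.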

Early stages of speciation in plants might involve genetic incompatibilities between plastid and nuclear genomes, leading to inter-lineage hybrid breakdown due to the disruption of co-adaptation between plastid and nuclear genes encoding subunits of the same plastid protein complexes. We tested this hypothesis in Silene nutans, a gynodioecious Caryophyllaceae, in which four distinct genetic lineages exhibited strong reproductive isolation from one another, resulting in chlorotic or variegated hybrids. By sequencing the whole gene content of the four plastomes through gene capture, and a large part of the nuclear genes encoding plastid subunits from RNAseq data, we searched for non-synonymous substitutions fixed in each lineage on both genomes. Lineages of S. nutans exhibited high dN/dS ratios for plastid and nuclear genes encoding most plastid complexes, with a strong pattern of coevolution for genes encoding the subunits of the ribosome and cytochrome b6/f that could explain the chlorosis of hybrids. Overall, relaxation of selection due to past bottlenecks and positive selection have driven the diversity pattern observed in S. nutans plastid complexes, leading to plastid-nuclear incompatibilities. We discuss the possible role of gynodioecy in the evolutionary dynamics of the plastomes through linked selection.

Target sequence capture has emerged as a powerful method to sequence hundreds or thousands of genomic regions in a cost- and time-efficient approach. In most cases, however, targeted regions lack full sequence information for certain samples, due to taxonomic, laboratory, or stochastic factors. Loci lacking molecular data for a large number of samples are commonly excluded from downstream analyses, even though they may still contain valuable information. On the other hand, including data-poor loci may bias phylogenetic analyses. Here we use a target sequence capture dataset of an ecologically and taxonomically diverse group of spiny sunflowers (Asteraceae, or Compositae: Barnadesioideae) to test how the inclusion or exclusion of such data-poor loci affects phylogenetic inference. We investigate the sensitivity of concatenation and coalescent approaches to missing data with matrices of varying taxonomic completeness by filtering loci with different proportions of missing samples prior to data analysis. We find that missing data affect both the topology and branch support of the resulting phylogenies. The matrix containing all loci yielded the overall highest node support values, independently of the amount of missing nucleotides. These results provide empirical support to earlier suggestions based on single genes and data simulations that taxa with high amounts of missing data should not be readily dismissed as they can provide essential information for phylogenomic reconstruction.
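
Filtering loci by their proportion of missing samples, the step used above to build matrices of varying completeness, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline. It assumes each locus is represented by the set of samples for which it was recovered; the function name `filter_loci`, the locus and sample labels, and the cut-off values are hypothetical.

```python
def filter_loci(presence, n_samples, max_missing):
    """Keep loci whose fraction of missing samples is at most max_missing.

    presence    : dict mapping locus name -> set of samples with data
    n_samples   : total number of samples in the study
    max_missing : allowed fraction of missing samples (0.0 to 1.0)
    """
    kept = {}
    for locus, samples in presence.items():
        missing = (n_samples - len(samples)) / n_samples
        if missing <= max_missing:
            kept[locus] = samples
    return kept


if __name__ == "__main__":
    # Toy data: three loci scored across four samples
    presence = {
        "locus1": {"s1", "s2", "s3", "s4"},
        "locus2": {"s1", "s2"},
        "locus3": {"s1"},
    }
    for cutoff in (0.0, 0.5, 1.0):
        print(cutoff, sorted(filter_loci(presence, 4, cutoff)))
```

Running the same tree inference on each filtered matrix would then show how topology and support change with the cut-off.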

Lagomorpha (lagomorphs), the order of mammals including pikas, hares, and rabbits, is distributed on all continents. The order currently is hypothesized to comprise 12 genera and 108 species, split into two families: Ochotonidae (pikas) and Leporidae (rabbits and hares). Molecular and morphological attempts have been undertaken to resolve the phylogeny of lagomorphs, although chronological relationships are still to be established. The aim of this research was to unravel lagomorph phylogeny using ultraconserved elements. We focused on Romerolagus, in light of its largely unknown phylogenetic relationships and sparse fossil record, to assess times of divergence for the genus. We obtained samples from at least one species in each of 11 genera (except Caprolagus) comprising the order and captured and sequenced ultraconserved elements (UCEs). A Maximum-Likelihood phylogenetic analysis was carried out on the 4,195 loci captured, resulting in 59,112 informative sites. We further used BEAST2 v2.6.3 on the CIPRES computing cluster to estimate the timing of cladogenesis in lagomorph evolution. Our results confirm that lagomorphs and rodents split about 65 million years ago. The former further split into its constituent families, Leporidae and Ochotonidae, about 60 million years ago. Pronolagus rupestris and Nesolagus timminsi were retrieved as basal sister taxa; the most recent common ancestor of that clade and remaining leporids was estimated to have existed about 47 million years ago. Romerolagus diazi is sister to remaining Leporidae excluding Pronolagus and Nesolagus, a topology that generally matches previously published phylogenies, although our results suggest a most recent common ancestor of Romerolagus and remaining ingroup leporids at ca. 4.8 Ma (95% highest posterior density [HPD] interval: 5.9 – 3.8 Ma), with an internal diversification in the Middle to Late Pleistocene (0.9 Ma; 95% HPD 1.8 – 0.2 Ma). 
Our final results yielded a robust phylogeny with high support values for every clade of the order Lagomorpha and unraveled previously unresolved phylogenetic relationships. We further conclude that the method we used, UCE capture, may help to complete the phylogeny of mammals by making use of existing museum specimens.

The human pathogen Haemophilus influenzae was the main cause of bacterial meningitis in children and a major cause of worldwide infant mortality before the introduction of a vaccine in the 1980s. Although the occurrence of serotype b (Hib), the most virulent type of H. influenzae, has since decreased, reports of infections with other serotypes and non-typeable strains are on the rise. While non-typeable strains have been studied in-depth, very little is known of the pathogen’s evolutionary history, and no genomes dating prior to 1940 were available.