Target sequence capture has emerged as a powerful method for sequencing hundreds or thousands of genomic regions in a cost- and time-efficient manner. In most cases, however, targeted regions lack full sequence information for some samples, owing to taxonomic, laboratory, or stochastic factors. Loci lacking molecular data for many samples are commonly excluded from downstream analyses, even though they may still contain valuable information. On the other hand, including such data-poor loci may bias phylogenetic analyses. Here we use a target sequence capture dataset of an ecologically and taxonomically diverse group of spiny sunflowers (Asteraceae, or Compositae: Barnadesioideae) to test how including or excluding data-poor loci affects phylogenetic inference. We investigate the sensitivity of concatenation and coalescent approaches to missing data using matrices of varying taxonomic completeness, generated by filtering out loci with different proportions of missing samples prior to analysis. We find that missing data affect both the topology and the branch support of the resulting phylogenies. The matrix containing all loci yielded the highest overall node support, regardless of the amount of missing nucleotides. These results provide empirical support for earlier suggestions, based on single genes and simulated data, that taxa with high amounts of missing data should not be readily dismissed, as they can provide essential information for phylogenomic reconstruction.
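The abstract does not say which tool was used to filter loci by missing samples. The following is only a minimal Python sketch of the general idea, assuming each locus is represented as the set of sample names that have sequence data for it. The function names `missing_fraction` and `filter_loci`, the toy data, and the threshold values are all hypothetical. The sketch filters on missing samples only and ignores missing nucleotides within sequences that are present.

```python
from typing import Dict, List, Set


def missing_fraction(present: Set[str], all_samples: List[str]) -> float:
    """Fraction of samples in the full taxon set that lack data for a locus."""
    missing = [s for s in all_samples if s not in present]
    return len(missing) / len(all_samples)


def filter_loci(loci: Dict[str, Set[str]], all_samples: List[str],
                max_missing: float) -> List[str]:
    """Return (sorted) the loci whose proportion of missing samples is <= max_missing."""
    return sorted(
        name for name, present in loci.items()
        if missing_fraction(present, all_samples) <= max_missing
    )


if __name__ == "__main__":
    # Hypothetical toy data: four samples, three loci.
    samples = ["A", "B", "C", "D"]
    loci = {
        "locus1": {"A", "B", "C", "D"},  # 0% of samples missing
        "locus2": {"A", "B", "C"},       # 25% of samples missing
        "locus3": {"A"},                 # 75% of samples missing
    }
    # Each threshold yields a matrix of different taxonomic completeness.
    for threshold in (0.0, 0.25, 0.5, 1.0):
        print(threshold, filter_loci(loci, samples, threshold))
```

Setting `max_missing` to 1.0 keeps every locus. This corresponds to the all-loci matrix that the abstract reports as giving the highest node support. Stricter thresholds reproduce the practice of discarding data-poor loci.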

Lagomorpha (lagomorphs), the order of mammals comprising pikas, hares, and rabbits, is distributed on every continent except Antarctica. The order is currently thought to comprise 12 genera and 108 species, split into two families: Ochotonidae (pikas) and Leporidae (rabbits and hares). Both molecular and morphological studies have attempted to resolve lagomorph phylogeny, but the timing of divergences has yet to be established. The aim of this research was to unravel lagomorph phylogeny using ultraconserved elements (UCEs). We focused on Romerolagus, whose phylogenetic relationships are largely unknown and whose fossil record is sparse, to estimate divergence times for the genus. We obtained samples of at least one species from each genus of the order except Caprolagus (11 of the 12 genera), and captured and sequenced UCEs. A maximum-likelihood phylogenetic analysis was carried out on the 4,195 captured loci, which contained 59,112 informative sites. We further used BEAST2 v2.6.3 on the CIPRES computing cluster to estimate the timing of cladogenesis in lagomorph evolution. Our results confirm that lagomorphs and rodents split about 65 million years ago, and that lagomorphs in turn split into their two constituent families, Leporidae and Ochotonidae, about 60 million years ago. Pronolagus rupestris and Nesolagus timminsi were recovered as sister taxa at the base of Leporidae; the most recent common ancestor of this clade and the remaining leporids was estimated to have lived about 47 million years ago. Romerolagus diazi is sister to the remaining Leporidae excluding Pronolagus and Nesolagus, a topology that generally matches previously published phylogenies. However, our results place the most recent common ancestor of Romerolagus and the remaining ingroup leporids at ca. 4.8 Ma (95% highest posterior density [HPD] interval: 5.9–3.8 Ma), with an internal diversification in the Middle to Late Pleistocene (0.9 Ma; 95% HPD: 1.8–0.2 Ma).
Our final results yielded a robust phylogeny with high support for every clade of the order Lagomorpha and resolved previously uncertain phylogenetic relationships. We further conclude that UCEs, the method used here, may help complete the phylogeny of all mammals by drawing on existing museum specimens.

The human pathogen Haemophilus influenzae was the main cause of bacterial meningitis in children and a major cause of infant mortality worldwide before a vaccine was introduced in the 1980s. Although the incidence of serotype b (Hib), the most virulent type of H. influenzae, has since decreased, reports of infections with other serotypes and with non-typeable strains are on the rise. While non-typeable strains have been studied in depth, very little is known about the pathogen’s evolutionary history, and no genomes predating 1940 were available.

Emerging variants of concern (VOCs) are driving the COVID-19 pandemic [1,2]. Experimental assessments of the replication and transmission of major VOCs and their progenitors are needed to understand the mechanisms underlying VOC replication and transmission [3]. Here we show that the spike protein (S) of the Alpha (also known as B.1.1.7) and Beta (B.1.351) VOCs had a greater affinity for the human angiotensin-converting enzyme 2 (ACE2) receptor than that of the progenitor variant S(D614G) in vitro. The progenitor variant virus expressing S(D614G) (wt-S614G) and the Alpha variant showed similar replication kinetics in human nasal airway epithelial cultures, whereas the Beta variant was outcompeted by both. In vivo, competition experiments showed a clear fitness advantage of Alpha over wt-S614G in ferrets and in two mouse models, and the substitutions in S were major drivers of this advantage. In hamsters, which support high levels of viral replication, Alpha and wt-S614G showed similar fitness. By contrast, Beta was outcompeted by Alpha and wt-S614G in hamsters and in mice expressing human ACE2. Our study highlights the importance of using multiple models to characterize the fitness of VOCs. It also demonstrates that Alpha is adapted for replication in the upper respiratory tract and shows enhanced transmission in vivo in restrictive models, whereas Beta does not outcompete Alpha or wt-S614G in naive animals.