Combining the flexible hybridization power of in-solution target capture with an expertly selected set of orthologous locus sequences, this new probe set has been demonstrated to enrich hundreds of putatively single-copy protein-coding genes across a broad range of angiosperms (flowering plants). Probes were designed from 353 loci, each represented by 5-15 sequences from across the angiosperms. These representatives were selected using a novel “k-medoids clustering approach” to maximize the taxonomic breadth of the design (Johnson et al 2018, bioRxiv).
This baitset is broadly applicable for phylogenetic research across all flowering plants, and Arbor Biosciences offers it as an in-stock catalog kit, ready for immediate shipment at a low per-reaction cost. As with all myBaits kits, the Angiosperms353 panel is provided as a complete in-solution target capture kit, including buffers, blockers, and baits, along with an easy-to-use protocol. Or, if you would prefer to outsource the work, the expert scientists of our myReads NGS service are available to perform library preparation, target capture, and sequencing for your entire project.
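For readers unfamiliar with k-medoids, the sketch below illustrates the general idea of picking a few representative sequences that sit close to everything else in a set. It is not the procedure used by Johnson et al: the taxon names, toy sequences, Hamming distance, and exhaustive search are all assumptions made for illustration, and the helper names `hamming`, `total_cost`, and `k_medoids_exhaustive` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def total_cost(seqs: dict, medoids: tuple) -> int:
    """Sum over all sequences of the distance to the nearest medoid."""
    return sum(
        min(hamming(seq, seqs[m]) for m in medoids)
        for seq in seqs.values()
    )


def k_medoids_exhaustive(seqs: dict, k: int) -> tuple:
    """Try every k-subset of names and keep the one with the lowest cost.

    Exhaustive search is only feasible for tiny inputs; practical
    implementations use heuristics instead.
    """
    return min(combinations(sorted(seqs), k), key=lambda m: total_cost(seqs, m))


# Hypothetical aligned toy sequences, invented for this example only.
toy_seqs = {
    "taxonA": "ATGGCCAAGT",
    "taxonB": "ATGGCCAAGA",
    "taxonC": "ATGGCTAAGA",
    "taxonD": "TTGACGTTCA",
    "taxonE": "TTGACGTTCT",
}

medoids = k_medoids_exhaustive(toy_seqs, k=2)
print("Representative sequences:", medoids)
```

With two medoids, the search keeps one representative from each of the two visibly distinct groups of toy sequences, which is the sense in which medoids "cover" the diversity of the input.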
M Johnson, L Pokorny, S Dodsworth, LR Botigue, RS Cowan, A Devault, WL Eiserhardt, N Epitawalage, F Forest, JT Kim, JH Leebens-Mack, IJ Leitch, O Maurin, DE Soltis, PS Soltis, GK Wong, WJ Baker, N Wickett. (2018). “A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-medoids Clustering”. bioRxiv 361618; doi: https://doi.org/10.1101/361618
Fig 1. “Heatmap of Gene Recovery Efficiency. Each row is one sample, and each column is one gene. Colors indicate the percentage of the target length (calculated by the mean length of all k-medoid transcripts for each gene) recovered.” (pg 19, Johnson et al 2018, bioRxiv). Image is Figure 3 from Johnson et al 2018, bioRxiv.
Fig 2. “Total Length of Sequence Recovery for Both Coding and Non-coding Regions Across 353 Loci for 42 Angiosperm Species. Reads were mapped back to either coding sequence (yellow) or coding sequence plus flanking non-coding (i.e. intron) sequence (purple)… The total length of coding sequence targeted was 260,802 bp. The median recovery of coding sequence was 137,046 bp and the median amount of non-coding sequence recovered was 216,816 bp (with at least 8x depth of coverage).” (pg 21, Johnson et al 2018, bioRxiv). Image is a modified version of Figure 4 from Johnson et al 2018, bioRxiv.
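To put the figures quoted in the Fig 2 caption in proportion, the short calculation below derives two ratios from them. The base-pair values come from the caption. The derived percentages are simple arithmetic on those values, not results reported by the authors, and the variable names are hypothetical.

```python
# Values quoted in the Fig 2 caption (Johnson et al 2018, bioRxiv).
TARGET_CODING_BP = 260_802               # total coding sequence targeted
MEDIAN_CODING_RECOVERED_BP = 137_046     # median coding sequence recovered
MEDIAN_NONCODING_RECOVERED_BP = 216_816  # median non-coding sequence recovered

# Share of the targeted coding sequence recovered at the median.
coding_fraction = MEDIAN_CODING_RECOVERED_BP / TARGET_CODING_BP

# Non-coding sequence recovered per base pair of coding sequence recovered.
noncoding_to_coding = MEDIAN_NONCODING_RECOVERED_BP / MEDIAN_CODING_RECOVERED_BP

print(f"Median coding recovery: {coding_fraction:.1%} of target")        # 52.5%
print(f"Non-coding to coding recovery ratio: {noncoding_to_coding:.2f}")  # 1.58
```

In other words, the median sample recovers roughly half of the targeted coding sequence, plus about 1.6 times that amount in flanking non-coding sequence.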