|2020-06-30||A Cancer Biologist's Primer on Machine Learning Applications in High-Dimensional Cytometry||Keyes TJ, Domizi P, Lo YC, Nolan GP, Davis KL||TMC-Stanford||The application of machine learning and artificial intelligence to high-dimensional cytometry data sets has increasingly become a staple of bioinformatic data analysis over the past decade. This is especially true in the field of cancer biology, where protocols for collecting multiparameter single-cell data in a high-throughput fashion are rapidly developed. As the use of machine learning methodology in cytometry becomes increasingly common, there is a need for cancer biologists to understand the basic theory and applications of a variety of algorithmic tools for analyzing and interpreting cytometry data. We introduce the reader to several keystone machine learning-based analytic approaches with an emphasis on defining key terms and introducing a conceptual framework for making translational or clinically relevant discoveries. The target audience consists of cancer cell biologists and physician-scientists interested in applying these tools to their own data, but who may have limited training in bioinformatics. © 2020 International Society for Advancement of Cytometry.|
|2020-06-16||Single-cell Lineage Tracing by Integrating CRISPR-Cas9 Mutations With Transcriptomic Data||Zafar H, Lin C, Bar-Joseph Z.||HIVE TC-CMU||Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing for elucidating developmental lineages at the whole organism level. While these studies provided several insights, they face several computational challenges. First, lineages are reconstructed based on noisy and often saturated random mutation data. Additionally, due to the randomness of the mutations, lineages from multiple experiments cannot be combined to reconstruct a species-invariant lineage tree. To address these issues we developed a statistical method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities arising when lineages are inferred based on mutations alone, while also enabling the integration of different individual lineages for the reconstruction of an invariant lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets and provide new insights on progenitors and differentiation pathways.|
|2020-05-14||Use of Single Cell -omic Technologies to Study the Gastrointestinal Tract and Diseases, From Single Cell Identities to Patient Features.||Islam M, Chen B, Spraggins JM, Kelly RT, Lau KS.||TMC-Vanderbilt||Single cells are the building blocks of tissue systems that determine organ phenotypes,
behaviors, and function. Understanding the differences between cell types and their
activities might provide us with insights into normal tissue functions, development of
disease, and new therapeutic strategies. Although -omic level single cell technologies
are a relatively recent development that has been used only in laboratory studies, these
approaches might eventually be used in the clinic. We review the prospects of applying
single cell genome, transcriptome, epigenome, proteome, and metabolome analyses to
gastroenterology and hepatology research. Combining data from multi-omic platforms
and rapid technological developments could lead to new diagnostic, prognostic, and
|2020-04-25||GiniClust3: a fast and memory-efficient tool for rare cell type identification.||Dong R, Yuan GC.||TTD-Cal Tech||BACKGROUND:
With the rapid development of single-cell RNA sequencing technology, it is possible to dissect cell-type composition at high resolution. A number of methods have been developed to identify rare cell types. However, existing methods are still not scalable to large datasets, limiting their utility. To overcome this limitation, we present a new software package, called GiniClust3, which is an extension of GiniClust2 and is significantly faster and more memory-efficient than previous versions.
Using GiniClust3, it only takes about 7 h to identify both common and rare cell clusters from a dataset that contains more than one million cells. Cell type mapping and perturbation analyses show that GiniClust3 could robustly identify cell clusters.
Taken together, these results suggest that GiniClust3 is a powerful tool to identify both common and rare cell populations and can handle large datasets. GiniClust3 is implemented as an open-source Python package and is available at https://github.com/rdong08/GiniClust3.|
|2020-04-02||Reconstructed Single-Cell Fate Trajectories Define Lineage Plasticity Windows during Differentiation of Human PSC-Derived Distal Lung Progenitors.||Hurley K, Ding J, Villacorta-Martin C, Herriges MJ, Jacob A, Vedaie M, Alysandratos KD, Sun YL, Lin C, Werder RB, Huang J, Wilson AA, Mithal A, Mostoslavsky G, Oglesby I, Caballero IS, Guttentag SH, Ahangari F, Kaminski N, Rodriguez-Fraticelli A, Camargo F, Bar-Joseph Z, Kotton DN.||HIVE TC-CMU||Alveolar epithelial type 2 cells (AEC2s) are the facultative progenitors responsible for maintaining lung alveoli throughout life but are difficult to isolate from patients. Here, we engineer AEC2s from human pluripotent stem cells (PSCs) in vitro and use time-series single-cell RNA sequencing with lentiviral barcoding to profile the kinetics of their differentiation in comparison to primary fetal and adult AEC2 benchmarks. We observe bifurcating cell-fate trajectories as primordial lung progenitors differentiate in vitro, with some progeny reaching their AEC2 fate target, while others diverge to alternative non-lung endodermal fates. We develop a Continuous State Hidden Markov model to identify the timing and type of signals, such as overexuberant Wnt responses, that induce some early multipotent NKX2-1+ progenitors to lose lung fate. Finally, we find that this initial developmental plasticity is regulatable and subsides over time, ultimately resulting in PSC-derived AEC2s that exhibit a stable phenotype and nearly limitless self-renewal capacity.|
|2020-04||Integrated molecular imaging technologies for investigation of metals in biological systems: A brief review||Perry WJ, Weiss A, Van de Plas R, Spraggins JM, Caprioli RM, Skaar EP.||TMC-Vanderbilt||Metals play an essential role in biological systems and are required as structural or catalytic co-factors in many proteins. Disruption of the homeostatic control and/or spatial distributions of metals can lead to disease. Imaging technologies have been developed to visualize elemental distributions across a biological sample. Measurement of elemental distributions by imaging mass spectrometry and imaging X-ray fluorescence are increasingly employed with technologies that can assess histological features and molecular compositions. Data from several modalities can be interrogated as multimodal images to correlate morphological, elemental, and molecular properties. Elemental and molecular distributions have also been axially resolved to achieve three-dimensional volumes, dramatically increasing the biological information. In this review, we provide an overview of recent developments in the field of metal imaging with an emphasis on multimodal studies in two and three dimensions. We specifically highlight studies that present technological advancements and biological applications of how metal homeostasis affects human health.|
|2020-03-13||Considerations for Using the Vasculature as a Coordinate System to Map All the Cells in the Human Body||Weber GM, Ju Y, Börner K.||HIVE MC-IU||Several ongoing international efforts are developing methods of localizing single cells
within organs or mapping the entire human body at the single cell level, including
the Chan Zuckerberg Initiative’s Human Cell Atlas (HCA), the Knut and Alice
Wallenberg Foundation’s Human Protein Atlas (HPA), and the National Institutes of
Health’s Human BioMolecular Atlas Program (HuBMAP). Their goals are to understand
cell specialization, interactions, spatial organization in their natural context, and ultimately
the function of every cell within the body. In the same way that the Human Genome
Project had to assemble sequence data from different people to construct a complete
sequence, multiple centers around the world are collecting tissue specimens from diverse
populations that vary in age, race, sex, and body size. A challenge will be combining
these heterogeneous tissue samples into a 3D reference map that will enable multiscale,
multidimensional Google Maps-like exploration of the human body. Key to making
alignment of tissue samples work is identifying and using a coordinate system called
a Common Coordinate Framework (CCF), which defines the positions, or “addresses,”
in a reference body, from whole organs down to functional tissue units and individual
cells. In this perspective, we examine the concept of a CCF based on the vasculature and describe why it would be an attractive choice for mapping the human body.|
|2020-03-11||Multiplexed single-cell morphometry for hematopathology diagnostics||Tsai AG, Glass DR, Juntilla M, Hartmann FJ, Oak JS, Fernandez-Pol S, Ohgami RS, Bendall SC.||RTI-Stanford||The diagnosis of lymphomas and leukemias requires hematopathologists to integrate microscopically visible cellular morphology with antibody-identified cell surface molecule expression. To merge these into one high-throughput, highly multiplexed, single-cell assay, we quantify cell morphological features by their underlying, antibody-measurable molecular components, which empowers mass cytometers to ‘see’ like pathologists. When applied to 71 diverse clinical samples, single-cell morphometric profiling reveals robust and distinct patterns of ‘morphometric’ markers for each major cell type. Individually, lamin B1 highlights acute leukemias, lamin A/C helps distinguish normal from neoplastic mature T cells, and VAMP-7 recapitulates light-cytometric side scatter. Combined with machine learning, morphometric markers form intuitive visualizations of normal and neoplastic cellular distribution and differentiation. When recalibrated for myelomonocytic blast enumeration, this approach is superior to flow cytometry and comparable to expert microscopy, bypassing years of specialized training. The contextualization of traditional surface markers on independent morphometric frameworks permits more sensitive and automated diagnosis of complex hematopoietic diseases.|
|2020-02-18||Inferring TF activation order in time series scRNA-Seq studies.||Lin C, Ding J, Bar-Joseph Z.||HIVE TC-CMU||Methods for the analysis of time series single cell expression data (scRNA-Seq) either do not utilize information about transcription factors (TFs) and their targets or only study these as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and, at the same time, provide information on how and when the process is regulated. We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points allowing us to identify not just the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. As we show, the method was able to identify known and novel TFs for all processes, the assigned time of activation agrees with both expression information and prior knowledge, and combinatorial predictions are supported by known interactions. We also show that CSHMM-TF improves upon prior methods that do not utilize TF-gene interaction.|
|2020-01-07||Automated mass spectrometry imaging of over 2000 proteins from tissue sections at 100-μm spatial resolution.||Piehowski PD, Zhu Y, Bramer LM, Stratton KG, Zhao R, Orton DJ, Moore RJ, Yuan J, Mitchell HD, Gao Y, Webb-Robertson BM, Dey SK, Kelly RT, Burnum-Johnson KE.||TTD-Purdue||Biological tissues exhibit complex spatial heterogeneity that directs the functions of multicellular organisms. Quantifying protein expression is essential for elucidating processes within complex biological assemblies. Imaging mass spectrometry (IMS) is a powerful emerging tool for mapping the spatial distribution of metabolites and lipids across tissue surfaces, but technical challenges have limited the application of IMS to the analysis of proteomes. Methods for probing the spatial distribution of the proteome have generally relied on the use of labels and/or antibodies, which limits multiplexing and requires a priori knowledge of protein targets. Past efforts to make spatially resolved proteome measurements across tissues have had limited spatial resolution and proteome coverage and have relied on manual workflows. Here, we demonstrate an automated approach to imaging that utilizes label-free nanoproteomics to analyze tissue voxels, generating quantitative cell-type-specific images for >2000 proteins with 100-µm spatial resolution across mouse uterine tissue sections preparing for blastocyst implantation.|
|2020||High-Parameter Immune Profiling with CyTOF||Sahaf B, Rahman A, Maecker HT, Bendall SC.||RTI-Stanford||Mass cytometry, or CyTOF, is a useful technology for high-parameter single-cell phenotyping, especially from suspension cells such as blood or PBMC. It is particularly appealing to monitor the systemic immune changes that could accompany cancer immunotherapy. Here we present a reference panel for identification of all major immune cell populations, with flexibility for addition of trial-specific markers. We also describe best-practice measures for minimizing and tracking batch variability. These include: sample barcoding, use of spiked-in reference cells, and lyophilization of the antibody cocktail. Our protocol assumes the use of cryopreserved PBMC, both for convenience of batching samples and for maximum comparability across patients and time points. Finally, we show an option for automated analysis using the Astrolabe platform (Astrolabe Diagnostics, Inc.).|
|2019-12-31||Immune monitoring using mass cytometry and related high-dimensional imaging approaches.||Hartmann FJ, Bendall SC.||RTI-Stanford||The cellular complexity and functional diversity of the human immune system necessitate the use of high-dimensional single-cell tools to uncover its role in multifaceted diseases such as rheumatic diseases, as well as other autoimmune and inflammatory disorders. Proteomic technologies that use elemental (heavy metal) reporter ions, such as mass cytometry (also known as CyTOF) and analogous high-dimensional imaging approaches (including multiplexed ion beam imaging (MIBI) and imaging mass cytometry (IMC)), have been developed from their low-dimensional counterparts, flow cytometry and immunohistochemistry, to meet this need. A growing number of studies have been published that use these technologies to identify functional biomarkers and therapeutic targets in rheumatic diseases, but the full potential of their application to rheumatic disease research has yet to be fulfilled. This Review introduces the underlying technologies for high-dimensional immune monitoring and discusses aspects necessary for their successful implementation, including study design principles, analytical tools and future developments for the field of rheumatology.|
|2019-12-26||Deep learning for inferring gene relationships from single-cell expression data||Yuan Y, Bar-Joseph Z.||HIVE TC-CMU||Several methods were developed to mine gene–gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC’s encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.|
|2019-12-23||Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression.||Hafemeister C, Satija R.||HIVE MC-NYGC||Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.|
|2019-12-20||Uncovering matrix effects on lipid analyses in MALDI imaging mass spectrometry experiments.||Perry WJ, Patterson NH, Prentice BM, Neumann EK, Caprioli RM, Spraggins JM.||TMC-Vanderbilt||The specific matrix used in matrix‐assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) can have an effect on the molecules ionized from a tissue sample. The sensitivity for distinct classes of biomolecules can vary when employing different MALDI matrices. Here, we compare the intensities of various lipid subclasses measured by Fourier transform ion cyclotron resonance (FT‐ICR) IMS of murine liver tissue when using 9‐aminoacridine (9AA), 5‐chloro‐2‐mercaptobenzothiazole (CMBT), 1,5‐diaminonaphthalene (DAN), 2,5‐Dihydroxyacetophenone (DHA), and 2,5‐dihydroxybenzoic acid (DHB). Principal component analysis and receiver operating characteristic curve analysis revealed significant matrix effects on the relative signal intensities observed for different lipid subclasses and adducts. Comparison of spectral profiles and quantitative assessment of the number and intensity of species from each lipid subclass showed that each matrix produces unique lipid signals. In positive ion mode, matrix application methods played a role in the MALDI analysis for different cationic species. Comparisons of different methods for the application of DHA showed a significant increase in the intensity of sodiated and potassiated analytes when using an aerosol sprayer. In negative ion mode, lipid profiles generated using DAN were significantly different than all other matrices tested. This difference was found to be driven by modification of phosphatidylcholines during ionization that enables them to be detected in negative ion mode. These modified phosphatidylcholines are isomeric with common phosphatidylethanolamines confounding MALDI IMS analysis when using DAN. 
These results show an experimental basis of MALDI analyses when analyzing lipids from tissue and allow for more informed selection of MALDI matrices when performing lipid IMS experiments.|
|2019-12-12||Toward a Common Coordinate Framework for the Human Body.||Rood JE, Stuart T, Ghazanfar S, Biancalani T, Fisher E, Butler A, Hupalowska A, Gaffney L, Mauck W, Eraslan G, Marioni JC, Regev A, Satija R.||HIVE MC-NYGC||Understanding the genetic and molecular drivers of phenotypic heterogeneity across individuals is central to biology. As new technologies enable fine-grained and spatially resolved molecular profiling, we need new computational approaches to integrate data from the same organ across different individuals into a consistent reference and to construct maps of molecular and cellular organization at histological and anatomical scales. Here, we review previous efforts and discuss challenges involved in establishing such a common coordinate framework, the underlying map of tissues and organs. We focus on strategies to handle anatomical variation across individuals and highlight the need for new technologies and analytical methods spanning multiple hierarchical scales of spatial resolution.|
|2019-12||High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell||Chen S, Lake BB, Zhang K.||TMC-UCSD||Single-cell RNA sequencing can reveal the transcriptional state of cells, yet provides little insight into the upstream regulatory landscape associated with open or accessible chromatin regions. Joint profiling of accessible chromatin and RNA within the same cells would permit direct matching of transcriptional regulation to its outputs. Here, we describe droplet-based single-nucleus chromatin accessibility and mRNA expression sequencing (SNARE-seq), a method that can link a cell’s transcriptome with its accessible chromatin for sequencing at scale. Specifically, accessible sites are captured by Tn5 transposase in permeabilized nuclei to permit, within many droplets in parallel, DNA barcode tagging together with the mRNA molecules from the same cells. To demonstrate the utility of SNARE-seq, we generated joint profiles of 5,081 and 10,309 cells from neonatal and adult mouse cerebral cortices, respectively. We reconstructed the transcriptome and epigenetic landscapes of major and rare cell types, uncovered lineage-specific accessible sites, especially for low-abundance cells, and connected the dynamics of promoter accessibility with transcription level during neurogenesis.|
|2019-11-15||High spatial resolution imaging of biological tissues using nanospray desorption electrospray ionization mass spectrometry||Yin R, Burnum-Johnson KE, Sun X, Dey SK, Laskin J.||TTD-Purdue||Mass spectrometry imaging (MSI) enables label-free spatial mapping of hundreds of biomolecules in tissue sections. This capability provides valuable information on tissue heterogeneity that is difficult to obtain using population-averaged assays. Despite substantial developments in both instrumentation and methodology, MSI of tissue samples at single-cell resolution remains challenging. Herein, we describe a protocol for robust imaging of tissue sections with a high (better than 10-μm) spatial resolution using nanospray desorption electrospray ionization (nano-DESI) mass spectrometry, an ambient ionization technique that does not require sample pretreatment before analysis. In this protocol, mouse uterine tissue is used as a model system to illustrate both the workflow and data obtained in these experiments. We provide a detailed description of the nano-DESI MSI platform, fabrication of the nano-DESI and shear force probes, shear force microscopy experiments, spectral acquisition, and data processing. A properly trained researcher (e.g., technician, graduate student, or postdoc) can complete all the steps from probe fabrication to data acquisition and processing within a single day. We also describe a new strategy for acquiring both positive- and negative-mode imaging data in the same experiment. This is achieved by alternating between positive and negative data acquisition modes during consecutive line scans. Using our imaging approach, hundreds of high-quality ion images were obtained from a single uterine section. This protocol enables sensitive and quantitative imaging of lipids and metabolites in heterogeneous tissue sections with high spatial resolution, which is critical to understanding biochemical processes occurring in biological tissues.|
|2019-11-01||Continuous State HMMs for Modeling Time Series Single Cell RNA-Seq Data.||Lin C, Bar-Joseph Z.||HIVE TC-CMU||MOTIVATION:
Methods for reconstructing developmental trajectories from time series single cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy.
We developed a new method based on continuous state HMMs (CSHMMs) for representing and modeling time series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single cell datasets we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types.
Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/.
Supplementary data are available at Bioinformatics online.
| ||Unsupervised machine learning for exploratory data analysis in imaging mass spectrometry.||Verbeeck N, Caprioli RM, Van de Plas R.||TMC-Vanderbilt||Imaging mass spectrometry (IMS) is a rapidly advancing molecular imaging modality that can map the spatial distribution of molecules with high chemical specificity. IMS does not require prior tagging of molecular targets and is able to measure a large number of ions concurrently in a single experiment. While this makes it particularly suited for exploratory analysis, the large amount and high‐dimensional nature of data generated by IMS techniques make automated computational analysis indispensable. Research into computational methods for IMS data has touched upon different aspects, including spectral preprocessing, data formats, dimensionality reduction, spatial registration, sample classification, differential analysis between IMS experiments, and data‐driven fusion methods to extract patterns corroborated by both IMS and other imaging modalities. In this work, we review unsupervised machine learning methods for exploratory analysis of IMS data, with particular focus on (a) factorization, (b) clustering, and (c) manifold learning. To provide a view across the various IMS modalities, we have attempted to include examples from a range of approaches including matrix assisted laser desorption/ionization, desorption electrospray ionization, and secondary ion mass spectrometry‐based IMS. This review aims to be an entry point for both (i) analytical chemists and mass spectrometry experts who want to explore computational techniques; and (ii) computer scientists and data mining specialists who want to enter the IMS field.|
|2019-10-09||The human body at cellular resolution: the NIH Human Biomolecular Atlas Program||HuBMAP Consortium||All||Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.|
|2019-10-08||High-Performance Molecular Imaging with MALDI Trapped Ion-Mobility Time-of-Flight (timsTOF) Mass Spectrometry.||Spraggins JM, Djambazova KV, Rivera ES, Migas LG, Neumann EK, Fuetterer A, Suetering J, Goedecke N, Ly A, Van de Plas R, Caprioli RM.||TMC-Vanderbilt|| |
|2019-09-09||Supervised classification enables rapid annotation of cell atlases.||Pliner HA, Shendure J, Trapnell C.||TMC-CalTech||Single-cell molecular profiling technologies are gaining rapid traction, but the manual process by which resulting cell types are typically annotated is labor intensive and rate-limiting. We describe Garnett, a tool for rapidly annotating cell types in single-cell transcriptional profiling and single-cell chromatin accessibility datasets, based on an interpretable, hierarchical markup language of cell type-specific genes. Garnett successfully classifies cell types in tissue and whole organism datasets, as well as across species.|
|2019-09-02||A pooled single-cell genetic screen identifies regulatory checkpoints in the continuum of the epithelial-to-mesenchymal transition.||McFaline-Figueroa JL, Hill AJ, Qiu X, Jackson D, Shendure J, Trapnell C.||TMC-CalTech||Integrating single-cell trajectory analysis with pooled genetic screening could reveal the genetic architecture that guides cellular decisions in development and disease. We applied this paradigm to probe the genetic circuitry that controls epithelial-to-mesenchymal transition (EMT). We used single-cell RNA sequencing to profile epithelial cells undergoing a spontaneous spatially determined EMT in the presence or absence of transforming growth factor-β. Pseudospatial trajectory analysis identified continuous waves of gene regulation as opposed to discrete ‘partial’ stages of EMT. KRAS was connected to the exit from the epithelial state and the acquisition of a fully mesenchymal phenotype. A pooled single-cell CRISPR-Cas9 screen identified EMT-associated receptors and transcription factors, including regulators of KRAS, whose loss impeded progress along the EMT. Inhibiting the KRAS effector MEK and its upstream activators EGFR and MET demonstrates that interruption of key signaling events reveals regulatory ‘checkpoints’ in the EMT continuum that mimic discrete stages, and reconciles opposing views of the program that controls EMT.|
|2019-08-19||Immuno-SABER enables highly multiplexed and amplified protein imaging in tissues.||Saka SK, Wang Y, Kishi JY, Zhu A, Zeng Y, Xie W, Kirli K, Yapp C, Cicconet M, Beliveau BJ, Lapan SW, Yin S, Lin M, Boyden ES, Kaeser PS, Pihan G, Church GM, Yin P.||TTD-Harvard||Spatial mapping of proteins in tissues is hindered by limitations in multiplexing, sensitivity and throughput. Here we report immunostaining with signal amplification by exchange reaction (Immuno-SABER), which achieves highly multiplexed signal amplification via DNA-barcoded antibodies and orthogonal DNA concatemers generated by primer exchange reaction (PER). SABER offers independently programmable signal amplification without in situ enzymatic reactions, and intrinsic scalability to rapidly amplify and visualize a large number of targets when combined with fast exchange cycles of fluorescent imager strands. We demonstrate 5- to 180-fold signal amplification in diverse samples (cultured cells, cryosections, formalin-fixed paraffin-embedded sections and whole-mount tissues), as well as simultaneous signal amplification for ten different proteins using standard equipment and workflows. We also combined SABER with expansion microscopy to enable rapid, multiplexed super-resolution tissue imaging. Immuno-SABER presents an effective and accessible platform for multiplexed and amplified imaging of proteins with high sensitivity and throughput.|
|2019-06-19||The 2019 mathematical oncology roadmap.||Rockne RC, Hawkins-Daarud A, Swanson KR, Sluka JP, Glazier JA, Macklin P, Hormuth DA, Jarrett AM, Lima EABF, Tinsley Oden J, Biros G, Yankeelov TE, Curtius K, Al Bakir I, Wodarz D, Komarova N, Aparicio L, Bordyuh M, Rabadan R, Finley SD, Enderling H, Caudell J, et al.||Whether the nom de guerre is Mathematical Oncology, Computational or Systems Biology, Theoretical Biology, Evolutionary Oncology, Bioinformatics, or simply Basic Science, there is no denying that mathematics continues to play an increasingly prominent role in cancer research. Mathematical Oncology—defined here simply as the use of mathematics in cancer research—complements and overlaps with a number of other fields that rely on mathematics as a core methodology. As a result, Mathematical Oncology has a broad scope, ranging from theoretical studies to clinical trials designed with mathematical models. This Roadmap differentiates Mathematical Oncology from related fields and demonstrates specific areas of focus within this unique field of research. The dominant theme of this Roadmap is the personalization of medicine through mathematics, modelling, and simulation. This is achieved through the use of patient-specific clinical data to: develop individualized screening strategies to detect cancer earlier; make predictions of response to therapy; design adaptive, patient-specific treatment plans to overcome therapy resistance; and establish domain-specific standards to share model predictions and to make models and simulations reproducible. The cover art for this Roadmap was chosen as an apt metaphor for the beautiful, strange, and evolving relationship between mathematics and cancer.|
|2019-06-06||Comprehensive Integration of Single-Cell Data.||Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, Hao Y, Stoeckius M, Smibert P, Satija R.||HIVE MC-NYGC||Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function. Here, we develop a strategy to “anchor” diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations. Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns. Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.|
|2019-06-04||Cell lineage inference from SNP and scRNA-Seq data.||Ding J, Lin C, Bar-Joseph Z.||HIVE TC-CMU||Several recent studies focus on the inference of developmental and response trajectories from single cell RNA-Seq (scRNA-Seq) data. A number of computational methods, often referred to as pseudo-time ordering, have been developed for this task. Recently, CRISPR has also been used to reconstruct lineage trees by inserting random mutations. However, both approaches suffer from drawbacks that limit their use. Here, we develop a method to detect significant, cell type specific, sequence mutations from scRNA-Seq data. We show that only a few mutations are enough for reconstructing good branching models. Integrating these mutations with expression data further improves the accuracy of the reconstructed models. As we show, the majority of mutations we identify are likely RNA editing events indicating that such information can be used to distinguish cell types.|
|2019-06||SABER amplifies FISH: enhanced multiplexed imaging of RNA and DNA in cells and tissues.||Kishi JY, Lapan SW, Beliveau BJ, West ER, Zhu A, Sasaki HM, Saka SK, Wang Y, Cepko CL, Yin P.||TTD-Harvard||Fluorescence in situ hybridization (FISH) reveals the abundance and positioning of nucleic acid sequences in fixed samples. Despite recent advances in multiplexed amplification of FISH signals, it remains challenging to achieve high levels of simultaneous amplification and sequential detection with high sampling efficiency and simple workflows. Here we introduce signal amplification by exchange reaction (SABER), which endows oligonucleotide-based FISH probes with long, single-stranded DNA concatemers that aggregate a multitude of short complementary fluorescent imager strands. We show that SABER amplified RNA and DNA FISH signals (5- to 450-fold) in fixed cells and tissues. We also applied 17 orthogonal amplifiers against chromosomal targets simultaneously and detected mRNAs with high efficiency. We then used 10-plex SABER-FISH to identify in vivo introduced enhancers with cell-type-specific activity in the mouse retina. SABER represents a simple and versatile molecular toolkit for rapid and cost-effective multiplexed imaging of nucleic acid targets.|
|2019-05-06||Visualizing learner engagement, performance, and trajectories to evaluate and optimize online course design.||Ginda M, Richey MC, Cousino M, Börner K.||HIVE MC-IU||Learning analytics and visualizations make it possible to examine and communicate learners’ engagement, performance, and trajectories in online courses to evaluate and optimize course design for learners. This is particularly valuable for workforce training involving employees who need to acquire new knowledge in the most effective manner. This paper introduces a set of metrics and visualizations that aim to capture key dynamical aspects of learner engagement, performance, and course trajectories. The metrics are applied to identify prototypical behavior and learning pathways through and interactions with course content, activities, and assessments. The approach is exemplified and empirically validated using more than 30 million separate logged events that capture activities of 1,608 Boeing engineers taking the MITxPro Course, “Architecture of Complex Systems,” delivered in Fall 2016. Visualization results show course structure and patterns of learner interactions with course material, activities, and assessments. Tree visualizations are used to represent course hierarchical structures and explicit sequence of content modules. Learner trajectory networks represent pathways and interactions of individual learners through course modules, revealing patterns of learner engagement, content access strategies, and performance. Results provide evidence for instructors and course designers for evaluating the usage and effectiveness of course materials and intervention strategies.|
|2019-04-06||Imaging mass spectrometry enables molecular profiling of mouse and human pancreatic tissue||Prentice BM, Hart NJ, Phillips N, Haliyur R, Judd A, Armandala R, Spraggins JM, Lowe CL, Boyd KL, Stein RW, Wright CV, Norris JL, Powers AC, Brissova M, Caprioli RM.||The molecular response and function of pancreatic islet cells during metabolic stress is a complex process. The anatomical location and small size of pancreatic islets coupled with current methodological limitations have prevented the achievement of a complete, coherent picture of the role that lipids and proteins play in cellular processes under normal conditions and in diseased states. Herein, we describe the development of untargeted tissue imaging mass spectrometry (IMS) technologies for the study of in situ protein and, more specifically, lipid distributions in murine and human pancreases.|
|2019-04||The Importance of Clinical Tissue Imaging||Spraggins JM, Schwamborn K, Heeren RMA, Eberlin LS.||TMC-Vanderbilt||Tissue imaging by mass spectrometry (MS) combines the sensitivity and molecular specificity of MS with the spatial fidelity of classical histology for analysis of metabolites, lipids and proteins in tissues (Fig. 1). MS-based imaging is label-free, untargeted, sensitive, and specific, thereby enabling application in both basic biomedical research and the clinical laboratory. While all tissue imaging experiments are conceptually similar in their ability to generate spatial molecular data; ionization, data collection, and purpose vary widely. Here, we highlight recent technical advances and efforts that are motivating translational applications of this emerging technology.|
|2019-03-25||Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH.||Eng CL, Lawson M, Zhu Q, Dries R, Koulena N, Takei Y, Yun J, Cronin C, Karp C, Yuan GC, Cai L.||TTD-Cal Tech||Imaging the transcriptome in situ with high accuracy has been a major challenge in single-cell biology, which is particularly hindered by the limits of optical resolution and the density of transcripts in single cells. Here we demonstrate an evolution of sequential fluorescence in situ hybridization (seqFISH+). We show that seqFISH+ can image mRNAs for 10,000 genes in single cells, with high accuracy and sub-diffraction-limit resolution, in the cortex, subventricular zone and olfactory bulb of mouse brain, using a standard confocal microscope. The transcriptome-level profiling of seqFISH+ allows unbiased identification of cell classes and their spatial organization in tissues. In addition, seqFISH+ reveals subcellular mRNA localization patterns in cells and ligand-receptor pairs across neighbouring cells. This technology demonstrates the ability to generate spatial cell atlases and to perform discovery-driven studies of biological processes in situ.|
|2019-03||Multiple TOF/TOF Events in a Single Laser Shot for Multiplexed Lipid Identifications in MALDI Imaging Mass Spectrometry.||Prentice BM, McMillen JC, Caprioli RM||TMC-Vanderbilt||Tandem mass spectrometry (MS/MS) is often used to identify lipids in matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) workflows. The molecular specificity afforded by MS/MS is crucial on MALDI time-of-flight (TOF) platforms that generally lack high resolution accurate mass measurement capabilities. Unfortunately, imaging MS/MS workflows generally only monitor a single precursor ion over the imaged area, limiting the throughput of this methodology. Herein, we demonstrate that multiple TOF/TOF events performed in each laser shot can be used to improve the throughput of imaging MS/MS. This is shown to enable the simultaneous identification of multiple phosphatidylcholine lipids in rat brain tissue. Uniquely, the separation in time achieved for the precursor ions in the TOF-1 region of the instrument is maintained for the fragment ions as they are analyzed in TOF-2, allowing for the differentiation of fragment ions of the exact same m/z derived from different precursor ions (e.g., the m/z 163 fragment ion from precursor ion m/z 772.5 is easily distinguished from the m/z 163 fragment ion from precursor ion m/z 826.5). This multiplexed imaging MS/MS approach allows for the acquisition of complete fragment ion spectra for multiple precursor ions per laser shot.|
|2019-02-20||The single-cell transcriptional landscape of mammalian organogenesis||Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, Zhang F, Mundlos S, Christiansen L, Steemers FJ, Trapnell C & Shendure J||TMC-Cal Tech||Mammalian organogenesis is a remarkable process. Within a short timeframe, the cells of the three germ layers transform into an embryo that includes most of the major internal and external organs. Here we investigate the transcriptional dynamics of mouse organogenesis at single-cell resolution. Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment. The resulting ‘mouse organogenesis cell atlas’ (MOCA) provides a global view of developmental processes during this critical window. We use Monocle 3 to identify hundreds of cell types and 56 trajectories, many of which are detected only because of the depth of cellular coverage, and collectively define thousands of corresponding marker genes. We explore the dynamics of gene expression within cell types and trajectories over time, including focused analyses of the apical ectodermal ridge, limb mesenchyme and skeletal muscle.|
|2019-02-15||Dhaka: Variational Autoencoder for Unmasking Tumor Heterogeneity from Single Cell Genomic Data.||Rashid S, Shah S, Bar-Joseph Z, Pandya R.||HIVE TC-CMU||MOTIVATION: Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers, and mutation even when originating from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer their evolutionary trajectory remains computationally challenging due to the extremely noisy and sparse nature of the data. Here we describe 'Dhaka', a variational autoencoder method which transforms single cell genomic data to a reduced dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic and 6 single cell cancer datasets where the number of cells ranges from 250 to 6000 for each sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features are also able to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data. AVAILABILITY AND IMPLEMENTATION: All the datasets used in the paper are publicly available, and the developed software package and supporting information are available on GitHub at https://github.com/MicrosoftGenomics/Dhaka.|
|2019-02-05||Data visualization literacy: Definitions, conceptual frameworks, exercises, and assessments||Börner K, Bueckle A and Ginda M.||HIVE MC-IU||In the information age, the ability to read and construct data visualizations becomes as important as the ability to read and write text. However, while standard definitions and theoretical frameworks to teach and assess textual, mathematical, and visual literacy exist, current data visualization literacy (DVL) definitions and frameworks are not comprehensive enough to guide the design of DVL teaching and assessment. This paper introduces a data visualization literacy framework (DVL-FW) that was specifically developed to define, teach, and assess DVL. The holistic DVL-FW promotes both the reading and construction of data visualizations, a pairing analogous to that of both reading and writing in textual literacy and understanding and applying in mathematical literacy. Specifically, the DVL-FW defines a hierarchical typology of core concepts and details the process steps that are required to extract insights from data. Advancing the state of the art, the DVL-FW interlinks theoretical and procedural knowledge and showcases how both can be combined to design curricula and assessment measures for DVL. Earlier versions of the DVL-FW have been used to teach DVL to more than 8,500 residential and online students, and results from this effort have helped revise and validate the DVL-FW presented here.|
|2019-02||Protein identification strategies in MALDI imaging mass spectrometry: a brief review.||Ryan DJ, Spraggins JM, Caprioli RM.||TMC-Vanderbilt||Matrix assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) is a powerful technology used to investigate the spatial distributions of thousands of molecules throughout a tissue section from a single experiment. As proteins represent an important group of functional molecules in tissue and cells, the imaging of proteins has been an important point of focus in the development of IMS technologies and methods. Protein identification is crucial for the biological contextualization of molecular imaging data. However, gas-phase fragmentation efficiency of MALDI generated proteins presents significant challenges, making protein identification directly from tissue difficult. This review highlights methods and technologies specifically related to protein identification that have been developed to overcome these challenges in MALDI IMS experiments.|
|2018-12-19||Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics||Stoeckius M, Zheng S, Houck-Loomis B, Hao S, Yeung BZ, Mauck WM, Smibert P, Satija R.||HIVE MC-NYGC||Despite rapid developments in single cell sequencing, sample-specific batch effects, detection of cell multiplets, and experimental costs remain outstanding challenges. Here, we introduce Cell Hashing, where oligo-tagged antibodies against ubiquitously expressed surface proteins uniquely label cells from distinct samples, which can be subsequently pooled. By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its original sample, robustly identify cross-sample multiplets, and “super-load” commercial droplet-based systems for significant cost reduction. We validate our approach using a complementary genetic approach and demonstrate how hashing can generalize the benefits of single cell multiplexing to diverse samples and experimental designs.|
|2018-12-10||Forecasting innovations in science, technology, and education||Börner K, Rouse WB, Trunfio P, Stanley HE.||HIVE MC-IU||Human survival depends on our ability to predict future outcomes so that we can make informed decisions. Human cognition and perception are optimized for local, short-term decision-making, such as deciding when to fight or flight, whom to mate, or what to eat. For more elaborate decisions (e.g., when to harvest, when to go to war or not, and whom to marry), people used to consult oracles—prophetic predictions of the future inspired by the gods. Over time, oracles were replaced by models of the structure and dynamics of natural, technological, and social systems. In the 21st century, computational models and visualizations of model results inform much of our decision-making: near real-time weather forecasts help us decide when to take an umbrella, plant, or harvest; where to ground airplanes; or when to evacuate inhabitants in the path of a hurricane, tornado, or flood. Long-term weather and climate forecasts predict a future with increasing torrential rains, stronger winds, and more frequent drought, landslides, and forest fires as well as rising sea levels, enabling decision makers to prepare for these changes by building dikes, moving cities and roads, and building larger water reservoirs and better storm sewers.|
|2018-10-29||Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data.||Zhu Q, Shah S, Dries R, Cai L, Yuan GC.||TTD-Cal Tech||How intrinsic gene-regulatory networks interact with a cell's spatial environment to define its identity remains poorly understood. We developed an approach to distinguish between intrinsic and extrinsic effects on global gene expression by integrating analysis of sequencing-based and imaging-based single-cell transcriptomic profiles, using cross-platform cell type mapping combined with a hidden Markov random field model. We applied this approach to dissect the cell-type- and spatial-domain-associated heterogeneity in the mouse visual cortex region. Our analysis identified distinct spatially associated, cell-type-independent signatures in the glutamatergic and astrocyte cell compartments. Using these signatures to analyze single-cell RNA sequencing data, we identified previously unknown spatially associated subpopulations, which were validated by comparison with anatomical structures and Allen Brain Atlas images.|