Schedule for: 20w5197 - Mathematical Frameworks for Integrative Analysis of Emerging Biological Data Types (Online)

Beginning on Sunday, June 14 and ending Friday June 19, 2020

All times in Banff, Alberta time, MDT (UTC-6).

Monday, June 15
05:00 - 05:05 Welcome to BIRS Online, by BIRS Staff
A brief welcome and introduction by BIRS Staff.
(Online)
05:05 - 05:25 Elana Fertig: Introduction: aims of workshop
Dr. Elana Fertig (JHU, https://fertiglab.com/) will introduce the main aims of the workshop and explain how it will run.
(Online)
05:25 - 05:29 Chair: Elana Fertig, CORTEX seq-FISH session
Dr. Fertig is a co-chair of this meeting. She is an Associate Professor of Oncology and Assistant Director of the Research Program in Quantitative Sciences at Johns Hopkins University, with secondary appointments in Biomedical Engineering and in Applied Mathematics and Statistics, and affiliations in the Institute of Computational Medicine, Center for Computational Genomics, Machine Learning, Mathematical Institute for Data Science, and the Center for Computational Biology. Homepage: https://fertiglab.com Twitter: @FertigLab
(Online)
05:30 - 06:30 Guocheng Yuan: Keynote Talk (Cortex seq-FISH study)
Dr. Guocheng Yuan (Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health; lab website: http://bcb.dfci.harvard.edu/~gcyuan) will present the seqFISH hackathon study.
(Online)
06:30 - 07:00 break (Contributed talks, pls check your shared slides on zoom)
07:00 - 07:20 Alexis Coullomb: CORTEX seq-FISH: clustering
Alexis Coullomb is a member of Vera Pancaldi's lab at the Cancer Research Centre of Toulouse, INSERM, France (https://www.crct-inserm.fr/personne/alexis-coullomb/). He will present an analysis that addressed two questions: (1) Can scRNA-seq data be overlaid onto seqFISH for resolution enhancement? (2) What is the minimal number of genes needed for data integration? He was also interested in how to detect specific spatial areas from the seqFISH gene expression data and by reconstructing the spatial network of cells. Code is available at: https://github.com/AlexCoul/multiOmics_integration
(Online)
07:20 - 07:40 Hang Xu: CORTEX seq-FISH: selection of spatial coherent genes
Dr. Hang Xu is a postdoctoral fellow in Christina Curtis's lab (Stanford). Hang obtained her PhD in bioinformatics at the University of Nottingham, UK, and then trained as a postdoctoral research fellow at the Francis Crick Institute with Charles Swanton. She is interested in studying cancer evolutionary dynamics. Her research asked the following questions: (1) Can scRNA-seq data be overlaid onto seqFISH for resolution enhancement? (2) What is the minimal number of genes needed for data integration? She followed the approach described in Zhu et al. (2018), which integrated scRNA-seq and smFISH data. Following that approach, she randomly selected subsets of differentially expressed genes and applied an SVM model to estimate the minimal number of genes required for data integration. Code is available at https://github.com/gooday23/smfishscRNAHackathon/
(Online)
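The gene-subset experiment described in the abstract above can be sketched roughly as follows. This is only an illustration, not the speaker's code: a nearest-centroid classifier stands in for the SVM so the example needs nothing beyond the standard library, and the function names, the `repeats` and `seed` parameters, and the toy data layout (rows are cells, columns are genes) are all assumptions.

```python
import random


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y, genes):
    """Accuracy of a nearest-centroid classifier restricted to the given gene indices.

    Stand-in for the SVM mentioned in the abstract (assumption for this sketch).
    """
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append([row[g] for g in genes])
    # One centroid (per-gene mean) for each cell-type label.
    cents = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    correct = 0
    for row, label in zip(test_X, test_y):
        sub = [row[g] for g in genes]
        pred = min(
            cents,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(sub, cents[lab])),
        )
        correct += pred == label
    return correct / len(test_y)


def accuracy_by_gene_count(train_X, train_y, test_X, test_y, sizes, repeats=20, seed=0):
    """Mean held-out accuracy over random gene subsets of each requested size."""
    rng = random.Random(seed)
    n_genes = len(train_X[0])
    result = {}
    for size in sizes:
        scores = []
        for _ in range(repeats):
            genes = rng.sample(range(n_genes), size)  # random subset of gene indices
            scores.append(
                nearest_centroid_accuracy(train_X, train_y, test_X, test_y, genes)
            )
        result[size] = sum(scores) / repeats
    return result


# Toy data: genes 0-2 separate the two cell types, gene 3 is uninformative.
train_X = [[0, 0, 0, 5], [1, 0, 1, 5], [9, 10, 9, 5], [10, 10, 10, 5]]
train_y = ["A", "A", "B", "B"]
test_X = [[0, 1, 0, 5], [10, 9, 10, 5]]
test_y = ["A", "B"]
curve = accuracy_by_gene_count(train_X, train_y, test_X, test_y, sizes=[1, 2, 4])
```

Plotting `curve` against subset size and looking for where the accuracy plateaus is one simple way to read off a "minimal number of genes"; the real analysis may have used a different criterion.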
07:40 - 08:00 Dario Righelli: CORTEX seq-FISH: software structure and data integration
Dr. Dario Righelli is a postdoctoral fellow in the Department of Statistics, University of Padua, Italy (https://www.researchgate.net/profile/Dario_Righelli), in the lab of Dr. Davide Risso. His work focused on the software infrastructure needed to easily analyze spatial datasets (such as seqFISH, 10X Visium, etc.) and to integrate spatial and non-spatial datasets. Code is available at https://github.com/drighelli/SpatialAnalysis
(Online)
08:00 - 08:20 break (20 mins) (Contributed talks, pls check your shared slides on zoom)
08:20 - 08:40 Amrit Singh: CORTEX seq-FISH: integration with scRNA-seq data
Dr. Amrit Singh (https://amritsingh.ca) is a post-doctoral research fellow at the PROOF Centre of Excellence and in Pathology at the University of British Columbia. He uses statistical methodologies to extract signals from big biological data. He has a background in biology, math, and programming (R/Python/Node) and has developed new methods to integrate multi-source biological data as part of the mixOmics data integration project. He is interested in using web and voice to develop interactive user interfaces that let users extract maximal information from their data. His work addressed the question of whether scRNA-seq data can be overlaid onto seqFISH for resolution enhancement. The published approach trained a multiclass SVM on the scRNA-seq data and applied it to the seqFISH data to estimate the cell-type labels. His approach uses a penalized regression method (glmnet) in a semi-supervised setting in order to build a model using both the scRNA-seq and seqFISH data. This strategy is recursive: it involves multiple rounds of training glmnet models on labeled data (labeled and imputed) and predicting the cell-type labels of unlabeled data. At each iteration, cell-type labels with high confidence (probability > 0.5) are retained for the next iteration, where a new glmnet model is trained with the scRNA-seq data and the seqFISH data whose cell-type labels were imputed with high confidence. This process is repeated until all cell types in the seqFISH data have been labeled or until 50 iterations have been reached (in order to reduce compute times). The advantage of this approach is that more data is used for model training, so the resulting model may generalize better to new data. The performance of this approach was estimated using cross-validation, using only the scRNA-seq data as the test set. This work was done in collaboration with Prof. Kim-Anh Lê Cao (University of Melbourne). Code is available at https://github.com/singha53/ssenet
(Online)
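The recursive labeling loop described in the abstract above might look roughly like the sketch below. It is an illustration under stated assumptions, not the ssenet implementation: a nearest-centroid model with softmax pseudo-probabilities replaces glmnet, labels are kept once assigned, and the loop also stops early when an iteration adds no new labels (that early stop is an addition of this sketch). Only the 0.5 confidence threshold and the 50-iteration cap come from the abstract.

```python
import math


def fit_centroids(X, y):
    """Per-label mean vectors (stand-in for a glmnet model; assumption)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_proba(model, row):
    """Softmax over negative squared distances to each centroid."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(row, centroid))
        for label, centroid in model.items()
    }
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_iter=50):
    """Iteratively impute labels for X_unlab; None marks cells never labeled."""
    labels = [None] * len(X_unlab)
    for _ in range(max_iter):
        # Train on the reference data plus the confidently imputed cells so far.
        train_X = list(X_lab) + [x for x, lab in zip(X_unlab, labels) if lab is not None]
        train_y = list(y_lab) + [lab for lab in labels if lab is not None]
        model = fit_centroids(train_X, train_y)
        changed = False
        for i, row in enumerate(X_unlab):
            if labels[i] is not None:
                continue
            proba = predict_proba(model, row)
            best = max(proba, key=proba.get)
            if proba[best] > threshold:  # keep only high-confidence labels
                labels[i] = best
                changed = True
        if not changed or all(lab is not None for lab in labels):
            break
    return labels


# Toy example: the third unlabeled cell is equidistant from both classes at first
# and only receives a label after the first two imputed cells shift the centroids.
X_lab = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_lab = ["A", "A", "B", "B"]
X_unlab = [[0.2, 0.5], [5.1, 5.4], [2.5, 3.0]]
imputed = self_train(X_lab, y_lab, X_unlab)
```

The toy data is chosen so that a single pass leaves one cell unlabeled, which shows why repeated rounds of training on imputed labels can change the outcome.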
08:40 - 09:00 Joshua Sodicoff: CORTEX seq-FISH: integration with scRNA-seq data
Mr. Joshua Sodicoff is a Data Sciences BS student in the lab of Joshua Welch at the University of Michigan Medical School (https://welch-lab.github.io/people/). He addressed the first two questions listed on the GitHub page for the seqFISH data. Our primary goal was to integrate seqFISH data with scRNA-seq data to increase resolution, utilizing LIGER to account for dataset-specific differences in expression. We also attempted to determine how the number of genes reported in the spatial data impacts the quality of the integrated data and of the cell type mappings generated by our method. To address these questions, we analyzed both the provided seqFISH and scRNA-seq datasets, as well as additional scRNA-seq data (from the more recent Tasic visual cortex publication) and STARmap data. Code is available at https://github.com/jsodicoff/birs_spatial_integration
(Online)
09:00 - 10:00 Brainstorm: CORTEX seq-FISH data led by Guocheng Yuan
Will not be recorded.
(Breakout room 1)
Tuesday, June 16
05:00 - 05:05 Chair: Aedin Culhane, sc targeted proteomics session (Online)
05:00 - 05:30 Guocheng Yuan: Debrief of Brainstorming Session (CORTEX seq-FISH data)
Dr. GC Yuan is an Associate Professor in the Department of Data Sciences, Dana-Farber Cancer Institute, and in Biostatistics at the Harvard TH Chan School of Public Health. Recent software includes Giotto, a pipeline for single-cell spatial transcriptomic data analysis and visualization; STREAM, a method for trajectory analysis from single-cell RNA-seq and ATAC-seq data; and GiniClust3, an updated version of GiniClust that provides a fast and memory-efficient tool for rare and common cell type identification (http://bcb.dfci.harvard.edu/~gcyuan/software.html). Homepage: http://bcb.dfci.harvard.edu/~gcyuan/
(Online)
05:30 - 06:30 Bernd Bodenmiller: Keynote Talk (single cell proteomics)
Bernd Bodenmiller (http://www.bodenmillerlab.com/) will present recent single-cell proteomics work from his lab.
(Online)
06:30 - 07:00 break (Contributed talks, pls check your shared slides on zoom)
07:00 - 07:20 Yingxin Lin: sc targeted proteomics: Predicting outcome, survival from 3 proteomics datasets (Keren, Jackson, Wagner)
Ms. Yingxin Lin is a PhD candidate in Statistics under the supervision of Prof. Jean Yang, Dr. John Ormerod and Dr. Rachel Wang at the University of Sydney (https://yingxinlin.github.io/). Her main research interest is in normalisation and statistical modelling of single-cell RNA-seq data. She analyzed the three sc targeted proteomics datasets (Keren et al., Jackson et al., and Wagner et al.), which all presented comprehensive portraits of the breast cancer tumor immune microenvironment, using different methods to identify and characterize patient subtypes supported by evidence of association with survival. Code is available at https://yingxinlin.github.io/BIRS_analysis
(Online)
07:20 - 07:35 Chen Meng: sc targeted proteomics: comparing multi-block PCA, linear regression
Dr. Chen Meng is Head of Bioinformatics at the Bavarian Center for Biomolecular Mass Spectrometry, TU Munich, Freising, Germany (https://www.baybioms.tum.de/about-us/people/). He mainly worked on approaches to integrating partially overlapping proteomic data collected on different patients with similar phenotypes, using two methods: simple linear regression (as a baseline/control) and multi-block PCA (MBPCA; including multiple co-inertia and multiple canonical correspondence analysis as special cases). In theory, MBPCA should outperform simple linear regression because it finds the correlated pattern across multiple datasets, preventing the potential problem of overfitting to one dataset. Code is available at https://github.com/mengchen18/BIRSBioIntegrationWorkshop
(Online)
07:35 - 07:50 Pratheepa Jeganathan: sc targeted proteomics: Stan model for latent Dirichlet allocation
Dr. Pratheepa Jeganathan received her master's (2013) and PhD (2016) from Texas Tech University and is currently a postdoctoral research fellow working with Prof. Susan Holmes at Stanford University (https://profiles.stanford.edu/pratheepa-jeganathan). Her work considered solutions for two questions: (1) How should we approach integrating partially overlapping proteomic data collected on different patients with similar phenotypes? (2) Without including the spatial x-y coordinate data, how well can we predict cell co-location? She will illustrate topic modeling on discretized targeted proteomics data and the method used to infer cell co-location. We integrated the two SingleCellExperiment objects using the MultiAssayExperiment class from R/Bioconductor. We converted the normalized data to original protein expression and discretized it (for the preliminary analysis, we added a minimum of the normalized value for each marker, but we need to know the sample mean and standard deviation of marker expressions in the MIBI data). We treated each cell as a document and wrote a Stan model for latent Dirichlet allocation. Using posterior samples of topic proportions, we inferred the latent topics with a higher proportion in each cell. We proposed a solution to the alignment issue. Code is available at https://github.com/PratheepaJ/Banff_proteomics
(Online)
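Two steps mentioned in the abstract above, turning expression values into counts so each cell can be treated as a "document" and reading off the dominant topic per cell from posterior samples, can be sketched as below. This is a guess at the general shape only: the shift-by-column-minimum and rounding rule, the function names, and the `samples[draw][cell][topic]` layout are assumptions, and the Stan model itself is not reproduced here.

```python
def to_counts(matrix):
    """Shift each marker (column) so its minimum is zero, then round to integers.

    Rows are cells, columns are protein markers. The exact discretization used
    in the talk is not specified here; this rule is an assumption.
    """
    n_markers = len(matrix[0])
    mins = [min(row[j] for row in matrix) for j in range(n_markers)]
    return [
        [int(round(row[j] - mins[j])) for j in range(n_markers)]
        for row in matrix
    ]


def dominant_topics(samples):
    """Index of the topic with the highest posterior-mean proportion in each cell.

    samples[s][c][t] is the proportion of topic t in cell c for posterior draw s.
    """
    n_draws = len(samples)
    n_cells = len(samples[0])
    n_topics = len(samples[0][0])
    result = []
    for c in range(n_cells):
        means = [
            sum(samples[s][c][t] for s in range(n_draws)) / n_draws
            for t in range(n_topics)
        ]
        result.append(max(range(n_topics), key=means.__getitem__))
    return result


# Toy example: three cells, two markers; two posterior draws, two cells, two topics.
counts = to_counts([[-1.2, 0.4], [0.8, 2.4], [2.9, 0.4]])
topics = dominant_topics([
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.6, 0.4], [0.4, 0.6]],
])
```

Each row of `counts` plays the role of a document's word counts, with markers as the vocabulary.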
07:50 - 08:05 Kris Sankaran: sc targeted proteomics: spatial analysis
Dr. Kris Sankaran will start as an Assistant Professor in the Statistics Department at the University of Wisconsin, Madison in August 2020. He recently completed his postdoc at the Quebec AI Institute, working in Yoshua Bengio's lab. He previously completed his PhD in Statistics at Stanford under the supervision of Susan Holmes, focusing on latent variable methods in the microbiome. His talk will explore adapting exploratory methods, interactive visualization, and supervised learning to relate complementary data sources when integrating multiple instruments (Mass Cytometry and MIBI-TOF) and multiple scales (cells, tissues, human populations). He will highlight the challenge of measuring the degree to which different data sources provide redundant (or novel) information, and propose some preliminary approaches. Kris's interactive tool: https://observablehq.com/@krisrs1128/spatial-vs-expression-map
(Online)
08:00 - 08:20 break (20 mins) (Contributed talks, pls check your shared slides on zoom)
08:20 - 08:40 Lauren Hsu: sc targeted proteomics: matrix factorization
Ms. Lauren Hsu recently received her Master of Science in Computational Biology and Quantitative Genetics from the Harvard TH Chan School of Public Health (Boston, MA) and will begin her PhD studies there in the fall. She examined the utility of encoding the spatial data as a spatial weighting matrix that defines the spatial relations among a set of coordinates or spatially resolved cells, and of integrating the sc proteomics data with the spatial information using matrix factorization. Code is available at https://github.com/laurenhsu1/whitePaper
(Online)
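As a rough illustration of the "spatial weighting matrix" idea in the abstract above, the sketch below builds a row-normalized k-nearest-neighbour weight matrix from cell coordinates and uses it to average a marker over each cell's neighbours. The k-nearest-neighbour rule, the function names, and the value of `k` are assumptions for this sketch; the talk may have used a different weighting scheme, and the matrix factorization step is not shown.

```python
import math


def knn_spatial_weights(coords, k=2):
    """Row-normalized weights: W[i][j] > 0 if cell j is among the k nearest to cell i."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(coords):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        neighbours = dists[:k]
        for _, j in neighbours:
            W[i][j] = 1.0 / len(neighbours)
    return W


def spatial_lag(W, values):
    """Weighted neighbourhood average of a per-cell value (the product W x values)."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]


# Toy example: three cells close together and one far away.
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
W = knn_spatial_weights(coords, k=2)
lag = spatial_lag(W, [1.0, 3.0, 5.0, 7.0])
```

A matrix like `W` can then be supplied alongside the cell-by-protein expression matrix to a joint factorization, which is the integration step the abstract refers to.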
08:40 - 09:00 Duncan Forster: Networks - learning salient gene and protein features from network topologies.
Duncan Forster is a PhD student in Molecular Genetics co-supervised by Prof. Gary Bader and Prof. Charlie Boone at the University of Toronto (https://baderlab.org/Members). His work has addressed the following questions. Firstly, we wanted to determine whether recent deep learning architectures (namely graph neural networks/graph convolutional networks) could be used to learn salient gene and protein features from network topologies. If so, these features could be integrated in a trainable, end-to-end fashion, allowing for effective integration of biological networks. These recent deep learning architectures have shown substantial improvements over previous network feature learning approaches on a range of tasks, which motivates their use in biological domains. Secondly, we wanted to determine more effective evaluation strategies in order to compare integration approaches. This is a challenging task due to differences in input network sizes and standard coverage, biases and quality of the standards, differences in method outputs (networks vs. features), and biases in the current evaluation strategies themselves. Code is available at https://github.com/bowang-lab/BIONIC
(Online)
09:00 - 09:02 Group Photo
It is tradition at BIRS to take a group photo at every meeting, which gets posted on the workshop web page. So please turn on your cameras for a screenshot of the Zoom Gallery.
(Online)
09:05 - 10:05 Brainstorm: sc targeted proteomics led by Aedin Culhane and Olga Vitek
Will not be recorded.
(Breakout room 1)
Wednesday, June 17
05:00 - 05:05 Chair: K-A Lê Cao, scNMT-seq session
Dr. Kim-Anh Lê Cao is a co-chair of the meeting and an associate professor at the School of Mathematics and Statistics at the University of Melbourne. Kim-Anh is an NHMRC Career Development fellow and an awardee of the 2019 Moran medal from the Australian Academy of Science. Since 2009, her team has been developing the R toolkit mixOmics, dedicated to the integrative analysis of 'omics data, to help researchers mine and make sense of biological data. http://mixomics.org/
(Online)
05:01 - 05:31 Debrief of Brainstorming Session (sc-targeted proteomics) Culhane/Vitek (Online)
05:30 - 06:30 Oliver Stegle: Keynote Talk (scNMT-seq study)
Dr. Oliver Stegle is a group leader in statistical genomics and systems genetics at the European Bioinformatics Institute, Cambridge, UK (https://www.ebi.ac.uk/research/stegle/). His lab published Argelaguet et al. (2019), "Multi-omics profiling of mouse gastrulation at single-cell resolution", Nature 576, 487–491 (https://www.nature.com/articles/s41586-019-1825-8), which was the basis of the scNMT-seq study challenge: https://github.com/BIRSBiointegration/Hackathon/tree/master/scNMT-seq
(Online)
06:30 - 07:00 break (Contributed talks: pls check your shared slides on zoom)
07:00 - 07:20 Al J Abadi: scNMT-seq: multivariate integrative analyses
Dr. Al J Abadi is a Research Fellow and software developer in Computational Genomics in the lab of Prof. Kim-Anh Lê Cao at the University of Melbourne, Australia (https://lecao-lab.science.unimelb.edu.au/). He addressed two challenges. (i) Identification of multi-omics signatures that characterize lineage, stage or both: we applied a regularised partial least squares analysis, which can find key markers that characterize the coordinated lineage- and stage-specific changes in different modalities. (ii) Dealing with missing values: in integrative analyses, we applied an iterative algorithm that can handle missing values without potentially inducing spurious correlations in the datasets, while selecting variables that are correlated across data modalities and characterize the stage and/or lineage of the cells. Code: https://github.com/ajabadi/scNMT_seq_gastrulation
(Online)
07:20 - 07:40 Joshua Welch: scNMT-seq: LIGER
Dr. Joshua Welch is an Assistant Professor in the Department of Computational Medicine and Bioinformatics, University of Michigan (https://welch-lab.github.io/). Most recently, his lab has focused on developing open-source software for the processing, analysis, and modeling of single-cell sequencing data. Key contributions in this area include SingleSplice, the first computational method for single-cell splicing analysis; SLICER, an algorithm for inferring developmental trajectories; and LIGER, a general approach for integrating single-cell transcriptomic, epigenomic and spatial transcriptomic data. We used our previously published algorithm LIGER for this analysis. The advantage of our method is that it can integrate different single-cell modalities measured on different single cells. The corresponding disadvantage is that we do not leverage the known correspondence information from true multi-omic measurements. We tried multiple data processing strategies for the scNMT accessibility data. We observed limited alignment with all processing strategies, but the more differentiated cell types showed more correspondence. We also analyzed a different single-cell multi-omic dataset, SNARE-seq (RNA+ATAC) from mouse frontal cortex. LIGER was able to effectively integrate this dataset, finding corresponding cell types between RNA and ATAC data without using the known cell correspondences. We are further investigating the possible biological and technical explanations for these differences. Code is available at https://github.com/jw156605/scNMT
(Online)
07:40 - 08:00 Arshi Arora: scNMT-seq: MOSAIC, or Multi-Omic Supervised Integrative Clustering
Arshi Arora is a Research Biostatistician in Dr. Ronglai Shen's lab at Memorial Sloan Kettering Cancer Center (https://www.mskcc.org/profile/arshi-arora). Her research addressed the following question: we wish to address the problem of identifying localized molecular signatures with respect to an outcome of interest, such as stage and lineage. This poses an interesting challenge in understanding heterogeneity in cell populations across multiple data modalities. We aim to illustrate that applying supervised integrative clustering will provide a more accurate delineation of cell subpopulations across the genomic, epigenomic, and transcriptomic landscapes that is directly relevant to the biological outcome of interest. Code is available at https://github.com/arorarshi/scNMT_seq_MOSAIC
(Online)
08:00 - 08:10 break (10 mins) (Contributed talks: pls check your shared slides on zoom)
08:10 - 08:30 Wouter Meuleman: DNase-seq data as a scaffold for complementary (single cell) datasets
Dr. Wouter Meuleman is an investigator at the Altius Institute for Biomedical Sciences. His research focuses on how the regulatory genome is organized, and what the functional implications of this organization are. To address this question, he analyzed a large DNase I chromatin accessibility dataset (733 biosamples) to delineate and annotate putative regulatory elements in the human genome. Importantly for integrative analyses, this has resulted in a common coordinate system for regulatory DNA, providing a scaffold for complementary datasets and analyses. These data can be used for the annotation of protein-coding and non-coding genes, the interpretation of genetic variation and the study of compartmentalization of the regulatory genome. Prior to joining Altius, Wouter did postdoctoral work at MIT and the Broad Institute. He obtained his PhD in Computational Biology from Delft University of Technology, the Netherlands. The presented data are available via https://www.meuleman.org/, and code is available at https://github.com/Altius/Index and https://github.com/Altius/Vocabulary
(Online)
08:30 - 09:30 Brainstorm: scNMT-seq led by Oliver Stegle and Ricard Argelaguet
Will not be recorded.
(Breakout room 1)
17:00 - 18:00 Brainstorm: Summary of analyses and methods led by Casey Greene and Kim-Anh Lê Cao
Will not be recorded.
(Breakout room 1)
Thursday, June 18
04:59 - 05:00 Chair: Stephanie Hicks (Johns Hopkins Bloomberg School of Public Health), computational challenges session
Dr. Stephanie Hicks is an Assistant Professor in the Department of Biostatistics at Johns Hopkins Bloomberg School of Public Health. She is also a faculty member of the Johns Hopkins Data Science Lab, co-host of The Corresponding Author podcast and co-founder of R-Ladies Baltimore.
(Online)
05:00 - 06:00 Debrief of Brainstorming Sessions: scNMT-seq (Stegle/Argelaguet) and Summary of analyses and methods (Greene/Lê Cao) (Online)
06:00 - 07:00 Susan Holmes: Keynote Talk: Computational Challenges
Susan Holmes is a Professor of Statistics and a member of BioX at Stanford University, a John Henry Samter University Fellow in Undergraduate Education, and a Fellow of the Fields Institute. She is a moderator for the stat.AP section of arXiv. Slides at https://spholmes.github.io/
(Online)
07:00 - 07:30 break (Speakers: pls check your shared slides on zoom)
07:30 - 08:00 Michael Love: Benchmarking (Online)
08:00 - 08:10 Casey Greene: Writing with Manubot (Online)
08:10 - 09:00 Discussions about white paper led by organisers
Will not be recorded.
(Breakout room 1)
09:00 - 10:00 Brainstorm: Interpretation Challenges led by Susan Holmes
Will not be recorded.
(Breakout room 1)
17:00 - 18:00 Brainstorm: Benchmarking led by Michael Love and Matthew Ritchie
Will not be recorded.
(Breakout room 1)
Friday, June 19
05:00 - 05:05 Chair: Michael Love, Future Directions session (Online)
05:00 - 06:00 Debrief of Brainstorming Sessions: Challenges (Holmes) and Benchmarking (Love/Ritchie) (Online)
06:00 - 07:00 Vincent Carey: Software Infrastructure
Vincent Carey is Professor of Medicine (Biostatistics) in the Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School. He is former Editor-in-Chief of The R Journal and is a co-founder of the Bioconductor project.
(Online)
07:00 - 07:30 Break (Online)
07:30 - 08:30 Brainstorm: Future directions led by Elana Fertig (parallel)
Will not be recorded.
(Breakout room 2)
07:30 - 08:30 Brainstorm: Software infrastructure led by Vincent Carey (parallel)
Will not be recorded.
(Breakout room 1)
08:30 - 09:30 Debrief of Brainstorming session: Software (Carey) and Future directions (Fertig) (Online)
09:30 - 10:00 Closing remarks (Organizers: Culhane/Fertig/Le Cao) (Online)