
Learning sparse log-ratios for high-throughput sequencing data

December 7, 2021 @ 11:00 am - 1:00 pm AEDT

This event is funded by the Intellectual Climate Fund, La Trobe University.

About this event

We present CoDaCoRe, a novel software package for identifying sparse predictive biomarkers in high-throughput sequencing data (available at https://github.com/egr95/R-codacore).

The automatic discovery of sparse biomarkers associated with an outcome of interest is a central goal of bioinformatics. In high-throughput sequencing (HTS) data, and in compositional data (CoDa) more generally, the log-ratios between input variables form an important class of biomarkers. However, identifying predictive log-ratio biomarkers from HTS data is a combinatorial optimization problem, which is computationally challenging. Existing methods are slow and scale poorly with the dimension of the input, which has limited their application to low- and moderate-dimensional metagenomic datasets.

Building on recent advances in deep learning, we develop CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers. Our algorithm uses a continuous relaxation to approximate the underlying combinatorial optimization problem. This relaxation can then be optimized efficiently with standard machine learning (ML) tools, in particular gradient descent.

As a result, CoDaCoRe runs several orders of magnitude faster than competing methods while achieving state-of-the-art predictive accuracy and sparsity. We show that CoDaCoRe outperforms existing methods across a wide range of microbiome, metabolite, and microRNA benchmark datasets, as well as on a particularly high-dimensional dataset that is computationally intractable for existing sparse log-ratio selection methods.
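For readers new to log-ratio biomarkers, the following is a minimal NumPy sketch of the general idea described above. It is not CoDaCoRe's actual implementation. It assumes one common form of log-ratio from the CoDa literature, the balance (the log of the ratio between the geometric means of two disjoint groups of parts), and relaxes the discrete choice of groups by giving each part j a soft numerator weight sigmoid(alpha_j) and a soft denominator weight 1 - sigmoid(alpha_j), which makes the problem differentiable so it can be fitted by plain gradient descent. The function names (`balance`, `relaxed_balance`, `fit_relaxed_balance`), the logistic loss, the learning rate, the synthetic data and the cutoff used to recover a discrete balance are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def balance(x, num, den):
    """Discrete log-ratio biomarker: log of the ratio between the geometric
    means of the parts in `num` and the parts in `den` (one row per sample)."""
    logx = np.log(x)
    return logx[:, num].mean(axis=1) - logx[:, den].mean(axis=1)


def relaxed_balance(logx, alpha):
    """Continuous relaxation (illustrative assumption): part j is soft-assigned
    to the numerator with weight sigmoid(alpha_j) and to the denominator with
    weight 1 - sigmoid(alpha_j); b = weighted mean log(num) - weighted mean log(den)."""
    w_pos = sigmoid(alpha)
    w_neg = 1.0 - w_pos
    m_pos = logx @ w_pos / w_pos.sum()
    m_neg = logx @ w_neg / w_neg.sum()
    return m_pos - m_neg, w_pos, w_neg, m_pos, m_neg


def fit_relaxed_balance(x, y, lr=1.0, steps=2000, seed=0):
    """Plain gradient descent on the mean logistic loss of p = sigmoid(a*b + c),
    where b is the relaxed balance; gradients are derived by hand."""
    rng = np.random.default_rng(seed)
    logx = np.log(x)
    n, d = logx.shape
    alpha = 0.01 * rng.standard_normal(d)  # small random init breaks symmetry
    a, c = 1.0, 0.0
    for _ in range(steps):
        b, w_pos, w_neg, m_pos, m_neg = relaxed_balance(logx, alpha)
        p = sigmoid(a * b + c)
        g_z = (p - y) / n          # dLoss/dlogit for mean binary cross-entropy
        g_a = np.sum(g_z * b)
        g_c = np.sum(g_z)
        g_b = g_z * a              # dLoss/db_i
        s = w_pos * w_neg          # derivative of sigmoid at alpha
        # db_i/dalpha_j = s_j * [(logx_ij - m_pos_i)/S_pos + (logx_ij - m_neg_i)/S_neg]
        db_dw = ((logx - m_pos[:, None]) / w_pos.sum()
                 + (logx - m_neg[:, None]) / w_neg.sum())
        g_alpha = s * (g_b @ db_dw)
        alpha -= lr * g_alpha
        a -= lr * g_a
        c -= lr * g_c
    return alpha, a, c


# Synthetic compositions: the outcome depends only on log(x0 / x1).
rng = np.random.default_rng(1)
z = rng.standard_normal((300, 10))                       # latent log-abundances
x = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # closure: rows sum to 1
y = (z[:, 0] > z[:, 1]).astype(float)

alpha, a, c = fit_relaxed_balance(x, y)
b, *_ = relaxed_balance(np.log(x), alpha)
acc = np.mean((sigmoid(a * b + c) > 0.5) == (y == 1))

# Hypothetical cutoff to turn soft weights back into a discrete balance.
tau = 0.5 * np.abs(alpha).max()
num = np.flatnonzero(alpha > tau)
den = np.flatnonzero(alpha < -tau)
print("numerator:", num, "denominator:", den, "train accuracy:", acc)
```

In this parametrization, a part whose alpha_j stays near 0 is split evenly between the two groups and typically has little influence on the relaxed balance, while parts pushed strongly positive or negative form the numerator and denominator of the recovered discrete balance.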


Bio

Elliott Gordon-Rodriguez is a PhD student in Statistics at Columbia University. His research interests originated in machine learning but have led to collaborations with bioinformaticians, geneticists, and applied statisticians. Beyond bioinformatics, his current research interests include, more broadly, explainable artificial intelligence (XAI) and data mining.

Zoom link: will be provided via a calendar invite.

Venue

Zoom

Organiser

La Trobe University