
Bioinformatics/AI to mine public data and boost the Lab of the Future

In the era of data-driven science, leveraging publicly available biological data is key to boosting future R&D activities (i.e. the ‘Lab of the Future’). The design of SARS-CoV-2-specific primers and their validation against novel SARS-CoV-2 variants illustrates the impact of pooling global data (here, SARS-CoV-2 genomic sequences). With the advent of omics platforms producing genetic and epigenetic data (DNA sequence, gene expression, methylation, etc.), it is imperative to develop capabilities to retrieve, process and analyze big data (gigabytes to terabytes in size), harmonize data sourced from different labs (batch and platform effect correction), and apply classical and deep machine learning algorithms to discover novel biological patterns.

One such omics data type is cell-free DNA (cfDNA) sequencing data. Blood cfDNA, a mix of free-floating fragmented DNA originating from multiple tissues, carries tissue-specific genetic and epigenetic information. Deconvoluting the proportion of each tissue type in the cfDNA mix is essential for monitoring the progression of diseases such as cancer and of solid organ transplant rejection. We developed an algorithm (applying quadratic programming) to deconvolute the fractions of 39 cell types in the cfDNA mix, using the position-specific methylation signal. A successful evaluation of our algorithm required methylation data with known proportions of multiple cell types. We therefore developed a second algorithm to simulate methylation-based biological mixture data, and thoroughly tested our deconvolution algorithm using both simulated and real cfDNA methylation data.
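To make the deconvolution and simulation steps above more concrete, here is a minimal sketch in pure Python. It is not the algorithm from this work, whose details are not given here. It assumes the quadratic program takes the common form "minimize ||A w - b||^2 subject to w >= 0 and sum(w) = 1", where A is a reference matrix of per-site methylation levels for each cell type, b is the observed cfDNA methylation signal, and w holds the cell-type fractions. The solver (projected gradient descent), the random reference matrix, the three cell types, and all function names are illustrative assumptions.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def deconvolve(reference, observed, iterations=3000):
    """Estimate fractions w minimizing ||reference . w - observed||^2
    on the probability simplex, using projected gradient descent."""
    n_sites, n_types = len(reference), len(reference[0])
    # Step size 1/L: the squared Frobenius norm is a safe upper bound on the
    # largest eigenvalue of reference^T reference.
    frob_sq = sum(x * x for row in reference for x in row)
    step = 1.0 / (2.0 * frob_sq)
    w = [1.0 / n_types] * n_types  # start from a uniform mixture
    for _ in range(iterations):
        residual = [
            sum(r * wj for r, wj in zip(row, w)) - y
            for row, y in zip(reference, observed)
        ]
        grad = [
            2.0 * sum(reference[i][j] * residual[i] for i in range(n_sites))
            for j in range(n_types)
        ]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w


def simulate_mixture(reference, fractions, noise_sd=0.0, seed=0):
    """Simulate a mixed methylation profile with known cell-type fractions.
    Optional Gaussian noise is a stand-in for measurement error (assumption)."""
    rng = random.Random(seed)
    mixed = []
    for row in reference:
        value = sum(r * f for r, f in zip(row, fractions))
        if noise_sd > 0:
            value += rng.gauss(0.0, noise_sd)
        mixed.append(min(max(value, 0.0), 1.0))  # methylation stays in [0, 1]
    return mixed


def make_reference(n_sites, n_types, seed=0):
    """Hypothetical reference atlas: random methylation levels in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_types)] for _ in range(n_sites)]


# Toy evaluation: 60 sites, 3 cell types, known ground-truth fractions.
reference = make_reference(60, 3)
truth = [0.7, 0.2, 0.1]
observed = simulate_mixture(reference, truth)
estimate = deconvolve(reference, observed)
print([round(x, 4) for x in estimate])
```

Simulating mixtures with known fractions, as in `simulate_mixture`, is what makes the evaluation possible: the estimate can be compared directly against `truth`. A real analysis would more likely use an established QP solver and a curated methylation atlas rather than this hand-rolled loop.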

Healthcare and pharma R&D are driven by our ability to apply known ML algorithms and to develop new ones. The biological mixture deconvolution algorithm described above has an additional application: computing the tumor fraction in cell-free DNA data, which helps with early detection and with monitoring tumor management. We conclude that AI and bioinformatics techniques will keep advancing the idea of the “Lab of the Future” and help solve multiple complex biological problems.
