Séminaire virtuel: vendredi 2 septembre 2022

Meeting ID: 867 6409 6440
passcode: 149120

13h00 - 13h50 – Carito Guziolowski (LS2N, École centrale de Nantes)

Machine Learning, Logic Programming and Learning Boolean Networks

This talk will present a work published on 2018 (Chebouba et al., BMC Bioinformatics), applied to discriminate the response of Acute Myeloid Leukemia (AML) patients. The data we used was obtained from the DREAM 9 challenge, specifically: given a phosphoproteomic dataset composed of 231 measured proteins across 191 AML individuals, and knowing which prognostic (“complete remission” or “resistant”) the individuals had after receiving treatment, find the computational model that fits both prognostics. Our model was a family of Boolean networks (BNs) that explained the data of remission/resistant classes of AML patients. We later confirmed that the different rewiring of protein PIK3CA present in the learned BN families, confirmed biological reality.
The core of this research focused on how phosphoproteomic data was handled in order to produce discriminant data points. Such discriminant data points constitute an essential input of a tool, caspo, we developed to learn BNs. Retrieving discriminant models is classically done by machine learning and statistical methods. For the Dream 9 challenge these methods were proposed and focused mainly on clinical variables, since they found them more discriminant of AML patient response to treatment than proteomics data. Our graph-based approach applied a machine learnign logical program to obtain a discriminant dataset, used later by caspo to discover boolean gates between discriminant species connected in a prior regulatory network.
This approach is interesting in our current research because it allows to obtain discriminant input-data, focusing on a specific subset of species across a specific subset of individuals, to learn BNs. It relied on the high dimensionality of the dataset: 231 species measured for 191 AML individuals. Similar problems, for example considering single-cell data, have larger dimensions (e.g. 30000 genes across 1500 cells) and may generate multiple choices for input- data as well. Current machine learning approaches to handle single-cell data focus on similarites or co-expression regions (e.g. WGCNA) to reduce the dimension of genes. We believe it can be also interesting to focus on the differences, mutual-exclusiveness and contrasts of single-cell expression, across multiple states to select genes behaviors that allow us to learn a consensual model or families of BNs that explain different paths taken by cells to arrive to the same cell-fate.

Dernière modification le 02/09/2022