More reliable and interpretable causal discovery methods
The reliability and interpretability of machine learning methods have become major issues in Artificial Intelligence. These questions are particularly important for AI approaches applied to sensitive data, such as patient medical data, for which AI-assisted recommendations cannot rely on black-box classifiers alone and need to be explainable in terms of intelligible medical rationales.
Our specific interest lies in the reliability and interpretability of causal discovery methods applied to purely observational data. Constraint-based structure learning methods start by pruning a complete graph to obtain an undirected graph skeleton, which is subsequently oriented (Figure 1). All constraint-based methods perform this first step of removing dispensable edges iteratively, whenever a separating set and the corresponding conditional independence can be found. Yet, these methods lack robustness to sampling noise and are prone to uncovering spurious conditional independences in finite datasets. In particular, there is no guarantee that the separating sets identified during the iterative pruning step remain consistent with the final graph, that is, that indirect paths through these nodes actually exist between the separated pair of variables.
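As an illustration of this pruning step, the sketch below starts from a complete graph and removes an edge as soon as a separating set is found. It is a generic PC-style skeleton search, not the MIIC implementation: conditional independence is assessed here with a Fisher z-test on partial correlations (an assumption made for brevity, suited to roughly linear Gaussian data), and the function names as well as the `alpha` and `max_cond_size` parameters are hypothetical.

```python
import itertools
import math

import numpy as np


def partial_correlation(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j, *cond]
    precision = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def looks_independent(corr, n_samples, i, j, cond, alpha):
    """Fisher z-test: True when independence cannot be rejected at level alpha."""
    r = max(-0.999999, min(0.999999, partial_correlation(corr, i, j, cond)))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n_samples - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return p_value > alpha


def prune_skeleton(corr, n_samples, alpha=0.01, max_cond_size=2):
    """Prune a complete graph; return the remaining edges and the separating sets."""
    corr = np.asarray(corr, dtype=float)
    n_vars = corr.shape[0]
    neighbours = {i: set(range(n_vars)) - {i} for i in range(n_vars)}
    sepsets = {}
    for size in range(max_cond_size + 1):
        for i, j in itertools.combinations(range(n_vars), 2):
            if j not in neighbours[i]:
                continue  # edge already removed
            candidates = sorted((neighbours[i] | neighbours[j]) - {i, j})
            for cond in itertools.combinations(candidates, size):
                if looks_independent(corr, n_samples, i, j, cond, alpha):
                    neighbours[i].discard(j)
                    neighbours[j].discard(i)
                    sepsets[(i, j)] = set(cond)
                    break
    edges = {(i, j) for i in neighbours for j in neighbours[i] if i < j}
    return edges, sepsets


# Exact correlation matrix of a toy chain 0 -> 1 -> 2 with unit-variance noise.
chain_corr = np.array([
    [1.0, 1 / math.sqrt(2), 1 / math.sqrt(3)],
    [1 / math.sqrt(2), 1.0, 2 / math.sqrt(6)],
    [1 / math.sqrt(3), 2 / math.sqrt(6), 1.0],
])
edges, sepsets = prune_skeleton(chain_corr, n_samples=1000)
print(sorted(edges), sepsets)
```

With real data, the correlation matrix would typically be estimated first, for instance with `np.corrcoef(data, rowvar=False)`. On the toy chain, the edge between variables 0 and 2 is removed because variable 1 separates them, and this separating set is recorded for the later orientation step.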
To address this lack of consistency, we first proposed a simple modification of classical causal discovery algorithms to ensure that all separating sets identified to remove dispensable edges are consistent with the final graph, thus enhancing the explainability of causal discovery methods. This is achieved by repeating the constraint-based causal structure learning scheme iteratively, while searching for separating sets that are consistent with the graph obtained at the previous iteration (Figure 1).
Figure 1. Iterative constraint-based structure learning algorithms with consistent separating sets. (Li et al. NeurIPS 2019)
Ensuring the consistency of separating sets can be done at a limited complexity cost, through the use of block-cut tree decomposition of graph skeletons, and is found to increase the validity of separating sets in terms of actual d-separation. It also significantly improves the sensitivity of constraint-based methods while retaining good overall structure learning performance. Last but not least, ensuring sepset consistency improves the interpretability of constraint-based models for real-life applications, as discussed below.
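To make the role of the block-cut tree more concrete, here is a minimal sketch of a skeleton-level consistency check. It assumes that a separating set for a pair (x, y) is considered consistent when each of its nodes lies on at least one simple path between x and y in the skeleton; orientation-aware variants are not covered. The helper names `nodes_between` and `is_consistent` are hypothetical, and the code relies on the `networkx` library rather than on the MIIC code base.

```python
import networkx as nx


def nodes_between(skeleton, x, y):
    """Nodes lying on at least one simple path between x and y (undirected graph)."""
    blocks = [frozenset(block) for block in nx.biconnected_components(skeleton)]
    # Bipartite tree linking every block to the nodes it contains; cut vertices
    # are the nodes shared by several blocks.
    tree = nx.Graph()
    for k, block in enumerate(blocks):
        for node in block:
            tree.add_edge(("block", k), ("node", node))
    source, target = ("node", x), ("node", y)
    if source not in tree or target not in tree:
        return set()
    if not nx.has_path(tree, source, target):
        return set()
    # The blocks met on the unique tree path contain exactly the nodes that
    # can appear on a simple path between x and y.
    on_paths = set()
    for kind, key in nx.shortest_path(tree, source, target):
        if kind == "block":
            on_paths |= blocks[key]
    return on_paths - {x, y}


def is_consistent(skeleton, x, y, sepset):
    """True when every separating node lies on a path between x and y."""
    return set(sepset) <= nodes_between(skeleton, x, y)


# Toy skeleton: a triangle A-B-C, a chain C-D-E and a dangling node F.
skeleton = nx.Graph(
    [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("D", "F")]
)
print(is_consistent(skeleton, "A", "E", {"C", "D"}))  # True
print(is_consistent(skeleton, "A", "E", {"F"}))       # False
```

The decomposition into biconnected blocks is computed once per skeleton, so each check reduces to a path lookup in a tree rather than an enumeration of all paths, which is one way to read the "limited complexity cost" mentioned above.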
More recently, we have significantly extended our original causal discovery method, MIIC. The novel, more reliable and interpretable method, iMIIC, outlined in Figure 2, brings a number of important improvements, which enhance its causal discovery performance on large-scale synthetic and real-life datasets (Cabeli et al. 2021, Ribeiro-Dantas et al. 2024).
Figure 2. Algorithm scheme of iMIIC. (Ribeiro-Dantas et al. 2024).
First, iMIIC quantitatively improves the reliability of inferred orientations, based on a general information-theoretic principle (Cabeli et al. 2021, Ribeiro-Dantas et al. 2024). This involves rectifying all negative values of mutual information regularized for finite-size effects, Figure 2 (step 1), and results in only a few percent of false positive orientations, Figure 2 (step 2), on challenging benchmarks adapted from real-world healthcare data. Second, iMIIC is uniquely able to distinguish "genuine" causes from "putative" and "latent" causal effects, Figure 2 (step 3). This is an essential distinction to disambiguate the causal interpretation of oriented edges in inferred networks. Third, iMIIC quantifies indirect effects, while ensuring their consistency with the global network structure through the iterative scheme outlined in Figure 1. This is important to interpret indirect contributions in terms of indirect paths through the corresponding contributor nodes in the inferred network, which is generally not possible with other causal discovery methods.
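The first point can be illustrated with a small numerical sketch. It assumes discrete variables, a plug-in estimate of mutual information and a BIC-like finite-size penalty, (1/2)(r_x - 1)(r_y - 1) log(N) / N; these choices, as well as the function names, are assumptions made for illustration and may differ from the complexity terms actually used in iMIIC. Rectification is read here as clipping negative regularized values to zero.

```python
import numpy as np


def empirical_mi(x, y):
    """Plug-in mutual information (in nats) between two discrete samples."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(counts, (x_idx, y_idx), 1)  # contingency table
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / (px @ py)[nonzero])))


def rectified_regularized_mi(x, y):
    """Mutual information minus a BIC-like penalty (assumed), clipped at zero."""
    n = len(x)
    r_x, r_y = len(np.unique(x)), len(np.unique(y))
    penalty = 0.5 * (r_x - 1) * (r_y - 1) * np.log(n) / n
    return max(0.0, empirical_mi(x, y) - penalty)


# Two perfectly balanced, independent binary variables: the raw estimate is 0,
# the regularized value is negative and is therefore rectified to 0.
x_indep = np.array([0, 0, 1, 1] * 25)
y_indep = np.array([0, 1, 0, 1] * 25)
print(rectified_regularized_mi(x_indep, y_indep))

# Identical variables: the regularized value stays positive.
print(rectified_regularized_mi(x_indep, x_indep))
```

Clipping at zero keeps the regularized quantity non-negative, as mutual information should be, which is the property that downstream orientation scores are assumed to rely on in this sketch.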
We showcased the causal discovery performance of iMIIC on synthetic and real-world healthcare data originating from about 400,000 medical records of breast cancer patients from the Surveillance, Epidemiology, and End Results (SEER) program. The resulting breast cancer clinical network, Figure 3, provides a robust graphical model relating 51 clinical, socio-economic and outcome variables. More than 90% of predicted causal effects appear correct, based on expert knowledge, while the remaining, more unexpected direct and indirect causal effects can in fact be interpreted in terms of diagnostic procedures, therapeutic timing, patient preference or socio-economic disparity (Ribeiro-Dantas et al. 2024).
Figure 3. SEER breast cancer networks inferred by iMIIC. The 51-node network inferred by iMIIC from the SEER dataset, which includes 396,179 breast cancer patients diagnosed between 2010 and 2016. An expert knowledge validation of the causal effects inferred by iMIIC indicates that about 90% of predicted causal effects are correct, while an additional 8% of cause-effect relations seem plausible, based on clinical and epidemiological knowledge (Ribeiro-Dantas et al. 2024).
Hence, iMIIC provides a detailed and validated interpretation across all variables selected in this nationwide cohort of about 400,000 breast cancer patients. This exhaustive analysis uncovers many expected causal relations, such as the adverse consequence of metastasis and the protective effect of ER+ and specifically PR+ status on death due to breast cancer, or the fact that year of birth is the primary reason for death due to other causes by the end of the study.
On the other hand, the effects of insurance coverage or marital status, which have been reported to reduce the risk of death due to breast cancer, are found to be entirely indirect and mainly mediated by treatments (60-80%), notably surgery (>50%). In fact, surgery appears as the cornerstone of breast cancer therapy, first helping to refine histological types, then guiding therapeutic decisions on radiotherapy and breast reconstruction, and ultimately prolonging patient survival. Yet, iMIIC also correctly infers that the type of surgery (lumpectomy or mastectomy) at the primary site largely depends on the personal choice of early-stage breast cancer patients between breast conservation and reconstruction alternatives. By contrast, other treatments, such as radiotherapy and chemotherapy, seem to have less decisive impacts on breast cancer outcome, which might be due in part to some under-reported treatment information in the SEER database. Radiotherapy even appears to be a consequence, not a cause, of vital status, suggesting that early death within the first few months after diagnosis may prevent radiotherapy for some patients who might otherwise have received this treatment, had they lived longer (Ribeiro-Dantas et al. 2024). Finally, iMIIC recovers direct associations between socio-economic county variables (such as median family income and cost of living index) and patient-specific variables (such as tumor grade, radiotherapy, breast reconstruction, insurance), highlighting how the healthcare system is integrated into the global economy. While higher costs of living are on average associated with more favorable cancer prognosis, presumably due to better preventive healthcare and more comprehensive insurance coverage, iMIIC also uncovers large disparities between family income and cost of living indices across counties (e.g. for L.A. county), leading to an exacerbated financial burden, with patients giving up expensive treatments or even dropping out of treatment (7% in L.A. versus 1.5% nationwide).
Application to RNAseq transcriptomic data
We have also applied the iMIIC causal discovery method to bulk and single-cell transcriptomic data (Cosgrove et al. 2023; Miladinovic et al. 2024; Dupuis et al. 2025; Donada et al. 2026; Fusilier et al. 2026; Manfroi et al. 2026).
For instance, we studied the role of metabolism in hematopoietic differentiation in collaboration with the Perie lab, Institut Curie. To this end, we analyzed how metabolic heterogeneity in hematopoietic multipotent progenitors fuels innate immune cell production. iMIIC network predictions (Figure 4, left) have suggested that metabolic heterogeneity is an important regulatory component of hematopoiesis, which was subsequently demonstrated in vitro and in vivo (Figure 4, right).
Figure 4. On the role of metabolism in hematopoietic differentiation. The iMIIC network uncovers metabolic gene associations with myeloid / erythroid-lymphoid differentiation fate (left), using bulk RNA-seq profiles from the Haemopedia database. This suggests that metabolic heterogeneity is an important regulatory component of hematopoiesis, which was subsequently demonstrated in vitro and in vivo (right) (Cosgrove et al. bioRxiv 2023).
Another recent application concerns the analysis of scRNAseq data from a cellular therapy against multiple sclerosis in a mouse model in collaboration with the Fillatreau lab, Institut Necker Enfants Malades (Manfroi et al. 2026).
Other recent scRNAseq applications concern the analysis of gene regulatory pathways driving Extracellular Matrix remodeling by cancer cells and fibroblasts under macrophage depletion (Fusilier et al. 2026), and an integration of scRNAseq data with image-derived synchronicity markers to investigate cellular memory in human hematopoietic stem cells (Donada et al. 2026).
Finally, we have also developed the CausalCCC web server to reconstruct gene-gene interaction pathways across two or more interacting cell types from single-cell or spatial transcriptomic data, Figure 5 (Dupuis & Debeaupuis et al. 2025).
Figure 5. CausalCCC analysis: CausalCCC network reconstruction in myocardial infarction, using spatial transcriptomic data from (Kuppe et al. 2022) (10x Visium, Patient 9). See the CausalCCC web server (Dupuis & Debeaupuis et al. 2025).
Extension to time series data with applications to time-lapse microscopy imaging data
Another recent extension of the MIIC algorithm is its temporal version (tMIIC), designed to analyze time series data in either stationary (Simon et al. 2025) or non-stationary (Parent et al. 2025) regimes. We have applied this temporal causal discovery approach to time-lapse microscopy images of tumor-on-chip data (Simon et al. 2025) and tumoroid data (Parent et al. 2025). In particular, we have developed the CausalXtract pipeline to reconstruct causal networks from images of individual cancer cells in order to understand how treatment and the tumor microenvironment affect their morphology, division and death, as well as their interactions with immune cells, Figure 6 (Simon et al. 2025). We also reconstructed non-stationary temporal networks from bright-field images of lab-grown tumor spheroids (tumoroids) in order to study their response to drug treatment over time and identify the morphological features that are most informative of drug efficacy (Parent et al. 2025).
Figure 6. The CausalXtract pipeline (Simon et al. 2025) first segments individual cancer and immune cells and extracts morphodynamic features from time-lapse images. a. A temporal MIIC network is then reconstructed from the extracted features and the experimentally controlled variables, such as Treatment and the presence of Cancer Associated Fibroblasts (CAF). b. In particular, we found that CAFs in the tumor microenvironment directly inhibit cancer cell apoptosis, independently of treatment, which had not been reported so far (Simon et al. 2025).
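A common way to apply a structure learning method to time series is to "unfold" the series into lagged copies of each variable, so that edges between a lagged and a current variable can be read as time-delayed effects. The sketch below shows only this data preparation step, under the assumption of regularly sampled series and a fixed maximum lag; the function name `unfold_time_series` and the `max_lag` parameter are hypothetical and do not describe the actual tMIIC interface.

```python
import numpy as np


def unfold_time_series(series, max_lag):
    """Turn a (T, d) series into a (T - max_lag, d * (max_lag + 1)) lagged table.

    Column block k (0 <= k <= max_lag) holds the d variables observed k time
    steps before the current time point of each row.
    """
    series = np.asarray(series)
    n_steps = series.shape[0]
    if not 0 <= max_lag < n_steps:
        raise ValueError("max_lag must satisfy 0 <= max_lag < number of time steps")
    blocks = [series[max_lag - lag : n_steps - lag] for lag in range(max_lag + 1)]
    return np.hstack(blocks)


# Two variables observed over five time steps, unfolded with lags 0, 1 and 2.
toy_series = np.arange(10).reshape(5, 2)
lagged = unfold_time_series(toy_series, max_lag=2)
print(lagged.shape)
print(lagged[0])
```

Each row of the lagged table can then be treated as one sample by a static causal discovery method, which implicitly assumes stationarity over the unfolded window; non-stationary regimes require additional care that this sketch does not address.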
Related Publications
Li H, Cabeli V, Sella N, Isambert H: Constraint-based Causal Structure Learning with Consistent Separating Sets. Advances in Neural Information Processing Systems (NeurIPS) 32, 14257-14266 (2019).
Cabeli V, Li H, Ribeiro-Dantas M, Simon F, Isambert H: Reliable causal discovery based on mutual information supremum principle for finite dataset. Why21 @ NeurIPS2021 (2021).
Ribeiro-Dantas MdC, Li H, Cabeli V, Dupuis L, Simon F, Hettal L, Hamy A-S, Isambert H: Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients. iScience, 27(5):109736 (2024).
Cosgrove J, Lyne A-M, Rodriguez I, Cabeli V, Conrad C, Tenreira-Bento S, Tubeuf E, Russo E, Tabarin F, Belloucif Y, Maleki-Toyserkani S, Reed S, Monaco F, Ager A, Lobry C, Bousso P, Fernandez-Marcos PJ, Isambert H, Arguello RJ, Perie L: Metabolically primed multipotent hematopoietic progenitors fuel innate immunity. bioRxiv (2023).
Miladinovic O, Canto P-Y, Pouget C, Piau O, Radic N, Freschu P, Megherbi A, Prats CB, Jacques S, Hirsinger E, Geeverding A, Dufour S, Petit L, Souyri M, North T, Isambert H, Traver D, Jaffredo T, Charbord P, Durand C: A multistep computational approach reveals a neuro-mesenchymal cell population in the embryonic hematopoietic stem cell niche. Development, 151, dev202614 (2024).
Parent C, Honari H, Tocci T, Simon F, Zaidi S, Jan A, Aubert V, Delattre O, Isambert H, Wilhelm C, Viovy J-L: Label-free Machine Learning prediction of chemotherapy on tumor spheroids using a microfluidics droplet platform. Small Science, 2500173 (2025).
Simon F, Comes MC, Tocci T, Dupuis L, Cabeli V, Lagrange N, Mencattini A, Parrini MC, Martinelli E, Isambert H: CausalXtract: a flexible pipeline to extract causal effects from live-cell time-lapse imaging data. eLife, 13:RP95485 (2025).
Dupuis L, Debeaupuis O, Simon F, Isambert H: CausalCCC: a web server to explore intracellular causal pathways enabling cell-cell communication. Nucleic Acids Res, 53, W125 (2025).
Fusilier Z, Clement A, Simon F, Calvente I, Jean-Marie R, Mathieu M, Calmettes V, Piastra-Facon F, Quintana-Perez Y, George Clement J, Crestey L, Lumineau E, Henninger R, Tonani M, Manriquez V, Lacerda L, Bensaid L, de Villemagne P, Piaggio E, Semetey V, Coscoy S, Martini E, Scita G, Gelly J-C, Ivaska J, Isambert H, Goudot C, Pierobon P, Lennon-Dumenil A-M, Moreau HD: Macrophages restrict tumor immune infiltration by controlling collagen topography. Science Immunology, 11(117):eadw8291 (2026).
Donada A, Hermange G, Tocci T, Midoun A, Prevedello G, Hadj Abed L, Dupre D, Sun W, Milo I, Tenreira Bento S, Pospori C, Innes A, Willekens C, Vargaftig J, Michonneau D, Lo Celso C, Servant N, Duffy KR, Isambert H, Cournede PH, Laplane L, Perie L: Clonal memory of cell division in humans diverges between healthy haematopoiesis and acute myeloid leukaemia. Nature Commun, in press (2026).
Manfroi B, Dang VD, Bui-Thi C, Jungmann A, Borzakian S, Tong Y, Beauvineau C, Dupuis L, Guffart E, Borst K, Nguyen NT, Alvarez-Simon D, Chyzak G, Schäfer S, Gerlach RG, Frischbutter S, Hamann A, Salem Wehbe L, El Behi M, Luka M, Ménager M, Jouneau L, Boudinot P, Jung S, Winkler TH, Liblau R, Marignier R, Isambert H, Prinz M, von Kries P, Specker E, Walter J, Mahuteau-Betzer F, Fillatreau S: Induced regulatory B cells stably expressing IL-10 cure CNS autoimmunity by targeting local myeloid cells. Immunity, in press (2026).