CausalCCC

FROM CELL-CELL COMMUNICATION METHODS TO INTEGRATIVE CAUSAL PATHWAYS

CausalCCC documentation

Welcome to the CausalCCC documentation!

A more comprehensive picture of cell-cell communication (CCC) lies in the integration of upstream and downstream intracellular pathways from the different ligand-receptor pairs of sender and receiver cells.

CausalCCC reconstructs gene-gene interaction networks across interacting cell types from single-cell or spatial transcriptomic data.

You can also use our demo dataset directly on the workbench page!

Here you will find the following vignettes:

  • Quick start: a beginner-friendly introduction

  • More Advanced: a more detailed vignette with flexibility for your biological question

  • A tutorial on how to interpret your CausalCCC network

CausalCCC integrates a robust causal network reconstruction method, MIIC (Multivariate Information-based Inductive Causation), with CCC analysis.

CausalCCC output


Quick start

How do I run CausalCCC?

This vignette guides you through preparing your single-cell data for CausalCCC. It is a beginner-friendly presentation; see Advanced mode to tailor the CausalCCC pipeline to your biological question.


To create a CausalCCC network, you only need a list of L-R links from your CCC analysis and the raw count expression matrices of the sender genes and the receiver genes you want to study.

example of causalCCC input files

If needed, we offer in our package:

  • an optional wrapper function to filter your single-cell data (Seurat, CSV, Excel …).

  • a feature selection option to help you select genes of interest.

  • a CCC analysis option to help you select L-R links of interest.

You can perform each or all of these options within one single function call.

my_files <- causalCCC.wrapper(SeuratObject)

With this wrapper function you can transform your single-cell object (Seurat example) into CausalCCC input files (wrapper output example).

Preparing your data to submit a causalCCC job

Use this workflow when you already have your L-R links and the genes and cells you want to study, and only need help formatting your single-cell object.

You can run causalCCC.wrapper() to automatically prepare the files needed to reconstruct a CausalCCC network.

Make sure you can load the following packages:

### Mandatory Libraries 
library(tidyverse)
library(data.table)
library(dplyr)

### If you use Seurat
library(Seurat)


# You need to download both the MIIC R package adapted for CausalCCC and the CausalCCC package containing preprocessing automatic functions:
Sys.setenv(GITHUB_PAT = 'YOUR_GITHUB_API_TOKEN_HERE')
devtools::install_github("miicTeam/miic_R_package@causalccc", force = T)
devtools::install_github("miicTeam/CausalCCC", force = T)

library(miic)
library(causalCCC)

With a Seurat object

Download Seurat example

#demo dataset
load("causalCCC_demo_Seurat.RData")

Download L-R links dataframe example

# Define your senders, your receivers and the metadata where they can be found

senders <-  c('B') 
receivers <-  c('CD8 T') 
interact_ident <- "seurat_annotations" 

#Define the genes you are interested in reconstructing network (we offer a feature selection tool if needed, see below)

genes_senders <- c("CD74", "RPL11", "RPS2", "RPS15A", "EEF1A1", "RPL10", "GAPDH", 
                  "RPL6", "RPL13A", "RPS14", "RPLP1", "RPS12", "RPL31", "RPS3", 
                  "COTL1", "PTMA", "MALAT1", "RPS4X", "RPS7", "RPL32", "RPS8")

genes_receivers <- c("GZMH", "HIPK1", "PCNA", "BAG2", "NAAA", "ASCL2", "SAMM50", 
                    "CCL4L1", "CD83", "ZNF175", "ACAD9", "SLC12A2", "CD151", "CCL5", 
                    "ATG7", "FABP5", "ZNF438", "NKG7", "FBXO28", "PROK2", "MARS2", 
                    "PRDM15", "NSMCE2", "B2M", "HLA-C", "HLA-A", "HLA-B", "HLA-DRA") 

#Load your CCC analysis (we offer a CCC analysis tool if needed, see below)

interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)

causalCCC.wrapper() creates a folder named “CausalCCC_files” in your working directory, containing the wrapper output files.

my_files <- causalCCC::causalCCC.wrapper(SeuratObject,
                               interact_ident = interact_ident,
                               senders_name = senders,
                               receivers_name = receivers,
                               genes_senders = genes_senders,
                               genes_receivers = genes_receivers, 
                               interact_edges = interact_edges)

There are many parameters you can specify for the wrapper function; you can read about them in the R documentation vignette or in Advanced mode.

Now you can run the job on the CausalCCC webserver.

On the workbench you can directly upload the files you just created with the wrapper.

example of the demo causalCCC network on the webserver

With a dataframe

The dataframe can be stored in any file format you like. It must be structured so that columns are genes or metadata and rows are cells.

Download dataframe example

#demo dataset
load("causalCCC_demo_df.RData")
#or
df <- read.csv("causalCCC_demo_df.csv")
#or
df <- read.csv("causalCCC_demo_table.csv")
#or
df <- readxl::read_excel("causalCCC_demo_df.xlsx") # readxl is not attached by library(tidyverse)

Download L-R links dataframe example

# Define your senders, your receivers and the metadata where they can be found

senders <-  c('B') 
receivers <-  c('CD8 T') 
interact_ident <- "seurat_annotations" 

#Define the genes you are interested in reconstructing network (we offer a feature selection tool if needed, see below)

genes_senders <- c("CD74", "RPL11", "RPS2", "RPS15A", "EEF1A1", "RPL10", "GAPDH", 
                  "RPL6", "RPL13A", "RPS14", "RPLP1", "RPS12", "RPL31", "RPS3", 
                  "COTL1", "PTMA", "MALAT1", "RPS4X", "RPS7", "RPL32", "RPS8")

genes_receivers <- c("GZMH", "HIPK1", "PCNA", "BAG2", "NAAA", "ASCL2", "SAMM50", 
                    "CCL4L1", "CD83", "ZNF175", "ACAD9", "SLC12A2", "CD151", "CCL5", 
                    "ATG7", "FABP5", "ZNF438", "NKG7", "FBXO28", "PROK2", "MARS2", 
                    "PRDM15", "NSMCE2", "B2M", "HLA-C", "HLA-A", "HLA-B", "HLA-DRA") 

#Load your CCC analysis (we offer a CCC analysis tool if needed, see below)

interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)

causalCCC.wrapper() creates a folder named “CausalCCC_files” in your working directory, containing the wrapper output files.

my_files <- causalCCC.wrapper(df,
                               interact_ident = interact_ident,
                               senders_name = senders,
                               receivers_name = receivers,
                               genes_senders = genes_senders,
                               genes_receivers = genes_receivers, 
                               interact_edges = interact_edges)

If you need Cell-cell communication analysis

We integrate a CCC analysis using the LIANA pipeline.

### For CCC selection
library(liana)

Now you can specify which CCC method you want from the options available in LIANA:

Code          Name
connectome    Connectome
logfc         iTALK-inspired 1-vs-rest LogFC score
natmi         Network Analysis Toolkit for the Multicellular Interactions
sca           SingleCellSignalR
cellphonedb   CellPhoneDB
LIANA regularly integrates new CCC methods, and this list will be updated as LIANA releases new CCC options.

Download Seurat example

#demo dataset
load("causalCCC_demo_Seurat.RData")

Define the parameters as usual, here we show you how to write metadata lists if you want to add one:

# Specify the CCC method you want to use to select L-R edges (optional)
CCC_method <- "natmi"

# Set your working directory here
wd_path <- "~/CausalCCC/vignettes"

# Specify the assay of interest
assay_name <- 'RNA'

# Specify the species genes format (mouse or human)
species <- "human"

# Define your senders, your receivers and the metadata where they can be found

senders <-  c('B') 
receivers <-  c('CD8 T') 
interact_ident <- "seurat_annotations" 

#genes and metadata of interest for the senders population

goi_senders <- c('HLA-DQA1') 
metadata_senders <- list()

#genes and metadata of interest for the receivers population

goi_receivers <- c('LAG3') 
metadata_receivers <- list()


# example:
# metadata <- list(
#   treatment = c("Control", "Treated"),
#   another_meta = c("Level1", "Level2", "Level3"),
#   continuous_meta = NULL
# )

By setting do_CCC = TRUE, the wrapper performs both CCC analysis and feature selection, because upstream and downstream genes cannot be defined without L-R links.

my_files <-  causalCCC.wrapper(SeuratObject,
                                assay_name = assay_name,
                                interact_ident = interact_ident,
                                senders_name = senders,
                                receivers_name = receivers,
                                species =  species,
                                do_CCC = TRUE,
                                CCC_method = CCC_method,
                                goi_senders = goi_senders,
                                goi_receivers = goi_receivers,
                                metadata_senders = metadata_senders,
                                metadata_receivers = metadata_receivers,
                                wd_path = wd_path)

If you want feature selection

Maybe you obtained L-R links but do not know which other genes to include. We offer an unsupervised feature selection tool that selects upstream and downstream genes based on the list of ligands and receptors of interest. Optionally, you may also extend the selection with a small list of genes and metadata (e.g. experimental condition, treatment vs control) relevant to your biological question.

By setting do_MIselect = TRUE, the wrapper will choose informative genes based on the variables of interest using mutual information computation.

We showcase it here with a Seurat object but it can be done with dataframes too.

Download Seurat example

#demo dataset
load("causalCCC_demo_Seurat.RData")

Download L-R links dataframe example

# Set your working directory here

wd_path <- "~/CausalCCC/vignettes/"

# Specify the assay of interest

assay_name <- 'RNA'

# Specify the species genes format (mouse or human)

species <- "human"


# Define your senders, your receivers and the metadata where they can be found

senders <-  c('B') 
receivers <-  c('CD8 T') 
interact_ident <- "seurat_annotations" 

#Define optional variables of interest 

goi_senders <- c("CD74")

goi_receivers <- NULL

#Load your CCC analysis (we offer a CCC analysis tool if needed, see above)

interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)
my_files <-  causalCCC.wrapper(SeuratObject,
                                assay_name = assay_name,
                                interact_ident = interact_ident,
                                senders_name = senders,
                                receivers_name = receivers,
                                species =  species,
                                interact_edges = interact_edges,
                                do_MIselect = TRUE,
                                goi_senders = goi_senders,
                                wd_path = wd_path)

Advanced mode: tailor your CausalCCC network

This vignette goes deeper into the preparation of your single-cell object when using CausalCCC. It is an optional workflow to fine-tune CausalCCC’s pipeline to your biological question.


To create a CausalCCC network, you only need a list of L-R links from your CCC analysis and the raw count expression matrices of the sender genes and receiver genes you want to study.

example of causalCCC input files

First read the Quick start section before exploring how to tailor your CausalCCC network.

In Quick start, we introduced a wrapper function, causalCCC.wrapper(), that performs the full CausalCCC preparation pipeline. It actually wraps five CausalCCC functions that can help you tailor your input files:

details of the wrapper function

In this vignette, we will detail how to create your own pipeline by modifying the default parameters of these functions.

Reminder: the goal is to select Ligand-Receptor links and genes of interest to reconstruct the most informative signaling pathways. CausalCCC networks highlight upstream and downstream gene interaction pathways underlying cell-cell communication (CCC) between sender and receiver cells.

CausalCCC does not require preprocessing and offers detailed help in its R package.

Initialization

Make sure you can load the following packages:




### Mandatory Libraries 
library(tidyverse)
library(data.table)
library(dplyr)

### If you use Seurat
library(Seurat)


### For CCC selection - optional if you want to use your own L-R list
library(liana)

### For network aesthetic display
library(rjson)



# You need to download both the MIIC R package adapted for CausalCCC and the CausalCCC package containing preprocessing automatic functions:
Sys.setenv(GITHUB_PAT = 'YOUR_GITHUB_API_TOKEN_HERE')
devtools::install_github("miicTeam/miic_R_package@causalccc", force = T)
devtools::install_github("miicTeam/CausalCCC", force = T)

library(miic)
library(causalCCC)
#Set your working directory here
wd_path <- "~/CausalCCC/vignettes/demo_data"

# Create directories
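# recursive = TRUE also creates wd_path itself if it does not exist yet
dir.create(file.path(wd_path, "plots"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(wd_path, "MI_tables"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(wd_path, "CausalCCC_files"), recursive = TRUE, showWarnings = FALSE)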

First, load your single-cell object, define the sender and receiver populations of interest, reported in the metadata “interact_ident” column.

Download Seurat example

#demo dataset
load("causalCCC_demo_Seurat.RData")

If you have multiple subtypes of senders (or receivers), give the general population name of the senders (or receivers), and create a “subtype” metadata variable with the specific subpopulation names.

Add any metadata of interest to the metadata list, with ordered levels for categorical variables. The order can be arbitrary: it does not influence the computation, only the visualization of CausalCCC networks (red edges for activation / correlation, blue edges for repression / anticorrelation). If the metadata is continuous, put NULL instead of levels (see example below).

#Specify the assay of interest

assay_name <- 'RNA'

#Define your senders, your receivers and the metadata where they can be found

senders <-  c('B') 
receivers <-  c('CD8 T') 
interact_ident <- "seurat_annotations" 
Idents(SeuratObject) <- interact_ident

# Specify the species genes format (mouse or human)

species <- "human"

### YOUR BIOLOGICAL QUESTION

#genes and metadata of interest for the senders population

goi_senders <- c('HLA-DQA1') 
metadata_senders <- list()

#genes and metadata of interest for the receivers population

goi_receivers <- c('LAG3') 
metadata_receivers <- list()

# example:
# metadata <- list(
#   treatment = c("Control", "Treated"),
#   another_meta = c("Level1", "Level2", "Level3"),
#   continuous_meta = NULL
# )
cat(paste("Wrapping your Seurat object to run causalCCC:\n",
  senders, "are the senders population\n",
  receivers, "are the receivers population\n",
  "and are present in the", interact_ident, "column.\n"))

# Keep only the sender and receiver cells
Idents(SeuratObject) <- interact_ident
SeuratObject <- subset(SeuratObject, idents = c(senders, receivers))

Opt: Ligand-Receptors LIANA selection

CausalCCC integrates the MIIC network reconstruction algorithm with CCC analysis, so the first step is to retrieve L-R links from a CCC analysis. You can use knowledge-based links or L-R links computed with any CCC method. In our use cases, we use the NicheNet and CellChat CCC methods. The CausalCCC R package also offers an integration with LIANA, which encompasses many other CCC methods such as iTALK and CellPhoneDB.

Here is how to use our integration of LIANA with causalCCC.links(). This function is only available for Seurat objects.

Now you can specify which CCC method you want from the options available in LIANA:

Code          Name
connectome    Connectome
logfc         iTALK-inspired 1-vs-rest LogFC score
natmi         Network Analysis Toolkit for the Multicellular Interactions
sca           SingleCellSignalR
cellphonedb   CellPhoneDB

The parameters of causalCCC.links() are:

  • seurat_object A Seurat object containing single-cell data.

  • species A character string indicating the species. Either “human” (default) or “mouse”.

  • assay_name A character string specifying the assay name in the Seurat object. Default is “RNA”.

  • senders A non-empty character string of sender cell types.

  • receivers A non-empty character string of receiver cell types.

  • interact_ident A character string specifying the identity column in the Seurat object where senders and receivers can be found.

  • CCC_method A character string specifying the CCC method to use; see show_methods() in LIANA.

  • n_CCClinks A numeric specifying how many L-R links to keep. Default is 30.

The function returns the full LIANA output, so we need to create the interact_edges file with only ligands and receptors.

#Get the significant interactions found by LIANA from senders to receivers

result <- causalCCC.links(SeuratObject,
                                    species = species,
                                    assay_name = assay_name,
                                    interact_ident = interact_ident,
                                    senders = senders,
                                    receivers = receivers,
                                    CCC_method = "sca",
                                    n_CCClinks = 20)


write.table(result, file = file.path(wd_path,"LIANA_output.csv"), quote = F, sep = ",")

Extract the consensus interaction edges from the LIANA output to include in the CausalCCC network:

#Extract ligands and receptors

ligands <- unique(as.vector(result$ligand.complex))
receptors <- unique(as.vector(result$receptor.complex))
# some receptors are multiple, we need to unlist them
receptors <- unique(unlist(str_split(receptors, "_")))


#Create the list of CCC links. 

interact_edges <- data.frame(ligands = character(), receptors = character())

for (i in seq_len(nrow(result))) {
  oneligand <- result$ligand.complex[i]
  onereceptor <- result$receptor.complex[i]
  onereceptor <- unique(unlist(str_split(onereceptor, "_")))
  for (onerecp in onereceptor) {
    interact_edges[nrow(interact_edges) +1,] <- c(oneligand, onerecp)
  }
}

interact_edges <- interact_edges[!(duplicated(interact_edges)),]

write.table(interact_edges, file.path(wd_path,"CausalCCC_files/demo_interactEdges.tsv"), row.names=F, quote=F, sep="\t")

Opt: Genes feature selection

The second step is to define which upstream and downstream genes you want to include in your CausalCCC network. These genes can be knowledge-based or the result of a feature selection method. You can use the feature selection method of your choice or our method, causalCCC.MIselection(). This function offers an unsupervised feature selection tool based on fast pairwise mutual information computation.

Explanation: an unsupervised way of discovering feature importance in your dataset is to calculate the Mutual Information (MI) between all pairs of variables. Variables that share a lot of information with many other variables are necessary to understand the underlying interactions in your dataset (these signals could be artefacts or true interactions; both need to be discovered to analyze your dataset).

How to: computing mutual information between all pairs of variables (25000*25000) takes too long and might highlight strong associations unrelated to cell-cell interactions. Instead, we select up to 15 genes (including ligand and/or receptor genes) and up to 15 metadata as features of interest and compute their shared information with all the other variables (30*25000).
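For readers who prefer to see the pivot idea in code, here is a minimal Python sketch. It is not the estimator used by causalCCC.MIselection(): it assumes equal-width binning and a plug-in mutual information estimate, and it ranks genes by their summed MI with the pivots, which is one possible choice among others. The function names (discretize, mutual_information, rank_by_pivots) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, n_bins=4):
    """Equal-width binning of a numeric list into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def rank_by_pivots(expr, pivots, n_genes):
    """Rank non-pivot genes by their summed MI with the pivot genes."""
    binned = {g: discretize(v) for g, v in expr.items()}
    scores = {
        g: sum(mutual_information(binned[g], binned[p]) for p in pivots)
        for g in expr if g not in pivots
    }
    return sorted(scores, key=scores.get, reverse=True)[:n_genes]

# Toy data: gene_A follows the pivot, gene_B is independent noise.
random.seed(0)
pivot = [random.randint(0, 20) for _ in range(500)]
expr = {
    "pivot_ligand": pivot,
    "gene_A": [2 * v + random.randint(0, 2) for v in pivot],
    "gene_B": [random.randint(0, 20) for _ in range(500)],
}
top_genes = rank_by_pivots(expr, pivots=["pivot_ligand"], n_genes=1)
```

Only pivot-versus-gene pairs are evaluated, so the cost grows with the number of pivots times the number of genes rather than with the square of the number of genes.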

The parameters of causalCCC.MIselection() are:

  • data_input [a dataframe or a Seurat object] A single-cell transcriptomics object, either a dataframe or a Seurat object. A dataframe must contain genes and metadata as variables and cells as observations.

  • assay_name [a string]. For Seurat objects, gives the name of the assay to take the transcriptomics raw counts from (default is ‘RNA’)

  • interact_ident [a string]. Gives the name of the metadata containing the celltypes population

  • oneinteract [a string]. Either the senders tag or the receivers tag

  • goi [a vector] A list of genes of interest in your dataset. Must be of length <15. These genes will be individually used as pivot variable to look for mutual information in your dataset.

  • metadata_list [a vector] A list of metadata names of interest in your dataset. Must be of length <15. These metadata will be individually used as pivot variable to look for mutual information in your dataset.

  • n_genes [a numeric] The number of genes to keep after pairwise mutual information ranking. Default depends on the length of goi (between 15 and 100).

  • plot [a boolean] A boolean to specify if you want a heatmap plot of the highest mutual information found with the features of interest, default is FALSE. Will be saved in output_dir

  • color_heatmap [a string] A color for the high mutual information values in the color scale.

  • save [a boolean] A boolean to specify if you want to save the full mutual information table, default is FALSE. Will be saved in output_dir

  • output_dir [a string] A string to specify an output directory to export optional outputs to, default is “MI_tables”

  • return_full [a boolean] If true the function returns the full table of MI computation instead of only the top genes

The function returns a vector of n_genes genes sharing the highest mutual information with the given features of interest. Optionally, it returns the full dataframe of the MI computation, plots MI heatmaps and saves csv files.

#Find genes that share the most information with your biological question :
MI_senders_genes <- causalCCC.MIselection(data_input = SeuratObject,
                                 assay_name = "RNA",
                                 interact_ident = interact_ident,
                                 oneinteract = senders,
                                 goi = c(ligands[1:5],goi_senders),
                                 metadata_list = names(metadata_senders),
                                 save = T,
                                 output_dir = file.path(wd_path, "MI_tables"),
                                 color_heatmap = "darkgreen",
                                 plot = T)

MI_receivers_genes <- causalCCC.MIselection(data_input = SeuratObject,
                                 assay_name = "RNA",
                                 interact_ident = interact_ident,
                                 oneinteract = receivers,
                                 goi = c(receptors,goi_receivers),
                                 metadata_list = names(metadata_receivers),
                                 save = T,
                                 output_dir = file.path(wd_path, "MI_tables"),
                                 color_heatmap = "darkorange",
                                 plot = T)

Mutual Information heatmap of B cells

Mutual Information heatmap of CD8 T cells

Create CausalCCC input files

Finally, you can save the files and are ready to run CausalCCC on the web server.

The causalCCC.mosaic() function only needs the names of the metadata columns, but causalCCC.state_order() needs the named list, because this is where the increasing levels are defined.

genes_senders <- unique(c(ligands, MI_senders_genes, goi_senders))
genes_receivers <- unique(c(receptors, MI_receivers_genes, goi_receivers))
  

## Create the input mosaic matrix

causalCCC_df <- causalCCC.mosaic(data_input = SeuratObject,
                                assay_name = assay_name,
                                interact_ident = interact_ident,
                                senders_name = senders,
                                receivers_name = receivers,
                                genes_senders= genes_senders,
                                genes_receivers = genes_receivers,
                                metadata_senders = names(metadata_senders),
                                metadata_receivers = names(metadata_receivers))

## Create the state order

causalCCC_st <- causalCCC.state_order(mosaic_data_table = causalCCC_df,
                                            genes_senders= genes_senders,
                                            genes_receivers = genes_receivers,
                                            ligands = ligands,
                                            receptors = receptors,
                                            metadata_senders = metadata_senders,
                                            metadata_receivers = metadata_receivers)


## Save files

cat(paste("Saving the causalCCC files in", file.path(wd_path,"CausalCCC_files/")))


## Create the network layout

network_layout <- causalCCC.layout(causalCCC_st, network_height = 8)
# Use a dedicated connection name to avoid masking base::file()
layout_con <- file(file.path(wd_path,"CausalCCC_files/causalCCC_layout.json"))
writeLines(network_layout, layout_con)
close(layout_con)

write.table(causalCCC_df, file = file.path(wd_path,"CausalCCC_files/causalCCC_df.csv"), quote = F, sep = ",", row.names = F)
write.table(causalCCC_st, file = file.path(wd_path, "CausalCCC_files/causalCCC_st.tsv"), quote = F, sep = "\t", row.names = F)

#Fix ligands and receptors names when duplicated
duplicated_ligands <- intersect(ligands,genes_receivers)
duplicated_receptors <- intersect(receptors, genes_senders)

interact_edges$ligands <- ifelse(interact_edges$ligands %in% duplicated_ligands, paste0(interact_edges$ligands, "_senders"), interact_edges$ligands)
interact_edges$receptors <- ifelse(interact_edges$receptors %in% duplicated_receptors, paste0(interact_edges$receptors, "_receivers"), interact_edges$receptors)

write.table(interact_edges, file.path(wd_path,"CausalCCC_files/causalCCC_interactEdges.tsv"), row.names=F, quote=F, sep="\t")
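For readers working from the Python tutorial, the same renaming step can be sketched with pandas. This is an illustration under assumptions, not part of the package: tag_shared_genes is a hypothetical helper, the edge table is assumed to have "ligands" and "receptors" columns as above, and the gene names in the toy example are placeholders rather than real interactions.

```python
import pandas as pd

def tag_shared_genes(interact_edges, genes_senders, genes_receivers):
    """Suffix ligands/receptors whose name also appears on the other side.

    interact_edges: DataFrame with 'ligands' and 'receptors' columns.
    """
    edges = interact_edges.copy()
    dup_ligands = set(edges["ligands"]) & set(genes_receivers)
    dup_receptors = set(edges["receptors"]) & set(genes_senders)
    # Series.where keeps values where the condition holds, else uses `other`
    edges["ligands"] = edges["ligands"].where(
        ~edges["ligands"].isin(dup_ligands), edges["ligands"] + "_senders")
    edges["receptors"] = edges["receptors"].where(
        ~edges["receptors"].isin(dup_receptors), edges["receptors"] + "_receivers")
    return edges

# Toy example with placeholder gene names
toy_edges = pd.DataFrame({"ligands": ["LIG1", "GENE_X"],
                          "receptors": ["REC1", "GENE_Y"]})
tagged = tag_shared_genes(toy_edges,
                          genes_senders=["LIG1", "GENE_X", "GENE_Y"],
                          genes_receivers=["REC1", "GENE_Y", "GENE_X"])
```

In the toy table, GENE_X is a ligand that is also a receiver gene and GENE_Y is a receptor that is also a sender gene, so both are suffixed while LIG1 and REC1 are left untouched.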



output_files <- list(
  mosaic_table = causalCCC_df,
  state_order = causalCCC_st,
  interact_edges = interact_edges,
  network_layout = network_layout
)

Now run the job on the MIIC webserver!

You can tailor CausalCCC analysis to your biological question. For instance, one of our use cases showcases cell signaling within a trio of cell populations interacting with one another. It is indeed possible to reconstruct a CausalCCC network that includes more than two cell types and in which cells can be both senders and receivers.

CausalCCC - Python tutorial

Prerequisites and dependencies

CausalCCC preprocessing only requires Python >= 3.9 and the following packages. It takes an AnnData object as input.

The same process applies to a Squidpy or MuData object.

import scipy.sparse
import anndata as ad
import scanpy as sc
import pandas as pd
import numpy as np
import os

Step 0 - Load your AnnData object and run a CCC methodology of your choice

If you already have your data prepared, you might be able to skip this step!

You already have:

  • Your cell/gene matrix in a table format (e.g. csv, txt, tsv)?

  • A list of ligand-receptor pairs of your choice?

    Good news, you don’t need this vignette! You can go directly to the CausalCCC webserver!

os.chdir("path/to/your/adata")
file_path = "your_adata.h5ad"  # placeholder: name of your .h5ad file in that folder
adata = sc.read_h5ad(file_path)
print(adata)
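Because CausalCCC works from raw count matrices, it can help to check that the matrix you are about to use has not been normalized or log-transformed. The sketch below is a heuristic only, and looks_like_raw_counts is a hypothetical helper, not part of CausalCCC: it tests whether the stored values are non-negative integers. With your own object you would pass adata.X, or the layer that holds your raw counts.

```python
import numpy as np
import scipy.sparse as sp

def looks_like_raw_counts(X, n_check=10000):
    """Heuristic: raw counts are non-negative whole numbers."""
    # For sparse matrices, only the explicitly stored values are inspected
    values = X.data if sp.issparse(X) else np.asarray(X).ravel()
    values = values[:n_check]
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))

# Toy matrices standing in for adata.X
counts = sp.csr_matrix(np.array([[0, 3], [5, 0]]))
normalized = np.log1p(counts.toarray())

counts_ok = looks_like_raw_counts(counts)
normalized_ok = looks_like_raw_counts(normalized)
```

A positive result does not prove the data are raw counts (rounded or scaled integer data would also pass), but a negative result is a clear sign that a different layer should be used.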

We encourage you to run any cell-cell communication method you like, or even to bring your own ligand-receptor pairs of interest. If you need assistance with running cell-cell communication methods, we suggest following the LIANA+ example.
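Whatever method you use, the result has to be reduced to a two-column table of ligand-receptor links. The following sketch shows one way to do this with pandas, mirroring the complex-splitting step of the R vignette. It rests on assumptions: build_interact_edges is a hypothetical helper, the column names ligand_complex and receptor_complex must be adapted to your CCC output, receptor complexes are assumed to be joined with "_", and the gene names are placeholders.

```python
import pandas as pd

def build_interact_edges(lr_table, ligand_col="ligand_complex",
                         receptor_col="receptor_complex", sep="_"):
    """One row per (ligand, receptor subunit) pair, duplicates removed."""
    edges = lr_table[[ligand_col, receptor_col]].copy()
    edges.columns = ["ligands", "receptors"]
    # Split multi-subunit receptors into one row per subunit
    edges["receptors"] = edges["receptors"].str.split(sep)
    edges = edges.explode("receptors").drop_duplicates()
    return edges.reset_index(drop=True)

# Toy CCC output with placeholder names
lr_table = pd.DataFrame({
    "ligand_complex": ["LIG1", "LIG1", "LIG2"],
    "receptor_complex": ["REC1_REC2", "REC1", "REC3"],
})
interact_edges = build_interact_edges(lr_table)
```

The resulting table can then be saved as a tab-separated file, for example with interact_edges.to_csv("interact_edges.tsv", sep="\t", index=False), where the file name is up to you.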

Suggested Tutorials

For more detailed guidance on running cell-cell communication methods, you can refer to the following python tutorials:

Step 1 - Create your mosaic

Next, you will need to structure your data in the “mosaic” format.

Assigning sender and receiver populations

Before proceeding, ensure that your adata.obs metadata includes a column named ['interact_ident']. This column should specify, for each cell, whether it is a sender, a receiver, or NA (if the cell is neither).

If you have a “celltype” column in your adata.obs metadata, you can assign for example:

senders = ['T CD4 Naive', 'T CD4 Memory']
receivers = ['B intermediate']

# Initialize every cell as 'Other' (or NA)
adata.obs['interact_ident'] = 'Other'

# Label the sender and receiver populations
adata.obs.loc[adata.obs['celltype'].isin(senders), 'interact_ident'] = 'senders'
adata.obs.loc[adata.obs['celltype'].isin(receivers), 'interact_ident'] = 'receivers'

# Check the assignment
adata.obs['interact_ident'].value_counts()
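Before building the mosaic, a quick sanity check can confirm that both populations actually contain cells. The helper below is a hypothetical sketch, not part of CausalCCC. It operates on any pandas DataFrame shaped like adata.obs and assumes the labels 'senders' and 'receivers' used above.

```python
import pandas as pd

def check_interact_ident(obs, column="interact_ident",
                         expected=("senders", "receivers")):
    """Return cell counts per label; raise if a population is missing."""
    if column not in obs.columns:
        raise ValueError(f"Column '{column}' not found in obs")
    counts = obs[column].value_counts()
    missing = [label for label in expected if counts.get(label, 0) == 0]
    if missing:
        raise ValueError(f"No cells labelled: {missing}")
    return counts

# Toy metadata standing in for adata.obs
toy_obs = pd.DataFrame({
    "interact_ident": ["senders", "senders", "receivers", "Other"],
})
counts = check_interact_ident(toy_obs)
```

With a real object, the call would be check_interact_ident(adata.obs).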

In a “mosaic” format, data for senders and receivers is arranged so that only the relevant values for each cell type are filled, while non-relevant parts are left as NA. This creates a “mosaic” pattern of filled values and NAs across the table.

Illustration

  1. Senders’ rows:
    • For cells designated as senders, sender-specific data (e.g., ligand expression values, other gene expression values, or metadata values) are filled in.
    • Receiver-specific data (e.g., receptor values) are left as NA.
  2. Receivers’ rows:
    • For cells designated as receivers, receiver-specific data (e.g., receptor expression values, other gene expression values, or metadata values) are filled in.
    • Sender-specific data are left as NA.

This arrangement results in a “mosaic” pattern of populated values and NAs across the matrix. Here’s a simplified example:

          interact_ident   Expression (Sender)   Expression (Receiver)
Cell 1    Sender           0.85                  NA
Cell 2    Sender           0.22                  NA
Cell 3    Sender           0.76                  NA
Cell 4    Receiver         NA                    1.0
Cell 5    Receiver         NA                    0.93

Here:

  • Sender cells (Cells 1, 2 and 3) have their expression values filled, with NA for receptor values or receiver-specific genes.
  • Receiver cells (Cells 4 and 5) have their expression values filled, with NA for ligand values or sender-specific genes.

    This structure helps separate the information relevant to senders and receivers. If a gene is expressed in both senders and receivers, the function appends a “_senders” or “_receivers” tag to its name to create the adequate mosaic format.
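As a toy illustration of this layout, the sketch below builds a five-cell mosaic with pandas. It is a simplified stand-in for the full function, with made-up counts and placeholder gene names. The "_senders" and "_receivers" suffixes follow those used in causalCCC_mosaic.

```python
import pandas as pd

# Toy count tables (rows are cells); GENE_X is measured in both populations
senders = pd.DataFrame({"LIG1": [5, 0, 2], "GENE_X": [1, 3, 0]})
receivers = pd.DataFrame({"REC1": [4, 7], "GENE_X": [2, 2]})

# Genes present on both sides get a population-specific suffix
shared = set(senders.columns) & set(receivers.columns)
senders = senders.rename(columns={g: f"{g}_senders" for g in shared})
receivers = receivers.rename(columns={g: f"{g}_receivers" for g in shared})

# Stacking the tables leaves NaN wherever a column belongs to the other population
mosaic = pd.concat([senders, receivers], ignore_index=True)
print(mosaic)
```

Each sender row is empty in the receiver columns and vice versa, which is the mosaic pattern described above.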

    Load the causalCCC_mosaic function
    def causalCCC_mosaic(adata, interact_ident, senders_name, receivers_name, 
                         genes_senders, genes_receivers, metadata_senders=None, metadata_receivers=None):
        """
        Create a mosaic datatable for causalCCC using anndata object.
        
        :param adata: anndata object (scanpy or else)
        :param interact_ident: column name in obs that identifies sender/receiver population. 
        :param senders_name: name identifying senders in interact_ident column
        :param receivers_name: name identifying receivers in interact_ident column
        :param genes_senders: list of genes for senders
        :param genes_receivers: list of genes for receivers
        :param metadata_senders: list of metadata for senders (optional)
        :param metadata_receivers: list of metadata for receivers (optional)
        
        :return: pandas dataframe containing the mosaic datatable
        """
        if interact_ident not in adata.obs.columns:
            raise ValueError(f"The metadata '{interact_ident}' is not found in the anndata object")
    
        duplicated_genes = list(set(genes_senders) & set(genes_receivers))
        duplicated_meta = []
        if metadata_senders and metadata_receivers:
            duplicated_meta = list(set(metadata_senders) & set(metadata_receivers))
        def prepare_table(adata, cell_type, genes, metadata=None):
            sub_adata = adata[adata.obs[interact_ident] == cell_type]
            if sub_adata.shape[0] == 0:
                raise ValueError(f"No cells found for {cell_type}")
            # AnnData stores X as cells x genes, so no transpose is needed
            X = sub_adata[:, genes].X
            X = X.toarray() if scipy.sparse.issparse(X) else np.asarray(X)
            sub_matrix = pd.DataFrame(X, columns=genes)
            if metadata:
                for onemeta in metadata:
                    if onemeta not in sub_adata.obs.columns:
                        raise ValueError(f"Metadata '{onemeta}' not found for {cell_type}")
                    sub_matrix[onemeta] = sub_adata.obs[onemeta].values
                    if sub_matrix[onemeta].isna().all():
                        print(f"Warning: Metadata '{onemeta}' contains only NA values and will be excluded")
                        sub_matrix.drop(columns=[onemeta], inplace=True)
            # Shuffle cells only after metadata is attached, so rows stay aligned
            sub_matrix = sub_matrix.sample(frac=1).reset_index(drop=True)

            return sub_matrix
    
        # Prepare senders and receivers tables
        print(f"Preparing {senders_name} cells table ...")
        senders_table = prepare_table(adata, senders_name, genes_senders, metadata_senders)
        
        print(f"Preparing {receivers_name} cells table ...")
        receivers_table = prepare_table(adata, receivers_name, genes_receivers, metadata_receivers)
    
        # Handle duplicated genes and metadata
        if duplicated_genes:
            print(f"Duplicated genes found: {duplicated_genes}, renaming for senders/receivers ...")
            receivers_table.rename(columns={gene: gene + "_receivers" for gene in duplicated_genes}, inplace=True)
            senders_table.rename(columns={gene: gene + "_senders" for gene in duplicated_genes}, inplace=True)
        
        if duplicated_meta:
            print(f"Duplicated metadata found: {duplicated_meta}, renaming for senders/receivers ...")
            receivers_table.rename(columns={meta: meta + "_receivers" for meta in duplicated_meta}, inplace=True)
            senders_table.rename(columns={meta: meta + "_senders" for meta in duplicated_meta}, inplace=True)
        all_columns = list(set(senders_table.columns) | set(receivers_table.columns))
        
        for col in all_columns:
            if col not in senders_table.columns:
                senders_table[col] = np.nan
            if col not in receivers_table.columns:
                receivers_table[col] = np.nan
    
        mosaic_table = pd.concat([senders_table, receivers_table], ignore_index=True)
        
        return mosaic_table

    You can now run it on your object as follows:

    mosaic_df = causalCCC_mosaic(adata, interact_ident='interact_ident', senders_name='sender', 
                              receivers_name='receiver', genes_senders=['Gene_0'], 
                              genes_receivers=['Gene_1', 'Gene_2'], 
                              metadata_senders=[], metadata_receivers=[])
    Have you noticed? You can specify genes of interest as lists, e.g. genes_senders=['Gene_A'] or genes_receivers=['Gene_B', 'Gene_C']. They will automatically be added to the network.
    • Not sure about how to select your genes of interest? Do not hesitate to look at our mutual information based selection method! See here.
  • Step 2 - Create your state order table

    Next, generate the MIIC state_order. The state order is an optional file that lets you provide extra information for the computation (contextual variables and variable types, which are otherwise detected automatically) and for the display (groups of nodes, level ordering of categorical data). Here is a brief description:

    • var_type column:
      • 0 is categorical/discrete

      • 1 is continuous

        We advise declaring a continuous variable with fewer than 5 distinct values as discrete

    • levels_increasing_order column:
      • If the variable is categorical and the categories can be ordered (e.g., “Non-treated”, “Treated”), you can give the order of levels
    • is_contextual column:
      • 0 is non-contextual
      • 1 is contextual (meaning that no variable can cause this one). It could be prior knowledge, e.g., experimental variable
    • group and group_color columns:
      • These are to assign variables in different groups to color-code them on the MIIC network (webserver display only)
      • Note that this is aesthetic only and does not modify the MIIC algorithm.
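    As an illustration of the columns described above, here is a minimal hand-built state order. The variable names ("treatment", "Gene_0", "Gene_1") are hypothetical; the column names, the comma-separated levels and the color codes follow the causalCCC_state_order function shown in this vignette. The checks at the end are only a suggested sanity test, not part of CausalCCC.

```python
import pandas as pd

# Hypothetical variables; one row per column of the mosaic table.
rows = [
    {"var_names": "treatment", "var_type": 0,
     "levels_increasing_order": "Non-treated,Treated",
     "is_contextual": 1, "group": "metadata", "group_color": "FFE397"},
    {"var_names": "Gene_0", "var_type": 1, "levels_increasing_order": "",
     "is_contextual": 0, "group": "ligands", "group_color": "28827A"},
    {"var_names": "Gene_1", "var_type": 1, "levels_increasing_order": "",
     "is_contextual": 0, "group": "receptors", "group_color": "DE6343"},
]
state_order = pd.DataFrame(rows)

# Suggested sanity checks: flags are 0/1 and only discrete variables carry ordered levels.
flags_ok = (state_order["var_type"].isin([0, 1]).all()
            and state_order["is_contextual"].isin([0, 1]).all())
continuous = state_order[state_order["var_type"] == 1]
levels_ok = (continuous["levels_increasing_order"] == "").all()
print(state_order)
print("flags valid:", flags_ok, "| levels valid:", levels_ok)
```

    Here "treatment" is marked as contextual because it is an experimental variable that no gene can cause.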
    Load the causalCCC_state_order function
    def causalCCC_state_order(mosaic_df,
                              metadata_senders,
                              metadata_receivers,
                              genes_senders,
                              ligands,
                              receptors,
                              genes_receivers):
        """
        Create a state order file for causalCCC.
        
        Parameters:
        -----------
        mosaic_df : pd.DataFrame
            An output of causalCCC_mosaic()
        metadata_senders : dict
            A named dictionary of metadata (strings) with levels as items.
        metadata_receivers : dict
            A named dictionary of metadata (strings) with levels as items.
        genes_senders : list of str
            A list of selected genes for the senders cells.
        ligands : list of str
            A list of selected CCC genes for the senders cells.
        receptors : list of str
            A list of selected CCC genes for the receivers cells.
        genes_receivers : list of str
            A list of selected genes for the receivers cells.
        
        Returns:
        --------
        pd.DataFrame
            A state order that can be used for causalCCC network reconstruction.
        """
        if not isinstance(mosaic_df, pd.DataFrame):
            raise ValueError("mosaic_df must be a data frame.")
        if metadata_senders is not None and not isinstance(metadata_senders, dict):
            raise ValueError("metadata_senders is not a named dictionary")
        if metadata_receivers is not None and not isinstance(metadata_receivers, dict):
            raise ValueError("metadata_receivers is not a named dictionary")
        if not isinstance(genes_senders, list) or not all(isinstance(g, str) for g in genes_senders):
            raise ValueError("genes_senders must be a list of strings.")
        if not isinstance(genes_receivers, list) or not all(isinstance(g, str) for g in genes_receivers):
            raise ValueError("genes_receivers must be a list of strings.")
        if not isinstance(ligands, list) or not all(isinstance(g, str) for g in ligands):
            raise ValueError("ligands must be a list of strings.")
        if not isinstance(receptors, list) or not all(isinstance(g, str) for g in receptors):
            raise ValueError("receptors must be a list of strings.")
        
        # Initialize state_order df
        state_order = pd.DataFrame(columns=["var_names", "var_type", "levels_increasing_order", "is_contextual", "group", "group_color"])
        
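        # Add .obs (metadata) information; treat missing metadata as empty dictionaries
        metadata_senders = metadata_senders or {}
        metadata_receivers = metadata_receivers or {}
        duplicated_meta = set(metadata_senders).intersection(metadata_receivers)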
        if metadata_senders:
            for onemeta, levels in metadata_senders.items():
                levels_increasing_order = ",".join(levels)
                meta_name = f"{onemeta}_senders" if onemeta in duplicated_meta else onemeta
                if meta_name in mosaic_df.columns:
                    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
                    state_order = pd.concat([state_order, pd.DataFrame([{
                        "var_names": meta_name,
                        "var_type": 0,
                        "levels_increasing_order": levels_increasing_order,
                        "is_contextual": 0,
                        "group": "metadata",
                        "group_color": "FFE397"
                    }])], ignore_index=True)
        
        if metadata_receivers:
            for onemeta, levels in metadata_receivers.items():
                levels_increasing_order = ",".join(levels)
                meta_name = f"{onemeta}_receivers" if onemeta in duplicated_meta else onemeta
                if meta_name in mosaic_df.columns:
                    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
                    state_order = pd.concat([state_order, pd.DataFrame([{
                        "var_names": meta_name,
                        "var_type": 0,
                        "levels_increasing_order": levels_increasing_order,
                        "is_contextual": 0,
                        "group": "metadata",
                        "group_color": "FFD050"
                    }])], ignore_index=True)
        
        # Identify duplicated genes (if you have the same gene in both sender and receiver populations)
        dups = [f"{gene}_senders" for gene in set(genes_senders).intersection(genes_receivers)] + \
               [f"{gene}_receivers" for gene in set(genes_senders).intersection(genes_receivers)]
        
        interact_meta = state_order["var_names"].tolist()
    
        for col in set(mosaic_df.columns) - set(interact_meta):
            group = ""
            color = ""
            if col in dups:
                gene, family = col.rsplit("_", 1)
            else:
                gene = col
                family = "senders" if gene in genes_senders else "receivers"
            
            if family == "senders":
                group = "senders genes"
                color = "EBF5D8"
                if gene in ligands:
                    group = "ligands"
                    color = "28827A"
            else:
                group = "receivers genes"
                color = "F0E2B6"
                if gene in receptors:
                    group = "receptors"
                    color = "DE6343"
    
            # Variables with at most 5 distinct values are declared discrete (0), the others continuous (1)
            var_type = 0 if mosaic_df[col].nunique() <= 5 else 1
            state_order = pd.concat([state_order, pd.DataFrame([{
                "var_names": col,
                "var_type": var_type,
                "levels_increasing_order": "",
                "is_contextual": 0,
                "group": group,
                "group_color": color
            }])], ignore_index=True)
        
        return state_order

    Now you can directly create your state order this way:

    state_order = causalCCC_state_order(mosaic_df,
                              metadata_senders={},
                              metadata_receivers={},
                              genes_senders=genes_senders,
                              ligands=ligands,
                              receptors=receptors,
                              genes_receivers=genes_receivers)

    Tip

    You can set your own node colors later on :) See the Advanced Mode tab.

    How to interpret a CausalCCC network

    This vignette guides you through the result page of a CausalCCC network. We offer many visualization tools for your CausalCCC network. Above the network you will see several tabs, which we describe below:

    Graph

    Example of a CausalCCC output. We reconstruct the intracellular network and integrate it with the ligand-receptor edges found by the CCC methods.

    CausalCCC webserver provides a unique interactive visualization tool to explore your CausalCCC networks. In particular, you can explore the reconstructed gene interaction pathways upstream of a specific ligand or downstream of a specific receptor using your mouse buttons:

    With a left click you select a gene (or a link) and see its neighborhood.

    With the right click you see a drop-down menu with 3 options :

    • Distribution: plot the distribution of the node you clicked on:
    Example of a distribution


    • Joint Distribution: after selecting “Plot joint distribution with”, you have to left-click on a second node to obtain the joint distribution between the 2 variables
    Example of a joint distribution. You can see the discretization of the continuous values into bins, cf MIIC method


    • See conditioned links: MIIC network reconstruction is able to distinguish direct from indirect effects. Direct effects correspond to edges in the network, while unconnected variables might be associated through indirect effects. When you select “See conditioned links”, the visible nodes are the ones that share information with your variable. When you hover the cursor over a second node, it shows the variables (circled in green) that account for the indirect association. For instance, starting from BTG1 and then hovering over HLA-DRA highlights the indirect path BTG1 - FAU - HLA-DRA, implying that BTG1 and HLA-DRA are related but become independent when conditioning on FAU.
    Example of conditioned links
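    The idea behind conditioned links can be sketched on simulated data. The example below is an illustration under stated assumptions, not the MIIC implementation: it simulates a hypothetical chain X -> M -> Y of binary variables with 10% noise at each step and uses a simple plug-in estimator of (conditional) mutual information. The helper names entropy_bits and mi_bits are ours.

```python
import numpy as np
import pandas as pd

def entropy_bits(df, cols):
    """Plug-in entropy (in bits) of the joint distribution of discrete columns."""
    if not cols:
        return 0.0
    p = df.groupby(cols).size() / len(df)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi_bits(df, x, y, given=()):
    """I(x; y | given) = H(x,g) + H(y,g) - H(x,y,g) - H(g), in bits."""
    g = list(given)
    return (entropy_bits(df, [x] + g) + entropy_bits(df, [y] + g)
            - entropy_bits(df, [x, y] + g) - entropy_bits(df, g))

# Hypothetical chain X -> M -> Y: each step copies its parent and flips it 10% of the time.
rng = np.random.default_rng(0)
n = 20000
x = rng.integers(0, 2, n)
m = x ^ (rng.random(n) < 0.1)
y = m ^ (rng.random(n) < 0.1)
data = pd.DataFrame({"X": x, "M": m, "Y": y})

i_xy = mi_bits(data, "X", "Y")                        # clearly positive: X and Y are associated
i_xy_given_m = mi_bits(data, "X", "Y", given=("M",))  # close to zero: M explains the association
print(f"I(X;Y) = {i_xy:.3f} bits, I(X;Y|M) = {i_xy_given_m:.4f} bits")
```

    In this toy setting X and Y would appear as an indirect association with M as the contributing variable, which mirrors the BTG1 - FAU - HLA-DRA case described above.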



    If you used an optional state_order file, the legend of your color palette is shown below the graph:

    Example of group colors

    By clicking once on it you highlight the nodes of a considered group. If you double-click, it highlights the considered group and their immediate neighbors :

    Neighborhood of the ligands group

    Summary

    The table contains the information about all pairs of variables either directly or indirectly associated and the corresponding indirect contributions:

    You can find the multivariate information values associated with the links

    • X or Y : you can select a variable to see the other variables it is directly or indirectly associated with

    • Type : P (positive), if there is a link between X and Y; N (negative), if there is no link but an indirect association

    • Ai : the variables indirectly contributing to the XY pair information, with the proportion of each contribution (the proportions sum to 100% if Type = N)

    • CCC edge : P (positive), if the considered edge is a CCC link; N (negative), if the considered edge is a MIIC-computed edge.

    • Info : the mutual information between X and Y computed by MIIC in log confidence units (i.e. info = n_samples * ln(2) * info_bits).

    • Info bits : the mutual information between X and Y computed by MIIC in bits.

    • Info shifted : the residual mutual information after subtracting the indirect contributions and the model complexity, taking into account the number of samples (see MIIC publications for details). If Info shifted equals 0 (or is smaller than a small confidence threshold, set in Algorithm advanced settings), the link is removed. Otherwise there is a link representing the direct association between X and Y.

    • Info shifted bits : Info shifted in bits.

    • Group X : If a state order was uploaded, the group label of X. For instance, if X is labeled as “senders genes” in the state order, Group X is equal to “senders genes”.

    • Group Y : If a state order was uploaded, the group label of y.
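    The relation between the Info and Info bits columns given above, info = n_samples * ln(2) * info_bits, can be checked with a few lines of Python. The function names and the numbers are ours, chosen for illustration only.

```python
import math

def info_from_bits(info_bits: float, n_samples: int) -> float:
    """Convert mutual information in bits to info = n_samples * ln(2) * info_bits."""
    return n_samples * math.log(2) * info_bits

def bits_from_info(info: float, n_samples: int) -> float:
    """Inverse conversion, back to bits."""
    return info / (n_samples * math.log(2))

# Hypothetical link: 0.05 bits of mutual information measured on 1000 cells.
info = info_from_bits(0.05, 1000)
print(round(info, 3))
print(round(bits_from_info(info, 1000), 3))
```

    Because Info scales with the number of samples, the same value in bits weighs more on a larger dataset.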

    Probabilities

    This table lists the orientation probabilities of the network links (an advanced feature of causal network reconstruction). The signature of causality in observational data is based on v-structure motifs: X -> Z <- Y (see e.g. Causality by J. Pearl, 2009). MIIC combines the causal discovery and information theory frameworks and uniquely defines “orientation score” probabilities to confidently orient certain edges (see MIIC publications).
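    To illustrate why a v-structure leaves a recognizable signature, here is a small exact calculation on a textbook toy case. It is not MIIC's orientation procedure: X and Y are assumed to be independent fair coins and Z = X XOR Y, an extreme example chosen so that the numbers are exact. The helper names marginal and mi_bits are ours.

```python
import math
from collections import defaultdict

def marginal(joint, axes):
    """Marginal distribution over the given variable positions."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in axes)] += p
    return out

def mi_bits(joint, a, b, given=()):
    """Exact I(a; b | given) in bits from a joint probability table."""
    p_abg = marginal(joint, (a, b) + given)
    p_ag = marginal(joint, (a,) + given)
    p_bg = marginal(joint, (b,) + given)
    p_g = marginal(joint, given)
    total = 0.0
    for key, p in p_abg.items():
        if p == 0:
            continue
        va, vb, vg = key[0], key[1], key[2:]
        total += p * math.log2(p * p_g[vg] / (p_ag[(va,) + vg] * p_bg[(vb,) + vg]))
    return total

# Toy v-structure X -> Z <- Y, states ordered as (x, y, z).
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

i_xy = mi_bits(joint, 0, 1)                  # X and Y are marginally independent
i_xy_given_z = mi_bits(joint, 0, 1, (2,))    # conditioning on Z makes them dependent
print(i_xy, i_xy_given_z)
```

    In a chain or a fork, conditioning on the middle variable removes the association; in a v-structure it creates one, and this asymmetry is what makes the two incoming edges orientable.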

    Variable plots

    Here you can find the distributions of the variables in the CausalCCC network.

    Data dictionary

    This table lists the properties of all variables. If you uploaded a state order file, the table is exactly that file; otherwise the variable types are detected automatically. The CausalCCC wrapper function creates this file for you.

    Download

    In this tab you can download the different files obtained after reconstruction of the network.

    CausalCCC demo datasets

    Welcome to the datasets library!

    All datasets used in the tutorials and in the use cases published in the CausalCCC paper are listed here.

    Use cases datasets:

    Each use case links to its data source (from the original paper) and to the CausalCCC input files used to obtain the networks in the CausalCCC paper.