FROM CELL-CELL COMMUNICATION METHODS TO INTEGRATIVE CAUSAL PATHWAYS
A more comprehensive picture of cell-cell communication (CCC) lies in the integration of upstream and downstream intracellular pathways from the different ligand-receptor pairs of sender and receiver cells.
CausalCCC reconstructs gene-gene interaction networks across interacting cell types from single-cell or spatial transcriptomic data.
You can also use our demo dataset directly on the workbench page!
Quick start: a beginner-friendly introduction
More Advanced: a more detailed vignette with flexibility for your biological question
A tutorial on how to interpret your CausalCCC network
CausalCCC integrates a robust causal network reconstruction method, MIIC (Multivariate Information-based Inductive Causation), with CCC analysis.
This vignette guides you through preparing your single-cell data for CausalCCC. It is a beginner-friendly presentation; see Advanced mode to tailor the CausalCCC pipeline to your biological question.
To create a CausalCCC network, you only need a list of L-R links from your CCC analysis and the raw count expression matrices of the sender genes and the receiver genes you want to study.
example of causalCCC input files
If needed, we offer in our package:
an optional wrapper function to filter your single-cell data (Seurat, CSV, Excel …).
a feature selection option to help you select genes of interest.
a CCC analysis option to help you select L-R links of interest.
You can perform each or all of these options within one single function call.
my_files <- causalCCC.wrapper(SeuratObject)
With this wrapper function you can transform your single-cell object (Seurat example) into CausalCCC input files (wrapper output example).
If you already have your L-R links and the genes and cells you want to study, and just need help formatting your single-cell object, you can run causalCCC.wrapper() to automatically prepare the files needed to reconstruct a CausalCCC network.
Make sure you can load the following packages:
### Mandatory Libraries
library(tidyverse)
library(data.table)
library(dplyr)
### If you use Seurat
library(Seurat)
# You need to download both the MIIC R package adapted for CausalCCC and the CausalCCC package containing preprocessing automatic functions:
Sys.setenv(GITHUB_PAT = 'YOUR_GITHUB_API_TOKEN_HERE')
devtools::install_github("miicTeam/miic_R_package@causalccc", force = T)
devtools::install_github("miicTeam/CausalCCC", force = T)
library(miic)
library(causalCCC)
#demo dataset
load("causalCCC_demo_Seurat.RData")
Download L-R links dataframe example
# Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
#Define the genes you are interested in reconstructing network (we offer a feature selection tool if needed, see below)
genes_senders <- c("CD74", "RPL11", "RPS2", "RPS15A", "EEF1A1", "RPL10", "GAPDH",
"RPL6", "RPL13A", "RPS14", "RPLP1", "RPS12", "RPL31", "RPS3",
"COTL1", "PTMA", "MALAT1", "RPS4X", "RPS7", "RPL32", "RPS8")
genes_receivers <- c("GZMH", "HIPK1", "PCNA", "BAG2", "NAAA", "ASCL2", "SAMM50",
"CCL4L1", "CD83", "ZNF175", "ACAD9", "SLC12A2", "CD151", "CCL5",
"ATG7", "FABP5", "ZNF438", "NKG7", "FBXO28", "PROK2", "MARS2",
"PRDM15", "NSMCE2", "B2M", "HLA-C", "HLA-A", "HLA-B", "HLA-DRA")
#Load your CCC analysis (we offer a CCC analysis tool if needed, see below)
interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)
The causalCCC.wrapper() function creates a folder named “CausalCCC_files”, containing the wrapper output files, in your working directory.
my_files <- causalCCC::causalCCC.wrapper(SeuratObject,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
genes_senders = genes_senders,
genes_receivers = genes_receivers,
interact_edges = interact_edges)
There are many parameters you can specify for the wrapper function; you can read about them in the R documentation vignette or in Advanced mode.
Now you can run the job on the CausalCCC webserver.
On the workbench you can directly upload the files you just created with the wrapper.
example of the demo causalCCC network on the webserver
The dataframe can come from any file format you like. It must be structured so that columns are genes or metadata and rows are cells.
#demo dataset
load("causalCCC_demo_df.RData")
#or
df <- read.csv("causalCCC_demo_df.csv")
#or
df <- read.csv("causalCCC_demo_table.csv")
#or
df <- readxl::read_excel("causalCCC_demo_df.xlsx") # readxl is not attached by library(tidyverse)
Download L-R links dataframe example
# Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
#Define the genes you are interested in reconstructing network (we offer a feature selection tool if needed, see below)
genes_senders <- c("CD74", "RPL11", "RPS2", "RPS15A", "EEF1A1", "RPL10", "GAPDH",
"RPL6", "RPL13A", "RPS14", "RPLP1", "RPS12", "RPL31", "RPS3",
"COTL1", "PTMA", "MALAT1", "RPS4X", "RPS7", "RPL32", "RPS8")
genes_receivers <- c("GZMH", "HIPK1", "PCNA", "BAG2", "NAAA", "ASCL2", "SAMM50",
"CCL4L1", "CD83", "ZNF175", "ACAD9", "SLC12A2", "CD151", "CCL5",
"ATG7", "FABP5", "ZNF438", "NKG7", "FBXO28", "PROK2", "MARS2",
"PRDM15", "NSMCE2", "B2M", "HLA-C", "HLA-A", "HLA-B", "HLA-DRA")
#Load your CCC analysis (we offer a CCC analysis tool if needed, see below)
interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)
The causalCCC.wrapper() function creates a folder named “CausalCCC_files”, containing the wrapper output files, in your working directory.
my_files <- causalCCC.wrapper(df,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
genes_senders = genes_senders,
genes_receivers = genes_receivers,
interact_edges = interact_edges)
We integrate a CCC analysis using the LIANA pipeline.
### For CCC selection
library(liana)
Now you can specify which CCC method you want from the options available in LIANA:
Code | Name |
---|---|
connectome | Connectome |
logfc | iTALK inspired 1-vs-rest LogFC score |
natmi | Network Analysis Toolkit for the Multicellular Interactions |
sca | SingleCellSignalR |
cellphonedb | CellPhoneDB |
LIANA regularly integrates new CCC methods, and this list will be updated as LIANA releases new options.
#demo dataset
load("causalCCC_demo_Seurat.RData")
Define the parameters as usual. Here we also show how to write a metadata list if you want to add one:
# Specify the CCC method you want to use to select L-R edges (optional)
CCC_method <- "natmi"
# Set your working directory here
wd_path <- "~/CausalCCC/vignettes"
# Specify the assay of interest
assay_name <- 'RNA'
# Specify the species genes format (mouse or human)
species <- "human"
# Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
#genes and metadata of interest for the senders population
goi_senders <- c('HLA-DQA1')
metadata_senders <- list()
#genes and metadata of interest for the receivers population
goi_receivers <- c('LAG3')
metadata_receivers <- list()
# example:
# metadata <- list(
# treatment = c("Control", "Treated"),
# another_meta = c("Level1", "Level2", "Level3"),
# continuous_meta = NULL
# )
By setting do_CCC = TRUE, the wrapper will perform both CCC analysis and feature selection, as we assume that without L-R links the upstream and downstream genes are undefined.
my_files <- causalCCC.wrapper(SeuratObject,
assay_name = assay_name,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
species = species,
do_CCC = TRUE,
CCC_method = CCC_method,
goi_senders = goi_senders,
goi_receivers = goi_receivers,
metadata_senders = metadata_senders,
metadata_receivers = metadata_receivers,
wd_path = wd_path)
Maybe you have obtained L-R links but do not know which other genes to include. We offer an unsupervised feature selection tool that selects upstream and downstream genes based on the list of ligands and receptors of interest. Optionally, you may also extend the selection with a small list of genes and metadata (e.g. experimental condition, treatment vs control, etc.) that are relevant to your biological question.
By setting do_MIselect = TRUE, the wrapper will choose informative genes based on the variables of interest, using mutual information computation.
We showcase it here with a Seurat object but it can be done with dataframes too.
#demo dataset
load("causalCCC_demo_Seurat.RData")
Download L-R links dataframe example
# Set your working directory here
wd_path <- "~/CausalCCC/vignettes/"
# Specify the assay of interest
assay_name <- 'RNA'
# Specify the species genes format (mouse or human)
species <- "human"
# Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
#Define optional variables of interest
goi_senders <- c("CD74")
goi_receivers <- NULL
#Load your CCC analysis (we offer a CCC analysis tool if needed, see below)
interact_edges <- read.table("causalCCC_demo_CCClinks.tsv", sep = "\t", header = T)
my_files <- causalCCC.wrapper(SeuratObject,
assay_name = assay_name,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
species = species,
interact_edges = interact_edges,
do_MIselect = TRUE,
goi_senders = goi_senders,
wd_path = wd_path)
This vignette goes deeper into preparing your single-cell object for CausalCCC. It is an optional workflow to fine-tune the CausalCCC pipeline to your biological question.
To create a CausalCCC network, you only need a list of L-R links from your CCC analysis and the raw count expression matrices of the sender genes and receiver genes you want to study.
example of causalCCC input files
First read the Quick start section before exploring how to tailor your CausalCCC network.
In Quick start, we introduced the wrapper function causalCCC.wrapper(), which performs the full CausalCCC preparation pipeline. It is actually a wrapper around 5 CausalCCC functions that can help you tailor your input files:
details of the wrapper function
In this vignette, we will detail how to create your own pipeline by modifying the default parameters of these functions.
Reminder: the goal is to select Ligand-Receptor links and genes of interest to reconstruct the most informative signaling pathways. CausalCCC networks highlight upstream and downstream gene interaction pathways underlying cell-cell communication (CCC) between sender and receiver cells.
CausalCCC does not require preprocessing and offers detailed help in its R package.
Make sure you can load the following packages:
knitr::opts_chunk$set(echo = T)
### Mandatory Libraries
library(tidyverse)
library(data.table)
library(dplyr)
### If you use Seurat
library(Seurat)
### For CCC selection - optional if you want to use your own L-R list
library(liana)
### For network aesthetic display
library(rjson)
# You need to download both the MIIC R package adapted for CausalCCC and the CausalCCC package containing preprocessing automatic functions:
Sys.setenv(GITHUB_PAT = 'YOUR_GITHUB_API_TOKEN_HERE')
devtools::install_github("miicTeam/miic_R_package@causalccc", force = T)
devtools::install_github("miicTeam/CausalCCC", force = T)
library(miic)
library(causalCCC)
#Set your working directory here
wd_path <- "~/CausalCCC/vignettes/demo_data"
# Create directories
dir.create(file.path(wd_path,"plots/"))
dir.create(file.path(wd_path,"MI_tables/"))
dir.create(file.path(wd_path,"CausalCCC_files/"))
First, load your single-cell object, define the sender and receiver populations of interest, reported in the metadata “interact_ident” column.
#demo dataset
load("causalCCC_demo_Seurat.RData")
If you have multiple subtypes of senders (or receivers), give the general population name of the senders (or receivers), and create a “subtype” metadata variable with the specific subpopulation names.
Add any metadata of interest to the metadata list, with ordered Levels for categorical variables (this can be arbitrary and does not influence computation but only visualization of CausalCCC networks: red edges for activation / correlation and blue edges for repression / anticorrelation). If the metadata is continuous, just put NULL instead of levels (see example below).
#Specify the assay of interest
assay_name <- 'RNA'
#Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
Idents(SeuratObject) <- interact_ident
# Specify the species genes format (mouse or human)
species <- "human"
### YOUR BIOLOGICAL QUESTION
#genes and metadata of interest for the senders population
goi_senders <- c('HLA-DQA1')
metadata_senders <- list()
#genes and metadata of interest for the receivers population
goi_receivers <- c('LAG3')
metadata_receivers <- list()
# example:
# metadata <- list(
# treatment = c("Control", "Treated"),
# another_meta = c("Level1", "Level2", "Level3"),
# continuous_meta = NULL
# )
cat(paste("Wrapping your Seurat object to run causalCCC:\n",
senders, "are the senders population\n",
receivers, "are the receivers population\n",
"and are present in the", interact_ident, "column.\n"))
Sys.sleep(2)
Idents(SeuratObject) <- interact_ident
SeuratObject <- subset(SeuratObject, idents = c(senders, receivers))
CausalCCC integrates the MIIC network reconstruction algorithm with CCC analysis, so the first step is naturally to retrieve L-R links from a CCC analysis. You can use knowledge-based links or L-R links computed with any CCC method. In our use cases, we use the NicheNet and CellChat CCC methods. The CausalCCC R package offers an integration with LIANA, which encompasses many other CCC methods such as iTALK and CellPhoneDB.
Here is how to use our integration of LIANA with causalCCC.links(). This function is only available for Seurat objects.
Now you can specify which CCC method you want from the options available in LIANA:
Code | Name |
---|---|
connectome | Connectome |
logfc | iTALK inspired 1-vs-rest LogFC score |
natmi | Network Analysis Toolkit for the Multicellular Interactions |
sca | SingleCellSignalR |
cellphonedb | CellPhoneDB |
The parameters of causalCCC.links() are:
seurat_object: A Seurat object containing single-cell data.
species: A character string indicating the species, either “human” (default) or “mouse”.
assay_name: A character string specifying the assay name in the Seurat object. Default is “RNA”.
senders: A non-empty character string of sender cell types.
receivers: A non-empty character string of receiver cell types.
interact_ident: A character string specifying the identity column in the Seurat object where senders and receivers can be found.
CCC_method: A character string specifying the CCC method to use; see show_methods() of LIANA.
n_CCClinks: A numeric specifying how many L-R links to keep. Default is 30.
The function returns the full LIANA output, from which we need to create the interact_edges file containing only ligands and receptors.
#Get the LIANA significant found interactions from senders to receivers
result <- causalCCC.links(SeuratObject,
species = species,
assay_name = assay_name,
interact_ident = interact_ident,
senders = senders,
receivers = receivers,
CCC_method = "sca",
n_CCClinks = 20)
write.table(result, file = file.path(wd_path,"LIANA_output.csv"), quote = F, sep = ",")
Extract the consensus interaction edges from the LIANA output to include in the CausalCCC network:
#Extract ligands and receptors
ligands <- unique(as.vector(result$ligand.complex))
receptors <- unique(as.vector(result$receptor.complex))
# some receptors are multiple, we need to unlist them
receptors <- unique(unlist(str_split(receptors, "_")))
#Create the list of CCC links.
interact_edges <- data.frame(ligands = character(), receptors = character())
for (i in seq_len(nrow(result))) { # seq_len() is safe when result has no rows
oneligand <- result$ligand.complex[i]
onereceptor <- result$receptor.complex[i]
onereceptor <- unique(unlist(str_split(onereceptor, "_")))
for (onerecp in onereceptor) {
interact_edges[nrow(interact_edges) +1,] <- c(oneligand, onerecp)
}
}
interact_edges <- interact_edges[!(duplicated(interact_edges)),]
write.table(interact_edges, file.path(wd_path,"CausalCCC_files/demo_interactEdges.tsv"), row.names=F, quote=F, sep="\t")
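For readers preparing their data in Python, the same unpacking of receptor complexes can be sketched with pandas. This is a minimal illustration, not part of the causalCCC package: the toy `result` table and its gene names are invented, the helper name `build_interact_edges` is hypothetical, and only the column names `ligand.complex` and `receptor.complex` are taken from the R code above.

```python
import pandas as pd

# Toy stand-in for a LIANA result table (all values are invented).
result = pd.DataFrame({
    "ligand.complex": ["LigA", "LigB", "LigA"],
    "receptor.complex": ["RecX", "RecY_RecZ", "RecX"],
})


def build_interact_edges(result: pd.DataFrame) -> pd.DataFrame:
    """Return one (ligand, receptor) row per receptor subunit, without duplicates."""
    edges = pd.DataFrame({
        "ligands": result["ligand.complex"],
        # Split multi-subunit receptors such as "RecY_RecZ" into a list.
        "receptors": result["receptor.complex"].str.split("_"),
    })
    # explode() repeats the ligand once per receptor subunit.
    edges = edges.explode("receptors")
    return edges.drop_duplicates().reset_index(drop=True)


interact_edges = build_interact_edges(result)
print(interact_edges)
```

Splitting on "_" mirrors the str_split() call in the R code, so it carries the same assumption that subunits of a complex are joined by underscores.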
The second step is to define which upstream and downstream genes you want to include in your CausalCCC network. These genes can be knowledge-based or the result of a feature selection method. You can use the feature selection method of your choice or our own method, causalCCC.MIselection(). This function offers an unsupervised feature selection tool based on fast pairwise mutual information computation.
Explanation: an unsupervised way of discovering feature importance in your dataset is to calculate the Mutual Information (MI) between all pairs of variables. Variables that share a lot of information with many other variables are necessary to understand the underlying interactions in your dataset (these signals could be artefacts or true interactions; both need to be discovered to analyze your dataset).
How to: As computing mutual information between all pairs of variables (25000*25000) takes too long and might highlight strong associations unrelated to cell-cell interactions, we select up to 15 genes (including ligand and/or receptor genes) and up to 15 metadata as features of interest and compute their shared information with all the other variables (30*25000).
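The pivot-based ranking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple plug-in MI estimate on coarsely binned counts, which is not the estimator implemented in MIIC or in causalCCC.MIselection(), and the function names `discrete_mi` and `rank_genes_by_mi` as well as the simulated data are hypothetical.

```python
import numpy as np
import pandas as pd


def discrete_mi(x: np.ndarray, y: np.ndarray) -> float:
    """Plug-in mutual information (in nats) between two discrete vectors."""
    # Joint contingency table, normalised to a joint probability table.
    joint = pd.crosstab(x, y).to_numpy(dtype=float)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float((joint[nonzero] * np.log(joint[nonzero] / (px @ py)[nonzero])).sum())


def rank_genes_by_mi(counts: pd.DataFrame, pivots: list, n_genes: int = 3) -> list:
    """Rank non-pivot genes by their highest MI with any pivot gene."""
    # Coarse binning: cap counts at 3 so each gene has at most 4 states.
    binned = counts.clip(upper=3).astype(int)
    others = [g for g in binned.columns if g not in pivots]
    scores = {
        g: max(discrete_mi(binned[p].to_numpy(), binned[g].to_numpy()) for p in pivots)
        for g in others
    }
    return sorted(scores, key=scores.get, reverse=True)[:n_genes]


# Simulated counts: "linked" depends on the pivot gene, "noise" does not.
rng = np.random.default_rng(0)
pivot = rng.poisson(2.0, size=500)
counts = pd.DataFrame({
    "pivot": pivot,
    "linked": pivot + rng.binomial(1, 0.1, size=500),
    "noise": rng.poisson(2.0, size=500),
})
top_genes = rank_genes_by_mi(counts, pivots=["pivot"], n_genes=1)
print(top_genes)
```

Only pivot-versus-all pairs are scored here, which is what keeps the cost linear in the number of genes rather than quadratic.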
The parameters of causalCCC.MIselection() are:
data_input: [a dataframe or a Seurat object] A single-cell transcriptomics object, either a dataframe or a Seurat object. A dataframe must contain genes and metadata as variables and cells as observations.
assay_name: [a string] For Seurat objects, the name of the assay to take the transcriptomics raw counts from (default is ‘RNA’).
interact_ident: [a string] The name of the metadata containing the cell type populations.
oneinteract: [a string] Either the senders tag or the receivers tag.
goi: [a vector] A list of genes of interest in your dataset. Must be of length <15. Each of these genes will be used as a pivot variable to look for mutual information in your dataset.
metadata_list: [a vector] A list of metadata names of interest in your dataset. Must be of length <15. Each of these metadata will be used as a pivot variable to look for mutual information in your dataset.
n_genes: [a numeric] The number of genes to keep after pairwise mutual information ranking. Default depends on the length of goi (between 15 and 100).
plot: [a boolean] Whether to produce a heatmap plot of the highest mutual information found with the features of interest. Default is FALSE. The plot is saved in output_dir.
color_heatmap: [a string] A color for the high mutual information values in the color scale.
save: [a boolean] Whether to save the full mutual information table. Default is FALSE. The table is saved in output_dir.
output_dir: [a string] An output directory to export optional outputs to. Default is “MI_tables”.
return_full: [a boolean] If TRUE, the function returns the full table of MI computation instead of only the top genes.
The function returns a vector of n_genes genes sharing the highest mutual information with the given features of interest. Optionally, it returns the full dataframe of MI computation, plots MI heatmaps and saves csv files.
#Find genes that share the most information with your biological question :
MI_senders_genes <- causalCCC.MIselection(data_input = SeuratObject,
assay_name = "RNA",
interact_ident = interact_ident,
oneinteract = senders,
goi = c(ligands[1:5],goi_senders),
metadata_list = names(metadata_senders),
save = T,
output_dir = file.path(wd_path, "MI_tables"),
color_heatmap = "darkgreen",
plot = T)
MI_receivers_genes <- causalCCC.MIselection(data_input = SeuratObject,
assay_name = "RNA",
interact_ident = interact_ident,
oneinteract = receivers,
goi = c(receptors,goi_receivers),
metadata_list = names(metadata_receivers),
save = T,
output_dir = file.path(wd_path, "MI_tables"),
color_heatmap = "darkorange",
plot = T)
Mutual Information heatmap of B cells
Mutual Information heatmap of CD8 T cells
Finally, you can save the files and are ready to run CausalCCC on the web server.
The causalCCC.mosaic() function only needs the names of the metadata columns, but causalCCC.state_order() needs the named list because this is where the increasing levels are defined.
genes_senders <- unique(c(ligands, MI_senders_genes, goi_senders))
genes_receivers <- unique(c(receptors, MI_receivers_genes, goi_receivers))
## Create the input mosaic matrix
causalCCC_df <- causalCCC.mosaic(data_input = SeuratObject,
assay_name = assay_name,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
genes_senders= genes_senders,
genes_receivers = genes_receivers,
metadata_senders = names(metadata_senders),
metadata_receivers = names(metadata_receivers))
## Create the state order
causalCCC_st <- causalCCC.state_order(mosaic_data_table = causalCCC_df,
genes_senders= genes_senders,
genes_receivers = genes_receivers,
ligands = ligands,
receptors = receptors,
metadata_senders = metadata_senders,
metadata_receivers = metadata_receivers)
## Save files
cat(paste("Saving the causalCCC files in", file.path(wd_path,"CausalCCC_files/")))
## Create the network layout
network_layout <- causalCCC.layout(causalCCC_st, network_height = 8)
file <- file(file.path(wd_path,"CausalCCC_files/causalCCC_layout.json"))
writeLines(network_layout, file)
close(file)
write.table(causalCCC_df, file = file.path(wd_path,"CausalCCC_files/causalCCC_df.csv"), quote = F, sep = ",", row.names = F)
write.table(causalCCC_st, file = file.path(wd_path, "CausalCCC_files/causalCCC_st.tsv"), quote = F, sep = "\t", row.names = F)
#Fix ligands and receptors names when duplicated
duplicated_ligands <- intersect(ligands,genes_receivers)
duplicated_receptors <- intersect(receptors, genes_senders)
interact_edges$ligands <- ifelse(interact_edges$ligands %in% duplicated_ligands, paste0(interact_edges$ligands, "_senders"), interact_edges$ligands)
interact_edges$receptors <- ifelse(interact_edges$receptors %in% duplicated_receptors, paste0(interact_edges$receptors, "_receivers"), interact_edges$receptors)
write.table(interact_edges, file.path(wd_path,"CausalCCC_files/causalCCC_interactEdges.tsv"), row.names=F, quote=F, sep="\t")
output_files <- list(
mosaic_table = causalCCC_df,
state_order = causalCCC_st,
interact_edges = interact_edges,
network_layout = network_layout
)
Now run the job on the MIIC webserver!
You can tailor the CausalCCC analysis to your biological question. For instance, one of our use cases showcases cell signaling within a trio of cell populations interacting with one another; it is indeed possible to reconstruct a CausalCCC network that includes more than two cell types and where cells can be both senders and receivers.
CausalCCC preprocessing only requires Python >= 3.9 and the following packages. It takes an AnnData object as input.
The same process applies to a Squidpy or MuData object.
import scipy.sparse
import anndata as ad
import scanpy as sc
import pandas as pd
import numpy as np
import os
If you already have your data prepared, you might be able to skip this step!
You already have:
Your cell/gene matrix in a table format (e.g. csv, txt, tsv)?
A list of ligand-receptor pairs of your choice?
Good news, you don’t need this vignette! You can go directly to the CausalCCC webserver!
os.chdir("path/to/your/adata")
file_path = "your_adata.h5ad"  # placeholder: replace with the name of your .h5ad file
adata = sc.read_h5ad(file_path)
print(adata)
We encourage you to run any cell-cell communication method you like, or even to bring your own ligand-receptor pairs of interest. However, if you need assistance with running cell-cell communication methods, we suggest following the LIANA+ example.
For more detailed guidance on running cell-cell communication methods, you can refer to the following Python tutorials:
Next, you will need to structure your data in the “mosaic” format.
Assigning sender and receiver populations
Before proceeding, ensure that your adata.obs metadata includes a column named ['interact_ident']. This column should specify, for each cell, whether it is a sender, a receiver, or NA (if the cell is neither).
If you have a “celltype” column in your adata.obs metadata, you can for example define the sender and receiver cell types, initialize the column to 'Other' (or NA), assign the two labels, and finally check the attribution:
senders = ['T CD4 Naive', 'T CD4 Memory']
receivers = ['B intermediate']
adata.obs['interact_ident'] = 'Other'  # or NA
adata.obs.loc[adata.obs['celltype'].isin(senders), 'interact_ident'] = 'senders'
adata.obs.loc[adata.obs['celltype'].isin(receivers), 'interact_ident'] = 'receivers'
adata.obs['interact_ident'].value_counts()  # check the attribution
In the “mosaic” format, data for senders and receivers are arranged so that only the values relevant to each cell type are filled in, while the non-relevant parts are left as NA. This arrangement results in a “mosaic” pattern of populated values and NAs across the matrix. Here’s a simplified example:
Cell | interact_ident | Expression (Sender) | Expression (Receiver) |
---|---|---|---|
Cell 1 | Sender | 0.85 | NA |
Cell 2 | Sender | 0.22 | NA |
Cell 3 | Sender | 0.76 | NA |
Cell 4 | Receiver | NA | 1.0 |
Cell 5 | Receiver | NA | 0.93 |
Here:
Sender cells have NA for receptor values and for receiver-specific genes.
Receiver cells have NA for ligand values and for sender-specific genes.
This structure helps separate the information relevant to senders from that relevant to receivers. If a gene is expressed in both senders and receivers, the function appends a “_senders” or “_receivers” suffix to its name to create the appropriate mosaic format.
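The mosaic layout can be reproduced on a toy example with pandas alone. The sketch below is only meant to make the NA pattern visible; the gene names and values are invented, and it skips everything the full function does beyond renaming shared genes and stacking the two tables.

```python
import pandas as pd

# Toy expression tables: rows are cells, columns are genes (values invented).
senders = pd.DataFrame({"LigA": [3, 0], "Shared": [1, 2]})
receivers = pd.DataFrame({"RecX": [5, 1, 0], "Shared": [4, 0, 2]})

# Genes present in both populations get a population-specific suffix.
shared = set(senders.columns) & set(receivers.columns)
senders = senders.rename(columns={g: f"{g}_senders" for g in shared})
receivers = receivers.rename(columns={g: f"{g}_receivers" for g in shared})

# Stacking the tables lets pandas fill the missing columns with NaN,
# which produces the block ("mosaic") pattern described above.
mosaic = pd.concat([senders, receivers], ignore_index=True)
print(mosaic)
```

The first two rows (senders) are NaN in the receiver columns and the last three rows (receivers) are NaN in the sender columns.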
def causalCCC_mosaic(adata, interact_ident, senders_name, receivers_name,
                     genes_senders, genes_receivers, metadata_senders=None, metadata_receivers=None):
    """
    Create a mosaic datatable for causalCCC using an AnnData object.
    :param adata: AnnData object (scanpy or else)
    :param interact_ident: column name in obs that identifies sender/receiver population.
    :param senders_name: name identifying senders in interact_ident column
    :param receivers_name: name identifying receivers in interact_ident column
    :param genes_senders: list of genes for senders
    :param genes_receivers: list of genes for receivers
    :param metadata_senders: list of metadata for senders (optional)
    :param metadata_receivers: list of metadata for receivers (optional)
    :return: pandas dataframe containing the mosaic datatable
    """
    if interact_ident not in adata.obs.columns:
        raise ValueError(f"The metadata '{interact_ident}' is not found in the anndata object")
    duplicated_genes = list(set(genes_senders) & set(genes_receivers))
    duplicated_meta = []
    if metadata_senders and metadata_receivers:
        duplicated_meta = list(set(metadata_senders) & set(metadata_receivers))

    def prepare_table(adata, cell_type, genes, metadata=None):
        sub_adata = adata[adata.obs[interact_ident] == cell_type]
        if sub_adata.shape[0] == 0:
            raise ValueError(f"No cells found for {cell_type}")
        # AnnData stores X as cells x genes, so no transpose is needed;
        # X may be sparse or dense depending on how the object was built
        X = sub_adata[:, genes].X
        X = X.toarray() if scipy.sparse.issparse(X) else np.asarray(X)
        sub_matrix = pd.DataFrame(X, columns=genes)
        if metadata:
            for onemeta in metadata:
                if onemeta not in sub_adata.obs.columns:
                    raise ValueError(f"Metadata '{onemeta}' not found for {cell_type}")
                sub_matrix[onemeta] = sub_adata.obs[onemeta].values
                if sub_matrix[onemeta].isna().all():
                    print(f"Warning: Metadata '{onemeta}' contains only NA values and will be excluded")
                    sub_matrix.drop(columns=[onemeta], inplace=True)
        # Shuffle cells only after the metadata is attached, so rows stay aligned
        sub_matrix = sub_matrix.sample(frac=1).reset_index(drop=True)
        return sub_matrix

    # Prepare senders and receivers tables
    print(f"Preparing {senders_name} cells table ...")
    senders_table = prepare_table(adata, senders_name, genes_senders, metadata_senders)
    print(f"Preparing {receivers_name} cells table ...")
    receivers_table = prepare_table(adata, receivers_name, genes_receivers, metadata_receivers)
    # Handle duplicated genes and metadata
    if duplicated_genes:
        print(f"Duplicated genes found: {duplicated_genes}, renaming for senders/receivers ...")
        receivers_table.rename(columns={gene: gene + "_receivers" for gene in duplicated_genes}, inplace=True)
        senders_table.rename(columns={gene: gene + "_senders" for gene in duplicated_genes}, inplace=True)
    if duplicated_meta:
        print(f"Duplicated metadata found: {duplicated_meta}, renaming for senders/receivers ...")
        receivers_table.rename(columns={meta: meta + "_receivers" for meta in duplicated_meta}, inplace=True)
        senders_table.rename(columns={meta: meta + "_senders" for meta in duplicated_meta}, inplace=True)
    # Add the columns of the other population, filled with NaN, to get the mosaic pattern
    all_columns = list(set(senders_table.columns) | set(receivers_table.columns))
    for col in all_columns:
        if col not in senders_table.columns:
            senders_table[col] = np.nan
        if col not in receivers_table.columns:
            receivers_table[col] = np.nan
    mosaic_table = pd.concat([senders_table, receivers_table], ignore_index=True)
    return mosaic_table
You can now run it on your object as follows:
# senders_name and receivers_name must match the labels stored in adata.obs['interact_ident']
mosaic_df = causalCCC_mosaic(adata, interact_ident='interact_ident', senders_name='senders',
                             receivers_name='receivers', genes_senders=['Gene_0'],
                             genes_receivers=['Gene_1', 'Gene_2'],
                             metadata_senders=[], metadata_receivers=[])
Then, we have to generate the MIIC state_order. The state order is an optional file that allows you to input optional information for the computation (such as contextual variables, types of variables (otherwise detected automatically)) and information for the display (groups of nodes, levels ordering of categorical data). Here is a brief description:
var_type column: 0 is categorical/discrete and 1 is continuous. We advise treating a continuous variable with fewer than 5 distinct values as discrete.
levels_increasing_order column: the levels of a categorical variable, listed in increasing order.
is_contextual column: 0 is non-contextual and 1 is contextual (meaning that no variable can cause this one). A contextual variable could encode prior knowledge, e.g. an experimental variable.
group and group_color columns: the group each node belongs to and the color used to display that group.
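As a concrete illustration of these columns, here is a hand-written state order for three hypothetical variables. The variable names and the choice of marking the treatment as contextual are assumptions made for this example; the column names and the hexadecimal group colors are the ones used elsewhere in this vignette.

```python
import pandas as pd

# One dictionary per variable of the mosaic table (all variables are hypothetical).
rows = [
    {"var_names": "treatment", "var_type": 0,
     "levels_increasing_order": "Control,Treated", "is_contextual": 1,
     "group": "metadata", "group_color": "FFE397"},
    {"var_names": "LigA", "var_type": 1,
     "levels_increasing_order": "", "is_contextual": 0,
     "group": "ligands", "group_color": "28827A"},
    {"var_names": "RecX", "var_type": 1,
     "levels_increasing_order": "", "is_contextual": 0,
     "group": "receptors", "group_color": "DE6343"},
]
state_order = pd.DataFrame(rows)

# Tab-separated text, one line per variable.
state_order_tsv = state_order.to_csv(sep="\t", index=False)
print(state_order_tsv)
```

Writing one row per variable by hand like this is practical for small networks; for larger ones it is easier to generate the table programmatically from the gene and metadata lists.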
def causalCCC_state_order(mosaic_df,
                          metadata_senders,
                          metadata_receivers,
                          genes_senders,
                          ligands,
                          receptors,
                          genes_receivers):
    """
    Create a state order file for causalCCC.
    Parameters:
    -----------
    mosaic_df : pd.DataFrame
        An output of causalCCC_mosaic()
    metadata_senders : dict
        A named dictionary of metadata (strings) with levels as items.
    metadata_receivers : dict
        A named dictionary of metadata (strings) with levels as items.
    genes_senders : list of str
        A list of selected genes for the senders cells.
    ligands : list of str
        A list of selected CCC genes for the senders cells.
    receptors : list of str
        A list of selected CCC genes for the receivers cells.
    genes_receivers : list of str
        A list of selected genes for the receivers cells.
    Returns:
    --------
    pd.DataFrame
        A state order that can be used for causalCCC network reconstruction.
    """
    if not isinstance(mosaic_df, pd.DataFrame):
        raise ValueError("mosaic_df must be a data frame.")
    if metadata_senders is not None and not isinstance(metadata_senders, dict):
        raise ValueError("metadata_senders is not a named dictionary")
    if metadata_receivers is not None and not isinstance(metadata_receivers, dict):
        raise ValueError("metadata_receivers is not a named dictionary")
    if not isinstance(genes_senders, list) or not all(isinstance(g, str) for g in genes_senders):
        raise ValueError("genes_senders must be a list of strings.")
    if not isinstance(genes_receivers, list) or not all(isinstance(g, str) for g in genes_receivers):
        raise ValueError("genes_receivers must be a list of strings.")
    if not isinstance(ligands, list) or not all(isinstance(g, str) for g in ligands):
        raise ValueError("ligands must be a list of strings.")
    if not isinstance(receptors, list) or not all(isinstance(g, str) for g in receptors):
        raise ValueError("receptors must be a list of strings.")
    # Treat missing metadata as empty so the lookups below do not fail on None
    metadata_senders = metadata_senders or {}
    metadata_receivers = metadata_receivers or {}
    # Initialize state_order df
    # NOTE: DataFrame.append() was removed in pandas 2.0; run this function with
    # pandas < 2.0, or port the append() calls below to pd.concat()
    state_order = pd.DataFrame(columns=["var_names", "var_type", "levels_increasing_order", "is_contextual", "group", "group_color"])
    # Add .obs (metadata) information
    duplicated_meta = set(metadata_senders.keys()).intersection(metadata_receivers.keys())
    if metadata_senders:
        for onemeta, levels in metadata_senders.items():
            # levels may be None for continuous metadata
            levels_increasing_order = ",".join(levels) if levels else ""
            meta_name = f"{onemeta}_senders" if onemeta in duplicated_meta else onemeta
            if meta_name in mosaic_df.columns:
                state_order = state_order.append({
                    "var_names": meta_name,
                    "var_type": 0,
                    "levels_increasing_order": levels_increasing_order,
                    "is_contextual": 0,
                    "group": "metadata",
                    "group_color": "FFE397"
                }, ignore_index=True)
    if metadata_receivers:
        for onemeta, levels in metadata_receivers.items():
            # levels may be None for continuous metadata
            levels_increasing_order = ",".join(levels) if levels else ""
            meta_name = f"{onemeta}_receivers" if onemeta in duplicated_meta else onemeta
            if meta_name in mosaic_df.columns:
                state_order = state_order.append({
                    "var_names": meta_name,
                    "var_type": 0,
                    "levels_increasing_order": levels_increasing_order,
                    "is_contextual": 0,
                    "group": "metadata",
                    "group_color": "FFD050"
                }, ignore_index=True)
    # Identify duplicated genes (if you have the same gene in both sender and receiver populations)
    dups = [f"{gene}_senders" for gene in set(genes_senders).intersection(genes_receivers)] + \
           [f"{gene}_receivers" for gene in set(genes_senders).intersection(genes_receivers)]
    interact_meta = state_order["var_names"].tolist()
    for col in set(mosaic_df.columns) - set(interact_meta):
        group = ""
        color = ""
        if col in dups:
            # Recover the gene name and its population from the suffix
            gene, family = col.rsplit("_", 1)
        else:
            gene = col
            family = "senders" if gene in genes_senders else "receivers"
        if family == "senders":
            group = "senders genes"
            color = "EBF5D8"
            if gene in ligands:
                group = "ligands"
                color = "28827A"
        else:
            group = "receivers genes"
            color = "F0E2B6"
            if gene in receptors:
                group = "receptors"
                color = "DE6343"
        # Variables with few distinct values are treated as discrete
        if mosaic_df[col].nunique() <= 5:
            state_order = state_order.append({
                "var_names": col,
                "var_type": 0,
                "levels_increasing_order": "",
                "is_contextual": 0,
                "group": group,
                "group_color": color
            }, ignore_index=True)
        else:
            state_order = state_order.append({
                "var_names": col,
                "var_type": 1,
"levels_increasing_order": "",
"is_contextual": 0,
"group": group,
"group_color": color
}, ignore_index=TRUE)
return state_order
Now you can directly create your state order this way:
state_order = causalCCC_state_order(mosaic_df,
                                    metadata_senders={},
                                    metadata_receivers={},
                                    genes_senders=genes_senders,
                                    ligands=ligands,
                                    receptors=receptors,
                                    genes_receivers=genes_receivers)
You can set your own node colors later on :) See the Advanced Mode tab.
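As a minimal sketch of what recoloring could look like, assuming the state order is a pandas DataFrame with the `group` and `group_color` columns used above. The toy rows, the helper name `recolor_group` and the chosen hex color are illustrative assumptions, not part of the CausalCCC package:

```python
import pandas as pd

# Toy state order with the same columns as above (values are illustrative only).
state_order = pd.DataFrame({
    "var_names": ["TGFB1", "ACTA2", "TGFBR2"],
    "var_type": [1, 1, 1],
    "levels_increasing_order": ["", "", ""],
    "is_contextual": [0, 0, 0],
    "group": ["ligands", "senders genes", "receptors"],
    "group_color": ["28827A", "EBF5D8", "DE6343"],
})


def recolor_group(df, group, hex_color):
    """Return a copy of df where every row of `group` gets `hex_color` (hypothetical helper)."""
    out = df.copy()
    out.loc[out["group"] == group, "group_color"] = hex_color
    return out


# Give all ligands a custom color; the original table is left untouched.
custom = recolor_group(state_order, "ligands", "1F77B4")
print(custom[["var_names", "group", "group_color"]])
```

Working on a copy keeps the default palette available if you want to compare both versions.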
This vignette guides you through the result page of a CausalCCC network. We offer many visualization tools for your CausalCCC network. Above the network you will see multiple tabs, which we describe below:
The CausalCCC webserver provides a unique interactive visualization tool to explore your CausalCCC networks. In particular, you can explore the reconstructed gene interaction pathways upstream of a specific ligand or downstream of a specific receptor using your mouse buttons:
With a left click you select a gene (or a link) and see its neighborhood.
With a right click you open a drop-down menu with 3 options:
If you used an optional state_order file, the legend of your color palette appears below the graph:
Clicking once on a legend entry highlights the nodes of that group. Double-clicking highlights the group and its immediate neighbors:
The table contains the information about all pairs of variables either directly or indirectly associated and the corresponding indirect contributions:
You can find the multivariate information values associated with the links
X or Y: select a variable to see the other variables it is directly or indirectly associated with.
Type: P (positive), if there is a link between X and Y; N (negative), if there is no link but an indirect association.
Ai: the variables indirectly contributing to the XY pair information [proportion of their contribution, summing to 100% if type=N].
CCC edge: P (positive), if the considered edge is a CCC link; N (negative), if the considered edge is a MIIC-computed edge.
Info: the mutual information between X and Y computed by MIIC in log confidence units (i.e. info = n_samples * ln(2) * info_bits).
Info bits: the mutual information between X and Y computed by MIIC in bits.
Info shifted: the residual mutual information after subtracting the indirect contributions and the model complexity, taking into account the number of samples (see MIIC publications for details). If Info shifted equals 0 (or is smaller than a small confidence threshold, set in the Algorithm advanced settings), the link is removed. Otherwise, there is a link representing the direct association between X and Y.
Info shifted bits: Info shifted in bits.
Group X: if a state order was uploaded, the group label of X. For instance, if X is labeled as “sender genes” in the state order, Group X is equal to “sender genes”.
Group Y: if a state order was uploaded, the group label of Y.
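The relation between the Info and Info bits columns can be checked by hand. The sketch below applies the formula quoted above (info = n_samples * ln(2) * info_bits) and the rule that a link is kept only when Info shifted is above the threshold. The function names, the example numbers and the default threshold of 0 are assumptions made for illustration, not the webserver's implementation:

```python
import math


def info_from_bits(info_bits, n_samples):
    """Convert mutual information in bits to the unit of the Info column."""
    return n_samples * math.log(2) * info_bits


def keeps_link(info_shifted, threshold=0.0):
    """Hypothetical helper: a link survives only if Info shifted exceeds the threshold."""
    return info_shifted > threshold


# Example: 0.05 bits of mutual information measured on 1000 cells.
info = info_from_bits(info_bits=0.05, n_samples=1000)
print(round(info, 2))  # 34.66

print(keeps_link(0.0))   # False: no residual information, the link is removed
print(keeps_link(12.3))  # True: a direct association remains
```

Because Info scales with the number of samples, the same value in bits yields stronger evidence in a larger dataset.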
This table lists the orientation probabilities of the network links (an advanced feature of causal network reconstruction). The signature of causality in observational data is based on v-structure motifs: X -> Z <- Y (see e.g. Causality by J. Pearl, 2009). MIIC combines the causal discovery and information theory frameworks and uniquely defines “orientation score” probabilities to confidently orient certain edges (see MIIC publications).
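To see why a v-structure leaves a detectable trace, consider a toy collider where X and Y are independent fair bits and Z = X XOR Y. The sketch below is only an illustration of the statistical signature, not MIIC's algorithm; the data and function names are assumptions. It computes the mutual information between X and Y, first marginally and then conditioned on Z:

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information in bits between the two entries of each (a, b) pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def conditional_mutual_information(triples):
    """I(X;Y|Z) in bits: I(X;Y) within each value of Z, weighted by its frequency."""
    n = len(triples)
    by_z = {}
    for x, y, z in triples:
        by_z.setdefault(z, []).append((x, y))
    return sum(len(p) / n * mutual_information(p) for p in by_z.values())


# Toy collider X -> Z <- Y: every (x, y) combination is equally frequent.
samples = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)] * 25

mi_xy = mutual_information([(x, y) for x, y, _ in samples])
cmi_xy_given_z = conditional_mutual_information(samples)
print(mi_xy, cmi_xy_given_z)
```

Here X and Y share no information on their own, yet become fully dependent (1 bit) once Z is known. This asymmetry is what allows the two edges pointing into Z to be oriented.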
Here you can find the distributions of the variables in the CausalCCC network.
This table lists the properties of all variables. If you uploaded a state order file, the table reproduces that file exactly; otherwise the variable types are detected automatically. The CausalCCC wrapper function creates this file for you.
In this tab you can download the different files obtained after reconstruction of the network.
All datasets used in the tutorials and in the use cases published in the CausalCCC paper are listed here.
Each use case links to its data source (from the original paper) and to the CausalCCC input files used to obtain the networks in the CausalCCC paper.