This vignette guides you to prepare your Single-Cell data to use interactMIIC. This is a more personalized pipeline for more experienced bioinformaticians, another vignette more beginner-friendly is available here.
The idea of interactMIIC is to reconstruct causal graphs similar to Gene Regulatory Networks (GRNs) in your “senders” and “receivers” populations, and to link the found “intracellular mechanisms” with “intercellular interactions” in integration with Cell-Cell Communications (CCC) methods (refs).
Here a schematic network of what you can obtain with interactMIIC:
Make sure you can load the following packages :
knitr::opts_chunk$set(echo = T)
### Mandatory Libraries
library(Seurat)
library(ggplot2)
library(tidyverse)
library(data.table)
library(dplyr)
library(igraph)
### For CCC selection
library(liana)
### For network aesthetic display
library(rjson)
#Download interactMIIC
# Sys.setenv(GITHUB_PAT = 'YOUR_GITHUB_API_TOKEN_HERE')
# devtools::install_github("interactMIIC/interactMIIC/miic_R_package", force = T) #latest version of MIIC
library(miic)
set.seed(1511)
#Set your working directory here
wd_path <- "~/multi-oMIICs/tutorial/pbmc"
# Create directories
dir.create(file.path(wd_path,"plots/"))
## Warning in dir.create(file.path(wd_path, "plots/")): 'C:\Users\Louise
## Dupuis\Documents\multi-oMIICs\tutorial\pbmc\plots' existe déjà
dir.create(file.path(wd_path,"MI_tables/"))
## Warning in dir.create(file.path(wd_path, "MI_tables/")): 'C:\Users\Louise
## Dupuis\Documents\multi-oMIICs\tutorial\pbmc\MI_tables' existe déjà
dir.create(file.path(wd_path,"MIIC_networks/"))
## Warning in dir.create(file.path(wd_path, "MIIC_networks/")): 'C:\Users\Louise
## Dupuis\Documents\multi-oMIICs\tutorial\pbmc\MIIC_networks' existe déjà
First, load your Seurat object, define the senders and receivers populations, and the metadata column “interact_ident” in which you can find them.
If you have multiple subtypes of senders (or receivers), give the general population name of the senders (or receivers), and create a “subtype” metadata variable to put the more defined subpopulations names in.
Add any metadata of interest to the metadata named list. Levels should be given in an “increasing” order (this can be arbitrary and do not influence computation). Right now metadata must be categorical but this will be fixed soon to cater for continuous metadata.
### YOUR SEURAT OBJECT
load(file.path(wd_path,"preprocessed_pbmc3k.RData"))
SeuratObject <- pbmc3k
#Precise assay of interest
assay_name <- 'RNA'
#Define your senders, your receivers and the metadata where they can be found
senders <- c('B')
receivers <- c('CD8 T')
interact_ident <- "seurat_annotations"
Idents(SeuratObject) <- interact_ident
# Specify the species genes format (mouse or human)
species <- "human"
# Specify the CCC method you want to use to select L-R edges (optional)
CCC_method <- "cellphonedb"
# If you want to remove ribomosomal genes
ribomosal_genes <- row.names(SeuratObject)[stringr::str_detect(row.names(SeuratObject), ifelse(species == "human", "RPL|RPS","Rpl|Rps") )]
counts <- GetAssayData(SeuratObject, assay = "RNA")
counts <- counts[-(which(rownames(counts) %in% ribomosal_genes)),]
SeuratObject <- subset(SeuratObject, features = rownames(counts))
rm(counts)
### YOUR BIOLOGICAL QUESTION
#genes and metadata of interest for the senders population
goi_senders <- c('HLA-DQA1')
metadata_senders <- list()
#genes and metadata of interest for the receivers population
goi_receivers <- c('LAG3')
metadata_receivers <- list()
# exemple:
# metadata <- list(
# treatment = c("Control", "Treated"),
# another_meta = c("Level1", "Level2", "Level3")
# )
Run this if you don’t already have a Cell-Cell communication analysis of your populations. This function will give you the top L-R links according to a method of choice, the default being cellphoneDB.
#get the LIANA significant found interactions from senders to receivers
result <- interactmiic.CCClinks(SeuratObject,
species = species,
assay_name = assay_name,
interact_ident = interact_ident,
senders = senders,
receivers = receivers,
CCC_method = CCC_method)
write.table(result, file = file.path(wd_path,paste0("pbmc_liana_", CCC_method, ".csv")), quote = F, sep = ",")
Extract from the CCC method output top interact edges to add the to interactMIIC network:
interact_edges <- data.frame(ligands = character(), receptors = character())
for (i in 1:nrow(result)) {
oneligand <- result$ligand.complex[i]
onereceptor <- result$receptor.complex[i]
onereceptor <- unique(unlist(str_split(onereceptor, "_")))
for (onerecp in onereceptor) {
interact_edges[nrow(interact_edges) +1,] <- c(oneligand, onerecp)
}
}
interact_edges <- interact_edges[!(duplicated(interact_edges)),]
#final genes lists
ligands <- unique(interact_edges$ligand)
receptors <- unique(interact_edges$receptors)
Run this if you don’t know which genes to put into your interactMIIC network.
Explanation: an unsupervized way of generating feature importance in your dataset is to calculate the Mutual Information between all pairs of variables. Variables that share a lot of information with a lot of other variables are necessary to understand your dataset dynamics (these signals could be artefacts or true interactions, both needs to be discovered to analyze your dataset). Of course these mutual information could become 0 after conditioning on a separating set but this will be computed later when we construct the network with MIIC. Here it serves the purpose of a preliminary analysis of our dataset.
How to: As computing the pairwise mutual information of all the variables is too long (25000*25000), we select ~5 genes and ~5 metadata as features of interest and compute their shared information with all the others variables (10*25000)
#Find genes that share the most information with your biological question :
MI_sender_genes <- interactmiic.MIselection(seurat_object = SeuratObject,
assay_name = "RNA",
interact_ident = interact_ident,
oneinteract = senders,
goi = unique(c(goi_receivers,ligands[1:min(length(ligands),15)])),
metadata_list = names(metadata_senders),
save = T,
output_dir = file.path(wd_path, "MI_tables"),
plot = T)
MI_receivers_genes <- interactmiic.MIselection(seurat_object = SeuratObject,
assay_name = "RNA",
interact_ident = interact_ident,
oneinteract = receivers,
goi = unique(c(goi_receivers,receptors[1:min(length(receptors),15)])),
metadata_list = names(metadata_receivers),
save = T,
output_dir = file.path(wd_path, "MI_tables"),
plot = T)
Create the necessary interactMIIC files to reconstruct a network.
This chunk merges the variables of interest and the ligands/receptors of each celltype (senders or receivers) of the CCC analysis to look into during interactMIIC network reconstruction
#SENDER
genes_senders <- unique(c(ligands, MI_sender_genes, goi_senders))
message(paste(c("Here are the genes used to reconstruct the senders MIIC network:",paste0(genes_senders, collapse = ", ")), collapse = "\n"))
## Here are the genes used to reconstruct the senders MIIC network:
## B2M, HLA-C, HLA-A, HLA-B, HLA-DRA, HLA-DPB1, HLA-DRB1, HLA-DPA1, HLA-E, HMGB1, HLA-DQA1, HLA-DQB1, HLA-DRB5, MIF, CD48, HLA-F, CD22, HLA-DQA2, LGALS9, ICAM2, GNAS, CD74, EEF1A1, GAPDH, COTL1, PTMA, MALAT1, ACTG1, TPT1, FAU, ARPC2, MT-CO1, OAZ1, NACA, SEC61B, LTB, TMSB10, UBC, ACTB, CYBA, UBA52, LDHB, GNB2L1, NPM1, PPIB, EEF2, FTL, MYL12B, NDUFA13, FTH1, TMSB4X, PFDN5, GLTSCR2, KLF6, MYL12A, BTF3, COX5A, RAC2, VIM, ARPC1B, EEF1B2, LSP1, MT-CO3, H3F3B, CFL1, PFN1, PSMB9, GPR183, LY6E, CD79A, ATP5G2, ATP5L, TCL1A, DYNLL1, SLC25A6, SRGN, EIF1, FIS1, GABARAPL2, SAT1, ANXA2, SERF2, CNBP, ARPC5L, C1orf43, CD52, TMEM258, PSMB1
if (length(metadata_senders) >0) {
message(paste(c("Here are the metadata used to reconstruct the senders MIIC network:",paste0(c(names(metadata_senders)), collapse = ", ")), collapse = "\n"))
}
#RECEIVER
genes_receivers <- unique(c(receptors, MI_receivers_genes, goi_receivers ))
message(paste(c("Here are the genes used to reconstruct the receivers MIIC network:",paste0(genes_receivers, collapse = ", ")), collapse = "\n"))
## Here are the genes used to reconstruct the receivers MIIC network:
## CD3D, LAG3, CD8A, CD8B, CXCR4, CD74, CD2, CD44, PTPRC, ITGAL, ITGB2, PTGDR, FOS, DUSP1, JUNB, JUN, ZFP36, GZMH, TTC16, FCGR3A, KBTBD3, SP140L, CROCCP2, FGFBP2, HLA-DRB5, IER2, GRSF1, ZFP36L2, LTB, NKG7, LZTR1, FKBP1B, SPIN4, HIPK1, VEZF1, PCNA, HLA-DPB1, RP5-882C2.2, ZNF598, S1PR5, ZCCHC2, BTG1, AQP3, IL7R, MZT2A, PDP1, POM121, C12orf75, HCST, ZNF747, CD82, ASCL2, HLA-DRB1, HLA-DRA, ADAM9, SAMM50, CST7, OAZ1, CD83, VPS26A, FAM32A, DDRGK1, HLA-DQA1, ZNF175, C7orf73, SNX10, VEZT, TBC1D19, CUL9, ZFAND2B
if (length(metadata_receivers) >0) {
message(paste(c("Here are the metadata used to reconstruct the receivers MIIC network:",paste0(c(names(metadata_receivers)), collapse = ", ")), collapse = "\n"))
}
Generate the matrix of senders and receivers cells with selected variables in the interactMIIC format:
interactMIIC_df <- interactmiic.mosaic(seurat_object = SeuratObject,
assay_name = assay_name,
interact_ident = interact_ident,
senders_name = senders,
receivers_name = receivers,
genes_senders= genes_senders,
genes_receivers = genes_receivers,
metadata_senders = names(metadata_senders),
metadata_receivers = names(metadata_receivers))
Then, generate the MIIC state_order. The state order is an optional file that allows you to input optional information for the computation (such as contextual variables, types of variables (otherwise detected automatically)… ) and information for the display (groups of nodes, levels ordering of categorical data …). Here is a brief description:
Here, to help the readability of the network, we are going to create 5 groups : metadata,sender_genes,ligands, receptors, receivers_genes.
interactmiic.state_order2 <- function(mosaic_data_table,
metadata_senders,
metadata_receivers,
genes_senders,
ligands,
receptors,
genes_receivers) {
# Input checks
if (!is.data.frame(mosaic_data_table)) {
stop("mosaic_data_table must be a data frame.")
}
if (!is.list(metadata_senders)) {
stop("metadata_senders is not a named list")
}
if (!is.list(metadata_receivers)) {
stop("metadata_receivers is not a named list")
}
if (!is.vector(genes_senders) || !is.character(genes_senders)) {
stop("genes_senders must be a character vector.")
}
if (!is.vector(genes_receivers) || !is.character(genes_receivers)) {
stop("genes_receivers must be a character vector.")
}
if (!is.vector(ligands) || !is.character(ligands)) {
stop("ligands must be a character vector.")
}
if (!is.vector(receptors) || !is.character(receptors)) {
stop("receptors must be a character vector.")
}
# Initialize state_order data frame
state_order <- data.frame(var_names = character(),
var_type = numeric(),
levels_increasing_order = character(),
is_contextual = numeric(),
group = character(),
group_color = character())
# Add metadata information
duplicated_meta <- intersect(names(metadata_senders), names(metadata_receivers))
if (length(metadata_senders) >0) {
for (onemeta in names(metadata_senders)) {
levels_increasing_order <- paste(metadata_senders[[onemeta]], collapse = ",")
if (ifelse(onemeta %in% duplicated_meta, paste0(onemeta, "_senders"), onemeta) %in% colnames(mosaic_data_table)) {
state_order[nrow(state_order) + 1,] <- list(ifelse(onemeta %in% duplicated_meta, paste0(onemeta, "_senders"), onemeta), 0, levels_increasing_order, 0, "metadata", "FFD050")
}
}
}
if (length(metadata_receivers) >0) {
for (onemeta in names(metadata_receivers)) {
levels_increasing_order <- paste(metadata_receivers[[onemeta]], collapse = ",")
if (ifelse(onemeta %in% duplicated_meta, paste0(onemeta, "_receivers"), onemeta) %in% colnames(mosaic_data_table)) {
state_order[nrow(state_order) + 1,] <- list(ifelse(onemeta %in% duplicated_meta, paste0(onemeta, "_receivers"), onemeta), 0, levels_increasing_order, 0, "metadata", "FFD050")
}
}
}
# Identify duplicated genes (special case)
dups <- c(paste0(intersect(genes_senders, genes_receivers), "_senders"),paste0(intersect(genes_senders, genes_receivers), "_receivers"))
interact_meta <- as.vector(state_order$var_names)
# Assign the genes
for(col in setdiff(colnames(mosaic_data_table), interact_meta)) {
# Initialize variables
group <- ""
color <- ""
# Test if gene is duplicated and assign group
if (col %in% dups) {
gene <- stringr::str_split(col, "_")[[1]][1]
family <- stringr::str_split(col, "_")[[1]][2]
} else {
gene <- col
family <- ifelse(gene %in% genes_senders, "senders", "receivers")
}
if (family == "senders") {
group <- "senders genes"
color <- "ABC4AB"
if (gene %in% ligands) {
group <- "ligands"
color <- "40A86A"
}
} else {
group <- "receivers genes"
color <- "F4A289"
if (gene %in% receptors) {
group <- "receptors"
color <- "F26841"
}
}
# Add to state_order
if (unique(dplyr::n_distinct(mosaic_data_table[,col]) <= 5)) {
state_order[nrow(state_order) + 1,] <- list(col, 0, "", 0, group, color)
} else {
state_order[nrow(state_order) + 1,] <- list(col, 1, "", 0, group, color)
}
}
return(state_order)
}
interactMIIC_st <- interactmiic.state_order2(mosaic_data_table = interactMIIC_df,
genes_senders= genes_senders,
genes_receivers = genes_receivers,
ligands = ligands,
receptors = receptors,
metadata_senders = metadata_senders,
metadata_receivers = metadata_receivers)
If you use the webserver, this creates a nice display of your network :
### WORK IN PROGRESS - DO NOT CALL
# Generate empty layout json file
network_layout <- interactmiic.layout(interactMIIC_st, square_height = 8)
file <- file(file.path(wd_path,"MIIC_networks/pbmc_v1_layout.json"))
writeLines(network_layout, file)
close(file) # to absolutely not forget
write.table(interactMIIC_df, file = file.path(wd_path,"MIIC_networks/pbmc_v1.csv"), quote = F, sep = ",", row.names = F)
write.table(interactMIIC_st, file = file.path(wd_path, "MIIC_networks/pbmc_v1_st.tsv"), quote = F, sep = "\t", row.names = F)
duplicated_ligands <- intersect(ligands,genes_receivers)
duplicated_receptors <- intersect(receptors, genes_senders)
interact_edges$ligands <- ifelse(interact_edges$ligands %in% duplicated_ligands, paste0(interact_edges$ligands, "_senders"), interact_edges$ligands)
interact_edges$receptors <- ifelse(interact_edges$receptors %in% duplicated_receptors, paste0(interact_edges$receptors, "_receivers"), interact_edges$receptors)
write.table(interact_edges, file.path(wd_path,"MIIC_networks/pbmc_v1_interactEdges.tsv"), row.names=F, quote=F, sep="\t")
Now we have 2 options: we can use interactMIIC in R studio but the computation time can be long depending on your machine.
We recommend putting the consistent option to “skeleton” meaning interactMIIC will take an extra step to make sure it is not removing necessary links to explain indirect pathways and give a consistent skeleton of the graph.
miic.res <- miic(input_data = interactMIIC_df, #input dataset
state_order = interactMIIC_st, #additional information
interact_edges = interact_edges, #CCC links to add
consistent = "skeleton",
conf_threshold = 0.1, #threshold to filter the less probable edges in skeleton step
n_shuffles=100,
ori_proba_ratio = 0.1) #remove orientations that do not meet this threshold
## [1] "the interactEdges file is detected in miic.R"
## [1] "the optional weight column in interactEdges file is not a problem in miic.reconstruct.R"
## Search all pairs for unconditional independence relations...
## Did not remove 0---70edge in initializeEdges() because status is 1
## Did not remove 0---71edge in initializeEdges() because status is 1
## Did not remove 0---72edge in initializeEdges() because status is 1
## Did not remove 0---73edge in initializeEdges() because status is 1
## Did not remove 1---74edge in initializeEdges() because status is 1
## Did not remove 1---75edge in initializeEdges() because status is 1
## Did not remove 1---76edge in initializeEdges() because status is 1
## Did not remove 1---77edge in initializeEdges() because status is 1
## Did not remove 1---80edge in initializeEdges() because status is 1
## Did not remove 1---81edge in initializeEdges() because status is 1
## Did not remove 1---82edge in initializeEdges() because status is 1
## Did not remove 1---87edge in initializeEdges() because status is 1
## Did not remove 2---71edge in initializeEdges() because status is 1
## Did not remove 2---72edge in initializeEdges() because status is 1
## Did not remove 2---73edge in initializeEdges() because status is 1
## Did not remove 2---78edge in initializeEdges() because status is 1
## Did not remove 2---85edge in initializeEdges() because status is 1
## Did not remove 3---71edge in initializeEdges() because status is 1
## Did not remove 3---72edge in initializeEdges() because status is 1
## Did not remove 3---73edge in initializeEdges() because status is 1
## Did not remove 3---78edge in initializeEdges() because status is 1
## Did not remove 3---85edge in initializeEdges() because status is 1
## Did not remove 4---79edge in initializeEdges() because status is 1
## Did not remove 4---83edge in initializeEdges() because status is 1
## Did not remove 5---83edge in initializeEdges() because status is 1
## Did not remove 6---84edge in initializeEdges() because status is 1
## Did not remove 7---83edge in initializeEdges() because status is 1
## Did not remove 8---86edge in initializeEdges() because status is 1
## Did not remove 8---88edge in initializeEdges() because status is 1
## Did not remove 9---89edge in initializeEdges() because status is 1
## Did not remove 10---89edge in initializeEdges() because status is 1
## Did not remove 11---90edge in initializeEdges() because status is 1
## Search for candidate separating nodes...
## Search for conditional independence relations...
## Compute confidence cut with permutations...95 edges cut.
## Search for edge directions...
## Iteration 0 Number of edges: 281
## Search for candidate separating nodes...
## Search for conditional independence relations...
## Compute confidence cut with permutations...101 edges cut.
## Search for edge directions...
## Iteration 1 Number of edges: 281
## cycle found of size 1
plot(miic.res)
This other possibility is to run the job on the MIIC webserver.
On the MIIC webserver you can’t specify interacting edges (yet), so the result do not show ligands-receptors links. But here is a way to add them to the vizualisation:
First you need to download the MIIC results from the webserver : edgesList.miic.summary.txt and add it to your working directory folder.
miic.summary <- read.csv(file.path(wd_path, "MIIC_results/edgesList.miic.summary.txt"), sep = "\t")
state_order <- read.csv(file.path(wd_path,"MIIC_networks/pbmc_v1_st.tsv"), sep = "\t")
interactEdges <- read.csv(file.path(wd_path,"MIIC_networks/pbmc_v1_interactEdges.tsv"), sep = "\t")
max_info <- max(miic.summary$info_shifted)
interact.summary <- setNames(data.frame(matrix(ncol = ncol(miic.summary), nrow = 0)), colnames(miic.summary))
for (i in 1:nrow(interactEdges)) {
oneligand = interactEdges$ligand[i]
onereceptor = interactEdges$receptor[i]
onetype = "P"
oneinfo_shifted = max_info
onepartial = 0
interact.summary[nrow(interact.summary) +1,] <- list(oneligand, onereceptor, onetype, NA, oneinfo_shifted, NA, NA, NA, oneinfo_shifted, 2, NA, NA, NA, onepartial,"Y","1;1",NA)
}
total.summary <- rbind(miic.summary, interact.summary)
write.table(total.summary, file.path(wd_path,"MIIC_results/edgesList.interactmiic.totalsummary.txt"), quote = F, row.names = F, sep = '\t')
Now you can upload this updated summary to the visualisation tool ( interactMIIC network with CCC links ) :
Attention renommer “labelSize” en “nodeLabelSize” si on veut mettre un layout sur vis_NL.php car il y aune différence de version …