Reconstructing GRN with GEO datasets for Multiple Sclerosis

Author

Adnan Raza

Introduction

Multiple sclerosis (MS) is a complex and debilitating neurological disorder that affects millions of people worldwide. As a student of bioinformatics with a keen interest in genomics and computational biology, I chose to explore MS for my assignment due to its intricate genetic and environmental interactions. The disease’s multifactorial nature makes it a compelling subject for bioinformatics-driven investigations, such as transcriptomic and genomic analyses, which can provide insights into disease progression, potential biomarkers, and therapeutic targets.

MS is an autoimmune condition where the immune system mistakenly attacks the myelin sheath surrounding nerve fibers, leading to inflammation, neurodegeneration, and progressive disability (Dobson & Giovannoni, 2019). Despite extensive research, the exact etiology of MS remains elusive, necessitating continued exploration using modern computational techniques. By leveraging high-throughput sequencing data and machine learning algorithms, bioinformatics plays a crucial role in identifying genetic variants, transcriptomic changes, and molecular pathways involved in MS pathogenesis (Baranzini & Oksenberg, 2017).

Furthermore, given the increasing accessibility of publicly available transcriptomic datasets, studying MS offers an opportunity to apply data science methodologies to real-world biomedical challenges. This assignment aims to integrate bioinformatics tools to analyze MS-related gene expression patterns, contributing to a deeper understanding of its molecular basis.

Methodology

Package Installation and Setup

# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install(version = "3.20")
# install.packages("tidyverse")
library(tidyverse) # For data manipulation
# BiocManager::install("GEOquery")
library(GEOquery)     # For retrieving GEO datasets
# BiocManager::install("DESeq2")
library(DESeq2)       # For gene expression dataset normalization and filtering
# install.packages("igraph")
library(igraph)       # For network graph construction
# install.packages("data.table")
library(data.table)   # For efficient data handling
# BiocManager::install("minet")
library(minet)
# install.packages("poweRlaw")
library(poweRlaw)

1 Dataset Retrieval and Preprocessing

  • GEOquery is used to download datasets using their accession IDs.

  • The dataset is converted into an expression matrix.

  • Metadata is processed to map probe IDs to gene symbols.

  • Genes with multiple probe IDs are aggregated by averaging expression values.

  • Missing values are handled by replacing them with zero to maintain matrix integrity.

# Function to load and preprocess GEO dataset
load_dataset <- function(accession_id) {
  cat("Retrieving", accession_id, "from GEO.\n")
  
  # Retrieve GEO dataset
  gds <- getGEO(accession_id)
  gds_set <- GDS2eSet(gds)
  
  # Extract gene expression matrix
  expression <- exprs(gds_set)
  
  # Extract and clean metadata (gene annotations)
  annot <- fData(gds_set)
  annot <- annot[, c("ID", "Gene symbol")]
  
  # Ensure that if multiple gene symbols are listed, only the last one is retained
  annot$`Gene symbol` <- sapply(str_split(annot$`Gene symbol`, "///"), function(x) trimws(tail(x, 1)))
  
  # Convert expression matrix into a tidy dataframe
  expression_df <- as.data.frame(expression) |>
    rownames_to_column(var = "Probe_ID") |>
    merge(annot, by.x = "Probe_ID", by.y = "ID", all.x = TRUE) |>
    drop_na(`Gene symbol`) |>
    dplyr::select(-Probe_ID) |>
    group_by(`Gene symbol`) |>
    summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) |>  # Aggregate probes by averaging expression
    column_to_rownames(var = "Gene symbol")
  
  # Remove genes that have all missing values across samples
  expression_df <- expression_df[(rowSums(is.na(expression_df)) < ncol(expression_df)) & row.names(expression_df) != "", ]
  
  # Replace remaining NA values with 0 for consistency
  expression_df[is.na(expression_df)] <- 0
  
  cat("Loaded", accession_id, "as a dataframe.\n")
  return(expression_df)
}

# Load multiple GEO datasets and store them in a list
datasets <- list(
  load_dataset("GDS4994"),
  load_dataset("GDS4218"),
  load_dataset("GDS3920"),
  load_dataset("GDS4152"),
  load_dataset("GDS4151"),
  load_dataset("GDS3886"),
  load_dataset("GDS4150"),
  load_dataset("GDS4147"),
  load_dataset("GDS4146"),
  load_dataset("GDS4145"),
  load_dataset("GDS4142"),
  load_dataset("GDS2978"),
  load_dataset("GDS2419")
)
Retrieving GDS4994 from GEO.
Loaded GDS4994 as a dataframe.
Retrieving GDS4218 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS4218 as a dataframe.
Retrieving GDS3920 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS3920 as a dataframe.
Retrieving GDS4152 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS4152 as a dataframe.
Retrieving GDS4151 from GEO.
Loaded GDS4151 as a dataframe.
Retrieving GDS3886 from GEO.
Using locally cached version of GPL6244 found here:
/tmp/RtmpkCAMQk/GPL6244.annot.gz 
Loaded GDS3886 as a dataframe.
Retrieving GDS4150 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS4150 as a dataframe.
Retrieving GDS4147 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS4147 as a dataframe.
Retrieving GDS4146 from GEO.
Loaded GDS4146 as a dataframe.
Retrieving GDS4145 from GEO.
Loaded GDS4145 as a dataframe.
Retrieving GDS4142 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz 
Loaded GDS4142 as a dataframe.
Retrieving GDS2978 from GEO.
Using locally cached version of GPL96 found here:
/tmp/RtmpkCAMQk/GPL96.annot.gz 
Loaded GDS2978 as a dataframe.
Retrieving GDS2419 from GEO.
Loaded GDS2419 as a dataframe.

2 Data Normalization

  • If datasets are not normalized, DESeq2 is used to apply variance-stabilizing transformation (VST).

  • Highly variable genes (top 0.1%) are selected to focus on biologically significant changes.

# Function to filter top 0.1% most variable genes in the merged dataset
filter_top_variable_genes <- function(expression, variance = 0.999) {
  expression_matrix <- as.matrix(expression)  # Ensure matrix format
  
  # Ensure row names exist and store them
  if (is.null(rownames(expression_matrix))) {
    stop("Error: Row names (gene symbols) are missing in the dataset!")
  }
  
  gene_symbols <- rownames(expression_matrix)
  
  # Calculate variance across genes
  rv_wpn <- rowVars(expression_matrix)
  
  # Select top 0.1% most variable genes
  q_wpn <- quantile(rv_wpn, variance)  
  selected_genes <- rv_wpn > q_wpn  # Logical vector of selected genes
  
  # Apply filtering and explicitly assign row names
  expr_filtered <- expression_matrix[selected_genes, , drop = FALSE]
  
  # Explicitly ensure rownames are character strings, not numeric
  rownames(expr_filtered) <- as.character(gene_symbols[selected_genes])
  
  return(expr_filtered)
}

# Function to normalize gene expression data
normalize_expression <- function(expression) {
  # Explicitly ensure gene symbols are character type before any processing
  if (!is.null(rownames(expression))) {
    gene_symbols <- as.character(rownames(expression))
  } else {
    stop("Error: Input data must have gene symbols as row names!")
  }
  
  # If dataset is already normalized, skip DESeq2 processing but still filter
  if (max(expression) < 100) {  
    message("Dataset appears to be pre-normalized. Skipping DESeq2 processing.")
    result <- filter_top_variable_genes(expression)
    
    # Explicitly ensure rownames are preserved as character
    rownames(result) <- as.character(rownames(result))
    
    return(result)
  }
  
  # Ensure integer values for DESeq2 processing
  de_input <- round(expression)
  
  # Create DESeq2 dataset for normalization
  dds <- DESeqDataSetFromMatrix(
    countData = de_input, 
    colData = data.frame(row.names = colnames(de_input)), 
    design = ~1
  )
  
  # Explicitly set rownames of dds to the original gene symbols
  rownames(dds) <- gene_symbols
  
  # Estimate size factors for normalization
  dds <- estimateSizeFactors(dds)
  
  # Handle potential errors in DESeq processing
  tryCatch({
    dds <- DESeq(dds)
  }, error = function(e) {
    message("Standard DESeq dispersion estimation failed. Using gene-wise dispersion estimates...")
    dds <- estimateDispersionsGeneEst(dds)
    dispersions(dds) <- mcols(dds)$dispGeneEst
  })
  
  # Apply variance-stabilizing transformation
  wpn_vsd <- getVarianceStabilizedData(dds)
  
  # Explicitly ensure rownames are preserved as character
  rownames(wpn_vsd) <- as.character(gene_symbols)
  
  result <- filter_top_variable_genes(wpn_vsd)
  
  return(result)
}

3 Merging Multiple Datasets

  • Expression matrices from multiple datasets are merged based on gene symbols.

  • Data tables (data.table) are used to efficiently combine large matrices.

  • After merging, a normalization step is applied to remove batch effects.

# Function to merge multiple datasets into a single matrix
merge_expression_data <- function(datasets) {
  # Convert each dataset into a data.table for efficient merging
  datasets <- lapply(datasets, function(df) setDT(as.data.frame(df), keep.rownames = TRUE))
  
  # Merge all datasets on gene names using an outer join
  merged_data <- Reduce(function(x, y) merge(x, y, by = "rn", all = TRUE), datasets)
  
  # Convert back to data frame and set gene names as row names
  merged_data <- as.data.frame(merged_data)
  rownames(merged_data) <- merged_data$rn
  merged_data$rn <- NULL  # Remove redundant column
  
  return(merged_data)
}

# Merge all datasets into a single matrix
merged_expression <- merge_expression_data(datasets)

merged_expression[is.na(merged_expression)] <- 0

nm_expression <- normalize_expression(merged_expression)
converting counts to integer mode
Warning in DESeq(dds): the design is ~ 1 (just an intercept). is this intended?
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 367 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
# Check dimensions and summary of merged dataset
dim(nm_expression)
[1]  24 422

Network Construction

1 Mutual Information Calculation

  • The Mutual Information Matrix (MIM) is calculated using the minet package.

  • This captures statistical dependencies between gene expression profiles.

# Transpose the matrix for network inference
normalized_expression <- t(nm_expression)

# Compute Mutual Information Matrix for gene-gene interactions
mi_matrix <- minet::build.mim(normalized_expression)

2 Network Inference

  • The ARACNe algorithm is applied to infer gene-gene interactions.

  • Weak interactions (below a threshold) are removed to retain strong regulatory relationships.

  • The result is an adjacency matrix where edges represent inferred regulatory interactions.

# Infer regulatory interactions using ARACNe algorithm with stricter thresholding
network <- minet::aracne(mi_matrix, eps = 0.05)

3 Network Graph Construction

  • The adjacency matrix is converted into an edge list for visualization.

  • The network is modeled using igraph, treating genes as nodes and interactions as edges.

  • A force-directed layout (Fruchterman-Reingold) is used for graph visualization.

# Extract edges from the adjacency matrix
network_edges <- which(network != 0, arr.ind = TRUE) |>
  as.data.frame() |>
  setNames(c("Gene1", "Gene2")) |>
  distinct(Gene1, Gene2, .keep_all = TRUE)

# Convert row indices to gene names
network_edges$Gene1 <- rownames(nm_expression)[network_edges$Gene1]
network_edges$Gene2 <- rownames(nm_expression)[network_edges$Gene2]

# Assign edge weights based on interaction strength
network_edges$Weight <- network[cbind(network_edges$Gene1, network_edges$Gene2)]

# Ensure unique edges by ordering gene pairs alphabetically
network_edges <- network_edges |>
  mutate(
    Gene1 = pmin(Gene1, Gene2),
    Gene2 = pmax(Gene1, Gene2)
  ) |>
  distinct(Gene1, Gene2, .keep_all = TRUE) |>
  filter(Gene1 != Gene2)  # Remove self-interactions

# Display final network size
dim(network_edges)
[1] 46  3
# Construct the gene interaction graph
graph <- graph_from_data_frame(network_edges, directed = FALSE)

# Compute edge density (indicates network sparsity)
cat("Edge Density:", edge_density(graph))
Edge Density: 0.1666667

Visualization and Analysis

1 Network Visualization

  • Nodes represent genes, and edges represent inferred regulatory interactions.

  • Edge width is scaled based on interaction strength.

  • Self-loops are removed to focus on inter-gene interactions.

# Visualize the network graph
plot(graph, 
     layout = layout_with_fr(graph),  # Force-directed layout for better visualization
     vertex.size = 5,
     vertex.label.dist = 1.5,
     edge.width = network_edges$Weight * 5,  # Scale edge width based on interaction strength
     main = "GRN of Multiple Sclerosis",
     margin = 0,
     edge.color = "gray")

2 Topological Analysis

  1. Node Degree – Measures how many connections each gene has.

  2. Betweenness Centrality – Identifies genes that act as bridges between communities.

  3. Closeness Centrality – Determines how easily a gene can influence others.

  4. Clustering Coefficient – Measures local connectivity (tendency to form clusters).

  5. Network Density – Indicates how interconnected the network is.

  6. Community Detection – Identifies functional gene clusters using fast-greedy clustering.

  7. Network Diameter – Longest shortest path in the network, showing maximum interaction distance.

  8. Assortativity – Measures whether highly connected genes tend to interact with other highly connected genes.

# Basic Network Properties
cat("Basic Network Properties:\n")
Basic Network Properties:
cat("Nodes:", vcount(graph), "\nEdges:", ecount(graph), "\n")
Nodes: 24 
Edges: 46 
cat("Edge Density:", edge_density(graph), "\n")
Edge Density: 0.1666667 
# Degree of genes and hubs
deg <- degree(graph)
cat("Degree of Graph", deg, "\n")
Degree of Graph 9 2 6 2 3 6 6 2 2 6 6 4 4 3 4 4 6 1 2 2 2 3 5 2 
hist(deg, col = "lightblue", main = "Degree Distribution", xlab = "Degree", ylab = "Frequency")

hubs <- sort(degree(graph), decreasing = TRUE)
print(hubs)
  COX1  HLA-A RPL13A RPL23A  RPL41  RPS18   TPT1 TMSB10   RPS3  RPS4X   RPSA 
     9      6      6      6      6      6      6      5      4      4      4 
TMSB4X  HLA-B    FTL  RPS17   HBA1    HBB   RPL3  RPS15    ND4  RPL32  RPL39 
     4      3      3      3      2      2      2      2      2      2      2 
   UBB  HLA-C 
     2      1 
# Centrality of Nodes
degree_centrality <- degree(graph, mode = "all")
top_degree <- sort(degree_centrality, decreasing = TRUE)[1:10]
cat("Degree Centrality:", top_degree, "\n")
Degree Centrality: 9 6 6 6 6 6 6 5 4 4 
betweenness_centrality <- betweenness(graph, directed = FALSE)
top_betweenness <- sort(betweenness_centrality, decreasing = TRUE)[1:10]
cat("Betwennes: ", top_betweenness, "\n")
Betwennes:  61.95 61.72857 53.69048 48.41667 42.49286 38.02381 34.45476 30.88571 21.25 20.92381 
closeness_centrality <- closeness(graph, normalized = TRUE)
top_closeness <- sort(closeness_centrality, decreasing = TRUE)[1:10]
cat("Closeness: ", top_closeness, "\n")
Closeness:  0.5111111 0.4893617 0.4791667 0.4509804 0.4509804 0.4423077 0.4423077 0.4339623 0.4107143 0.3898305 
clust_coeff <- transitivity(graph, type = "global")
cat("Clustering Coefficient:", clust_coeff, "\n")
Clustering Coefficient: 0.3728814 
communities <- cluster_fast_greedy(graph)
cat("Number of communities detected:", length(communities), "\n")
Number of communities detected: 4 
diameter_graph <- diameter(graph)
mean_distance <- mean_distance(graph)
cat("Network Diameter:", diameter_graph, "\n")
Network Diameter: 5 
cat("Average Path Length:", mean_distance, "\n")
Average Path Length: 2.688406 
assort <- assortativity_degree(graph)
cat("Assortativity:", assort, "\n")
Assortativity: -0.1542322 

3 Impact of Hub Genes

  • Hub genes are highly connected nodes with potential biological significance.

  • Removing hub genes tests how robust the network remains.

graph_no_hubs <- delete_vertices(graph, names(top_degree))
cat("New Network Size After Hub Removal:", vcount(graph_no_hubs), "\n")
New Network Size After Hub Removal: 14 

4 Small World Analysis

  • Comparison with Random Network: A random graph with the same nodes and edges is generated for benchmarking.

  • Clustering Coefficient (C): Measures the tendency of nodes to form tightly connected groups.

-   Higher clustering coefficient suggests modular structures in the network.
  • Average Path Length (L): Measures the average shortest path between nodes in the network.

  • Small-World Coefficient (σ):

    • Defined as σ = (Creal / Crand) / (Lreal / Lrand).

    • If σ > 1, the network exhibits small-world properties, meaning it is highly clustered with short path lengths.

# Generate a random graph with the same number of nodes & edges
random_graph <- sample_gnm(vcount(graph), ecount(graph))

# Compute clustering coefficient & path length
C_real <- transitivity(graph, type = "global")  # Clustering Coefficient (GRN)
C_rand <- transitivity(random_graph, type = "global")  # Clustering Coefficient (Random)

L_real <- mean_distance(graph)  # Average Path Length (GRN)
L_rand <- mean_distance(random_graph)  # Average Path Length (Random)

# Compute Small-World Coefficient
sigma <- (C_real / C_rand) / (L_real / L_rand)

cat("Small-World Coefficient (σ):", sigma, "\n")
Small-World Coefficient (σ): 1.27154 
if (sigma > 1) {
  cat("Your GRN exhibits small-world properties!\n")
} else {
  cat("Your GRN is not small-world. Consider checking the filtering thresholds.\n")
}
Your GRN exhibits small-world properties!

5 Scale Free Property

  • Degree Distribution: Measures how many connections (edges) each gene (node) has.
  • Power-Law Fitting: The network’s degree distribution is tested for a power-law relationship, which characterizes scale-free networks.
-   A power-law exponent (**α**) between **2 and 3** suggests **scale-free behavior** (i.e., a few hub genes with many connections and many genes with few connections).
  • Visualization:
-   The degree distribution is plotted on a **log-log scale**.

-   The **red power-law fit line** is overlaid to check if the network follows a power-law.
# Extract degree distribution
degree_dist <- degree(graph)

# Convert to table for frequency count
deg_freq <- as.data.frame(table(degree_dist))
colnames(deg_freq) <- c("degree", "count")

# Convert to numeric for log-log plot
deg_freq$degree <- as.numeric(as.character(deg_freq$degree))
deg_freq$count <- as.numeric(as.character(deg_freq$count))

# Fit power-law distribution
pl_fit <- displ$new(deg_freq$degree)
est <- estimate_xmin(pl_fit)
pl_fit$xmin <- est$xmin
pl_fit$pars <- estimate_pars(pl_fit)

# Plot log-log degree distribution
plot(deg_freq$degree, deg_freq$count, log="xy", col="blue",
     xlab="Degree (log)", ylab="Frequency (log)", main="Power-Law Distribution")

# Overlay power-law fit
lines(pl_fit, col="red")

cat("Estimated Power-Law Exponent (α):", pl_fit$pars, "\n")
Estimated Power-Law Exponent (α): 3.09813 
# Check if the network follows scale-free behavior
if (pl_fit$pars >= 2 & pl_fit$pars <= 3) {
  cat("Your GRN exhibits scale-free properties!\n")
} else {
  cat("Your GRN does NOT strictly follow power-law behavior.\n")
}
Your GRN does NOT strictly follow power-law behavior.

6 Network Neighborhood Analysis

  • Average Neighbor Degree: Computes how well-connected the neighbors of each gene are.

    • This helps in identifying whether hubs are interacting with other hubs or with low-degree genes.
  • Degree Assortativity Coefficient:

    • Measures whether highly connected genes tend to interact with other highly connected genes (assortative) or low-degree genes (disassortative).

    • Assortative Networks: Genes preferentially connect to other genes of similar degree.

    • Disassortative Networks: Hub genes tend to interact with many low-degree genes, which is typical in biological networks.

# Compute Average Neighbor Degree for each gene
avg_neighbor_degree <- knn(graph)$knn

# Plot degree vs. avg. neighbor degree
plot(degree(graph), avg_neighbor_degree, log="xy",
     xlab="Gene Degree (log)", ylab="Avg. Neighbor Degree (log)",
     main="Network Neighborhood Analysis", col="darkgreen")

# Compute Assortativity (Degree Correlation)
assortativity <- assortativity_degree(graph)

cat("Assortativity Coefficient:", assortativity, "\n")
Assortativity Coefficient: -0.1542322 
if (assortativity > 0) {
  cat("Genes preferentially connect to genes of **similar degree** (assortative network).\n")
} else {
  cat("Hub genes tend to connect with low-degree genes (disassortative network).\n")
}
Hub genes tend to connect with low-degree genes (disassortative network).

Results and Discussion

The gene regulatory network (GRN) analysis reveals a sparse yet well-structured network, consisting of 24 nodes and 46 edges, with an edge density of 0.1667. Compared to previous iterations, this refined network maintains a hub-centric topology, while also exhibiting an increase in connectivity and clustering.

🔹 Key Findings:

  • Hubs & Connectivity: COX1 (degree = 9) emerges as the most connected hub, while genes like HLA-A, RPL13A, and RPL23A (degree = 6) act as critical secondary nodes.

  • Centrality & Influence: High betweenness values for COX1 (61.95) and HLA-A (61.73) suggest their roles as major information flow regulators in the network.

  • Community Structure: The detection of 4 distinct communities implies the presence of functionally distinct gene modules, hinting at underlying biological processes.

  • Network Efficiency: With an average path length of 2.69 and diameter of 5, the network exhibits small-world properties, allowing efficient information propagation.

  • Robustness: The negative assortativity (-0.1542) suggests a hierarchical structure where hubs preferentially connect to low-degree nodes, a hallmark of biological networks.

  • Clustering & Modularity: The clustering coefficient of 0.3729 indicates a moderate level of local connectivity, reflecting modular functional interactions.

🔹 Impact of Hub Removal:
Upon removing key hub genes, the network size reduces to 14 nodes, reinforcing the importance of hubs in maintaining network integrity. This suggests that disrupting these highly connected genes could significantly affect the system’s overall functionality.

Conclusion

The constructed gene regulatory network (GRN) exhibits a hub-centric, modular, and small-world topology, with 24 nodes and 46 edges at an edge density of 0.1667. Key hub genes like COX1 (degree = 9), HLA-A, RPL13A, and RPL23A (degree = 6) play central roles in network connectivity, as reflected in their high betweenness centrality (max = 61.95), suggesting their importance in regulatory interactions. The presence of 4 distinct communities indicates functional modularity, while a clustering coefficient of 0.3729 suggests moderate local connectivity. The negative assortativity (-0.1542) confirms a hierarchical structure where hubs preferentially interact with low-degree nodes, typical of biological networks. With an average path length of 2.69 and a diameter of 5, the network retains efficient information flow, characteristic of robust gene interactions. Hub removal reduces the network to 14 nodes, reinforcing the critical role of central genes in maintaining structural stability. These findings highlight COX1, HLA-A, RPL13A, and RPL23A as key regulatory players, warranting further functional enrichment analysis to determine their biological relevance in the studied condition.

References

  1. Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118(21), 4947-4957. https://doi.org/10.1242/jcs.02714 2. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512. https://doi.org/10.1126/science.286.5439.509
  2. Baranzini, S. E., & Oksenberg, J. R. (2017). The genetics of multiple sclerosis: From 0 to 200 in 50 years. Trends in Genetics, 33(12), 960-970.
  3. Bioconductor Project. (n.d.). Retrieved from https://www.bioconductor.org/
  4. Bioconductor GEOquery package. (n.d.). Retrieved from https://bioconductor.org/packages/release/bioc/html/GEOquery.html
  5. Bioinformatics Workbook. (n.d.). WGCNA Tutorial. Retrieved from https://bioinformaticsworkbook.org/tutorials/wgcna.html#gsc.tab=0
  6. Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. Retrieved from https://igraph.org
  7. Davis, S., & Meltzer, P. S. (2007). GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 23(14), 1846-1847. https://doi.org/10.1093/bioinformatics/btm254
  8. DESeq2 Bioconductor package. (n.d.). Retrieved from https://bioconductor.org/packages/release/bioc/html/DESeq2.html
  9. Dobson, R., & Giovannoni, G. (2019). Multiple sclerosis – A review. European Journal of Neurology, 26(1), 27-40.
  10. Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174. https://doi.org/10.1016/j.physrep.2009.11.002
  11. GEO Database. (n.d.). Retrieved from https://www.ncbi.nlm.nih.gov/geo/
  12. Gentleman, R. (2008). R Programming for Bioinformatics. Chapman & Hall/CRC.
  13. Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8
  14. Margolin, A. A., et al. (2006). ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, S7. https://doi.org/10.1186/1471-2105-7-S1-S7
  15. Meyer, P. E., Lafitte, F., & Bontempi, G. (2008). minet: A R/Bioconductor package for mutual information based network inference. Bioinformatics, 24(3), 300-302. https://doi.org/10.1093/bioinformatics/btm554
  16. Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582. https://doi.org/10.1073/pnas.0601602103
  17. Quarto Documentation. (n.d.). Retrieved from https://quarto.org/docs/
  18. Stuart, J. M., Segal, E., Koller, D., & Kim, S. K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643), 249-255. https://doi.org/10.1126/science.1087447