# if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
# BiocManager::install(version = "3.20")
# install.packages("tidyverse")
library(tidyverse) # For data manipulation
# BiocManager::install("GEOquery")
library(GEOquery) # For retrieving GEO datasets
# BiocManager::install("DESeq2")
library(DESeq2) # For gene expression dataset normalization and filtering
# install.packages("igraph")
library(igraph) # For network graph construction
# install.packages("data.table")
library(data.table) # For efficient data handling
# BiocManager::install("minet")
library(minet)
# install.packages("poweRlaw")
library(poweRlaw)
Reconstructing GRN with GEO datasets for Multiple Sclerosis
Introduction
Multiple sclerosis (MS) is a complex and debilitating neurological disorder that affects millions of people worldwide. As a student of bioinformatics with a keen interest in genomics and computational biology, I chose to explore MS for my assignment due to its intricate genetic and environmental interactions. The disease’s multifactorial nature makes it a compelling subject for bioinformatics-driven investigations, such as transcriptomic and genomic analyses, which can provide insights into disease progression, potential biomarkers, and therapeutic targets.
MS is an autoimmune condition where the immune system mistakenly attacks the myelin sheath surrounding nerve fibers, leading to inflammation, neurodegeneration, and progressive disability (Dobson & Giovannoni, 2019). Despite extensive research, the exact etiology of MS remains elusive, necessitating continued exploration using modern computational techniques. By leveraging high-throughput sequencing data and machine learning algorithms, bioinformatics plays a crucial role in identifying genetic variants, transcriptomic changes, and molecular pathways involved in MS pathogenesis (Baranzini & Oksenberg, 2017).
Furthermore, given the increasing accessibility of publicly available transcriptomic datasets, studying MS offers an opportunity to apply data science methodologies to real-world biomedical challenges. This assignment aims to integrate bioinformatics tools to analyze MS-related gene expression patterns, contributing to a deeper understanding of its molecular basis.
Methodology
Package Installation and Setup
1 Dataset Retrieval and Preprocessing
GEOquery is used to download datasets using their accession IDs.
The dataset is converted into an expression matrix.
Metadata is processed to map probe IDs to gene symbols.
Genes with multiple probe IDs are aggregated by averaging expression values.
Missing values are handled by replacing them with zero to maintain matrix integrity.
# Function to load and preprocess GEO dataset
<- function(accession_id) {
load_dataset cat("Retrieving", accession_id, "from GEO.\n")
# Retrieve GEO dataset
<- getGEO(accession_id)
gds <- GDS2eSet(gds)
gds_set
# Extract gene expression matrix
<- exprs(gds_set)
expression
# Extract and clean metadata (gene annotations)
<- fData(gds_set)
annot <- annot[, c("ID", "Gene symbol")]
annot
# Ensure that if multiple gene symbols are listed, only the last one is retained
$`Gene symbol` <- sapply(str_split(annot$`Gene symbol`, "///"), function(x) trimws(tail(x, 1)))
annot
# Convert expression matrix into a tidy dataframe
<- as.data.frame(expression) |>
expression_df rownames_to_column(var = "Probe_ID") |>
merge(annot, by.x = "Probe_ID", by.y = "ID", all.x = TRUE) |>
drop_na(`Gene symbol`) |>
::select(-Probe_ID) |>
dplyrgroup_by(`Gene symbol`) |>
summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) |> # Aggregate probes by averaging expression
column_to_rownames(var = "Gene symbol")
# Remove genes that have all missing values across samples
<- expression_df[(rowSums(is.na(expression_df)) < ncol(expression_df)) & row.names(expression_df) != "", ]
expression_df
# Replace remaining NA values with 0 for consistency
is.na(expression_df)] <- 0
expression_df[
cat("Loaded", accession_id, "as a dataframe.\n")
return(expression_df)
}
# Load multiple GEO datasets and store them in a list
<- list(
datasets load_dataset("GDS4994"),
load_dataset("GDS4218"),
load_dataset("GDS3920"),
load_dataset("GDS4152"),
load_dataset("GDS4151"),
load_dataset("GDS3886"),
load_dataset("GDS4150"),
load_dataset("GDS4147"),
load_dataset("GDS4146"),
load_dataset("GDS4145"),
load_dataset("GDS4142"),
load_dataset("GDS2978"),
load_dataset("GDS2419")
)
Retrieving GDS4994 from GEO.
Loaded GDS4994 as a dataframe.
Retrieving GDS4218 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS4218 as a dataframe.
Retrieving GDS3920 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS3920 as a dataframe.
Retrieving GDS4152 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS4152 as a dataframe.
Retrieving GDS4151 from GEO.
Loaded GDS4151 as a dataframe.
Retrieving GDS3886 from GEO.
Using locally cached version of GPL6244 found here:
/tmp/RtmpkCAMQk/GPL6244.annot.gz
Loaded GDS3886 as a dataframe.
Retrieving GDS4150 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS4150 as a dataframe.
Retrieving GDS4147 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS4147 as a dataframe.
Retrieving GDS4146 from GEO.
Loaded GDS4146 as a dataframe.
Retrieving GDS4145 from GEO.
Loaded GDS4145 as a dataframe.
Retrieving GDS4142 from GEO.
Using locally cached version of GPL570 found here:
/tmp/RtmpkCAMQk/GPL570.annot.gz
Loaded GDS4142 as a dataframe.
Retrieving GDS2978 from GEO.
Using locally cached version of GPL96 found here:
/tmp/RtmpkCAMQk/GPL96.annot.gz
Loaded GDS2978 as a dataframe.
Retrieving GDS2419 from GEO.
Loaded GDS2419 as a dataframe.
2 Data Normalization
If datasets are not normalized, DESeq2 is used to apply variance-stabilizing transformation (VST).
Highly variable genes (top 0.1%) are selected to focus on biologically significant changes.
# Function to filter top 0.1% most variable genes in the merged dataset
<- function(expression, variance = 0.999) {
filter_top_variable_genes <- as.matrix(expression) # Ensure matrix format
expression_matrix
# Ensure row names exist and store them
if (is.null(rownames(expression_matrix))) {
stop("Error: Row names (gene symbols) are missing in the dataset!")
}
<- rownames(expression_matrix)
gene_symbols
# Calculate variance across genes
<- rowVars(expression_matrix)
rv_wpn
# Select top 0.1% most variable genes
<- quantile(rv_wpn, variance)
q_wpn <- rv_wpn > q_wpn # Logical vector of selected genes
selected_genes
# Apply filtering and explicitly assign row names
<- expression_matrix[selected_genes, , drop = FALSE]
expr_filtered
# Explicitly ensure rownames are character strings, not numeric
rownames(expr_filtered) <- as.character(gene_symbols[selected_genes])
return(expr_filtered)
}
# Function to normalize gene expression data
<- function(expression) {
normalize_expression # Explicitly ensure gene symbols are character type before any processing
if (!is.null(rownames(expression))) {
<- as.character(rownames(expression))
gene_symbols else {
} stop("Error: Input data must have gene symbols as row names!")
}
# If dataset is already normalized, skip DESeq2 processing but still filter
if (max(expression) < 100) {
message("Dataset appears to be pre-normalized. Skipping DESeq2 processing.")
<- filter_top_variable_genes(expression)
result
# Explicitly ensure rownames are preserved as character
rownames(result) <- as.character(rownames(result))
return(result)
}
# Ensure integer values for DESeq2 processing
<- round(expression)
de_input
# Create DESeq2 dataset for normalization
<- DESeqDataSetFromMatrix(
dds countData = de_input,
colData = data.frame(row.names = colnames(de_input)),
design = ~1
)
# Explicitly set rownames of dds to the original gene symbols
rownames(dds) <- gene_symbols
# Estimate size factors for normalization
<- estimateSizeFactors(dds)
dds
# Handle potential errors in DESeq processing
tryCatch({
<- DESeq(dds)
dds error = function(e) {
}, message("Standard DESeq dispersion estimation failed. Using gene-wise dispersion estimates...")
<- estimateDispersionsGeneEst(dds)
dds dispersions(dds) <- mcols(dds)$dispGeneEst
})
# Apply variance-stabilizing transformation
<- getVarianceStabilizedData(dds)
wpn_vsd
# Explicitly ensure rownames are preserved as character
rownames(wpn_vsd) <- as.character(gene_symbols)
<- filter_top_variable_genes(wpn_vsd)
result
return(result)
}
3 Merging Multiple Datasets
Expression matrices from multiple datasets are merged based on gene symbols.
Data tables (data.table) are used to efficiently combine large matrices.
After merging, a normalization step is applied to remove batch effects.
# Function to merge multiple datasets into a single matrix
<- function(datasets) {
merge_expression_data # Convert each dataset into a data.table for efficient merging
<- lapply(datasets, function(df) setDT(as.data.frame(df), keep.rownames = TRUE))
datasets
# Merge all datasets on gene names using an outer join
<- Reduce(function(x, y) merge(x, y, by = "rn", all = TRUE), datasets)
merged_data
# Convert back to data frame and set gene names as row names
<- as.data.frame(merged_data)
merged_data rownames(merged_data) <- merged_data$rn
$rn <- NULL # Remove redundant column
merged_data
return(merged_data)
}
# Merge all datasets into a single matrix
<- merge_expression_data(datasets)
merged_expression
is.na(merged_expression)] <- 0
merged_expression[
<- normalize_expression(merged_expression) nm_expression
converting counts to integer mode
Warning in DESeq(dds): the design is ~ 1 (just an intercept). is this intended?
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 367 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
# Check dimensions and summary of merged dataset
dim(nm_expression)
[1] 24 422
Network Construction
1 Mutual Information Calculation
The Mutual Information Matrix (MIM) is calculated using the minet package.
This captures statistical dependencies between gene expression profiles.
# Transpose the matrix for network inference
<- t(nm_expression)
normalized_expression
# Compute Mutual Information Matrix for gene-gene interactions
<- minet::build.mim(normalized_expression) mi_matrix
2 Network Inference
The ARACNe algorithm is applied to infer gene-gene interactions.
Weak interactions (below a threshold) are removed to retain strong regulatory relationships.
The result is an adjacency matrix where edges represent inferred regulatory interactions.
# Infer regulatory interactions using ARACNe algorithm with stricter thresholding
<- minet::aracne(mi_matrix, eps = 0.05) network
3 Network Graph Construction
The adjacency matrix is converted into an edge list for visualization.
The network is modeled using igraph, treating genes as nodes and interactions as edges.
A force-directed layout (Fruchterman-Reingold) is used for graph visualization.
# Extract edges from the adjacency matrix
<- which(network != 0, arr.ind = TRUE) |>
network_edges as.data.frame() |>
setNames(c("Gene1", "Gene2")) |>
distinct(Gene1, Gene2, .keep_all = TRUE)
# Convert row indices to gene names
$Gene1 <- rownames(nm_expression)[network_edges$Gene1]
network_edges$Gene2 <- rownames(nm_expression)[network_edges$Gene2]
network_edges
# Assign edge weights based on interaction strength
$Weight <- network[cbind(network_edges$Gene1, network_edges$Gene2)]
network_edges
# Ensure unique edges by ordering gene pairs alphabetically
<- network_edges |>
network_edges mutate(
Gene1 = pmin(Gene1, Gene2),
Gene2 = pmax(Gene1, Gene2)
|>
) distinct(Gene1, Gene2, .keep_all = TRUE) |>
filter(Gene1 != Gene2) # Remove self-interactions
# Display final network size
dim(network_edges)
[1] 46 3
# Construct the gene interaction graph
<- graph_from_data_frame(network_edges, directed = FALSE)
graph
# Compute edge density (indicates network sparsity)
cat("Edge Density:", edge_density(graph))
Edge Density: 0.1666667
Visualization and Analysis
1 Network Visualization
Nodes represent genes, and edges represent inferred regulatory interactions.
Edge width is scaled based on interaction strength.
Self-loops are removed to focus on inter-gene interactions.
# Visualize the network graph
plot(graph,
layout = layout_with_fr(graph), # Force-directed layout for better visualization
vertex.size = 5,
vertex.label.dist = 1.5,
edge.width = network_edges$Weight * 5, # Scale edge width based on interaction strength
main = "GRN of Multiple Sclerosis",
margin = 0,
edge.color = "gray")
2 Topological Analysis
Node Degree – Measures how many connections each gene has.
Betweenness Centrality – Identifies genes that act as bridges between communities.
Closeness Centrality – Determines how easily a gene can influence others.
Clustering Coefficient – Measures local connectivity (tendency to form clusters).
Network Density – Indicates how interconnected the network is.
Community Detection – Identifies functional gene clusters using fast-greedy clustering.
Network Diameter – Longest shortest path in the network, showing maximum interaction distance.
Assortativity – Measures whether highly connected genes tend to interact with other highly connected genes.
# Basic Network Properties
cat("Basic Network Properties:\n")
Basic Network Properties:
cat("Nodes:", vcount(graph), "\nEdges:", ecount(graph), "\n")
Nodes: 24
Edges: 46
cat("Edge Density:", edge_density(graph), "\n")
Edge Density: 0.1666667
# Degree of genes and hubs
<- degree(graph)
deg cat("Degree of Graph", deg, "\n")
Degree of Graph 9 2 6 2 3 6 6 2 2 6 6 4 4 3 4 4 6 1 2 2 2 3 5 2
hist(deg, col = "lightblue", main = "Degree Distribution", xlab = "Degree", ylab = "Frequency")
<- sort(degree(graph), decreasing = TRUE)
hubs print(hubs)
COX1 HLA-A RPL13A RPL23A RPL41 RPS18 TPT1 TMSB10 RPS3 RPS4X RPSA
9 6 6 6 6 6 6 5 4 4 4
TMSB4X HLA-B FTL RPS17 HBA1 HBB RPL3 RPS15 ND4 RPL32 RPL39
4 3 3 3 2 2 2 2 2 2 2
UBB HLA-C
2 1
# Centrality of Nodes
<- degree(graph, mode = "all")
degree_centrality <- sort(degree_centrality, decreasing = TRUE)[1:10]
top_degree cat("Degree Centrality:", top_degree, "\n")
Degree Centrality: 9 6 6 6 6 6 6 5 4 4
<- betweenness(graph, directed = FALSE)
betweenness_centrality <- sort(betweenness_centrality, decreasing = TRUE)[1:10]
top_betweenness cat("Betwennes: ", top_betweenness, "\n")
Betwennes: 61.95 61.72857 53.69048 48.41667 42.49286 38.02381 34.45476 30.88571 21.25 20.92381
<- closeness(graph, normalized = TRUE)
closeness_centrality <- sort(closeness_centrality, decreasing = TRUE)[1:10]
top_closeness cat("Closeness: ", top_closeness, "\n")
Closeness: 0.5111111 0.4893617 0.4791667 0.4509804 0.4509804 0.4423077 0.4423077 0.4339623 0.4107143 0.3898305
<- transitivity(graph, type = "global")
clust_coeff cat("Clustering Coefficient:", clust_coeff, "\n")
Clustering Coefficient: 0.3728814
<- cluster_fast_greedy(graph)
communities cat("Number of communities detected:", length(communities), "\n")
Number of communities detected: 4
<- diameter(graph)
diameter_graph <- mean_distance(graph)
mean_distance cat("Network Diameter:", diameter_graph, "\n")
Network Diameter: 5
cat("Average Path Length:", mean_distance, "\n")
Average Path Length: 2.688406
<- assortativity_degree(graph)
assort cat("Assortativity:", assort, "\n")
Assortativity: -0.1542322
3 Impact of Hub Genes
Hub genes are highly connected nodes with potential biological significance.
Removing hub genes tests how robust the network remains.
<- delete_vertices(graph, names(top_degree))
graph_no_hubs cat("New Network Size After Hub Removal:", vcount(graph_no_hubs), "\n")
New Network Size After Hub Removal: 14
4 Small World Analysis
Comparison with Random Network: A random graph with the same nodes and edges is generated for benchmarking.
Clustering Coefficient (C): Measures the tendency of nodes to form tightly connected groups.
- Higher clustering coefficient suggests modular structures in the network.
Average Path Length (L): Measures the average shortest path between nodes in the network.
Small-World Coefficient (σ):
Defined as σ = (Creal / Crand) / (Lreal / Lrand).
If σ > 1, the network exhibits small-world properties, meaning it is highly clustered with short path lengths.
# Generate a random graph with the same number of nodes & edges
<- sample_gnm(vcount(graph), ecount(graph))
random_graph
# Compute clustering coefficient & path length
<- transitivity(graph, type = "global") # Clustering Coefficient (GRN)
C_real <- transitivity(random_graph, type = "global") # Clustering Coefficient (Random)
C_rand
<- mean_distance(graph) # Average Path Length (GRN)
L_real <- mean_distance(random_graph) # Average Path Length (Random)
L_rand
# Compute Small-World Coefficient
<- (C_real / C_rand) / (L_real / L_rand)
sigma
cat("Small-World Coefficient (σ):", sigma, "\n")
Small-World Coefficient (σ): 1.27154
if (sigma > 1) {
cat("Your GRN exhibits small-world properties!\n")
else {
} cat("Your GRN is not small-world. Consider checking the filtering thresholds.\n")
}
Your GRN exhibits small-world properties!
5 Scale Free Property
- Degree Distribution: Measures how many connections (edges) each gene (node) has.
- Power-Law Fitting: The network’s degree distribution is tested for a power-law relationship, which characterizes scale-free networks.
- A power-law exponent (**α**) between **2 and 3** suggests **scale-free behavior** (i.e., a few hub genes with many connections and many genes with few connections).
- Visualization:
- The degree distribution is plotted on a **log-log scale**.
- The **red power-law fit line** is overlaid to check if the network follows a power-law.
# Extract degree distribution
<- degree(graph)
degree_dist
# Convert to table for frequency count
<- as.data.frame(table(degree_dist))
deg_freq colnames(deg_freq) <- c("degree", "count")
# Convert to numeric for log-log plot
$degree <- as.numeric(as.character(deg_freq$degree))
deg_freq$count <- as.numeric(as.character(deg_freq$count))
deg_freq
# Fit power-law distribution
<- displ$new(deg_freq$degree)
pl_fit <- estimate_xmin(pl_fit)
est $xmin <- est$xmin
pl_fit$pars <- estimate_pars(pl_fit)
pl_fit
# Plot log-log degree distribution
plot(deg_freq$degree, deg_freq$count, log="xy", col="blue",
xlab="Degree (log)", ylab="Frequency (log)", main="Power-Law Distribution")
# Overlay power-law fit
lines(pl_fit, col="red")
cat("Estimated Power-Law Exponent (α):", pl_fit$pars, "\n")
Estimated Power-Law Exponent (α): 3.09813
# Check if the network follows scale-free behavior
if (pl_fit$pars >= 2 & pl_fit$pars <= 3) {
cat("Your GRN exhibits scale-free properties!\n")
else {
} cat("Your GRN does NOT strictly follow power-law behavior.\n")
}
Your GRN does NOT strictly follow power-law behavior.
6 Network Neighborhood Analysis
Average Neighbor Degree: Computes how well-connected the neighbors of each gene are.
- This helps in identifying whether hubs are interacting with other hubs or with low-degree genes.
Degree Assortativity Coefficient:
Measures whether highly connected genes tend to interact with other highly connected genes (assortative) or low-degree genes (disassortative).
Assortative Networks: Genes preferentially connect to other genes of similar degree.
Disassortative Networks: Hub genes tend to interact with many low-degree genes, which is typical in biological networks.
# Compute Average Neighbor Degree for each gene
<- knn(graph)$knn
avg_neighbor_degree
# Plot degree vs. avg. neighbor degree
plot(degree(graph), avg_neighbor_degree, log="xy",
xlab="Gene Degree (log)", ylab="Avg. Neighbor Degree (log)",
main="Network Neighborhood Analysis", col="darkgreen")
# Compute Assortativity (Degree Correlation)
<- assortativity_degree(graph)
assortativity
cat("Assortativity Coefficient:", assortativity, "\n")
Assortativity Coefficient: -0.1542322
if (assortativity > 0) {
cat("Genes preferentially connect to genes of **similar degree** (assortative network).\n")
else {
} cat("Hub genes tend to connect with low-degree genes (disassortative network).\n")
}
Hub genes tend to connect with low-degree genes (disassortative network).
Results and Discussion
The gene regulatory network (GRN) analysis reveals a sparse yet well-structured network, consisting of 24 nodes and 46 edges, with an edge density of 0.1667. Compared to previous iterations, this refined network maintains a hub-centric topology, while also exhibiting an increase in connectivity and clustering.
🔹 Key Findings:
Hubs & Connectivity: COX1 (degree = 9) emerges as the most connected hub, while genes like HLA-A, RPL13A, and RPL23A (degree = 6) act as critical secondary nodes.
Centrality & Influence: High betweenness values for COX1 (61.95) and HLA-A (61.73) suggest their roles as major information flow regulators in the network.
Community Structure: The detection of 4 distinct communities implies the presence of functionally distinct gene modules, hinting at underlying biological processes.
Network Efficiency: With an average path length of 2.69 and diameter of 5, the network exhibits small-world properties, allowing efficient information propagation.
Robustness: The negative assortativity (-0.1542) suggests a hierarchical structure where hubs preferentially connect to low-degree nodes, a hallmark of biological networks.
Clustering & Modularity: The clustering coefficient of 0.3729 indicates a moderate level of local connectivity, reflecting modular functional interactions.
🔹 Impact of Hub Removal:
Upon removing key hub genes, the network size reduces to 14 nodes, reinforcing the importance of hubs in maintaining network integrity. This suggests that disrupting these highly connected genes could significantly affect the system’s overall functionality.
Conclusion
The constructed gene regulatory network (GRN) exhibits a hub-centric, modular, and small-world topology, with 24 nodes and 46 edges at an edge density of 0.1667. Key hub genes like COX1 (degree = 9), HLA-A, RPL13A, and RPL23A (degree = 6) play central roles in network connectivity, as reflected in their high betweenness centrality (max = 61.95), suggesting their importance in regulatory interactions. The presence of 4 distinct communities indicates functional modularity, while a clustering coefficient of 0.3729 suggests moderate local connectivity. The negative assortativity (-0.1542) confirms a hierarchical structure where hubs preferentially interact with low-degree nodes, typical of biological networks. With an average path length of 2.69 and a diameter of 5, the network retains efficient information flow, characteristic of robust gene interactions. Hub removal reduces the network to 14 nodes, reinforcing the critical role of central genes in maintaining structural stability. These findings highlight COX1, HLA-A, RPL13A, and RPL23A as key regulatory players, warranting further functional enrichment analysis to determine their biological relevance in the studied condition.
References
- Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118(21), 4947-4957. https://doi.org/10.1242/jcs.02714 2. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512. https://doi.org/10.1126/science.286.5439.509
- Baranzini, S. E., & Oksenberg, J. R. (2017). The genetics of multiple sclerosis: From 0 to 200 in 50 years. Trends in Genetics, 33(12), 960-970.
- Bioconductor Project. (n.d.). Retrieved from https://www.bioconductor.org/
- Bioconductor GEOquery package. (n.d.). Retrieved from https://bioconductor.org/packages/release/bioc/html/GEOquery.html
- Bioinformatics Workbook. (n.d.). WGCNA Tutorial. Retrieved from https://bioinformaticsworkbook.org/tutorials/wgcna.html#gsc.tab=0
- Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. Retrieved from https://igraph.org
- Davis, S., & Meltzer, P. S. (2007). GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics, 23(14), 1846-1847. https://doi.org/10.1093/bioinformatics/btm254
- DESeq2 Bioconductor package. (n.d.). Retrieved from https://bioconductor.org/packages/release/bioc/html/DESeq2.html
- Dobson, R., & Giovannoni, G. (2019). Multiple sclerosis – A review. European Journal of Neurology, 26(1), 27-40.
- Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174. https://doi.org/10.1016/j.physrep.2009.11.002
- GEO Database. (n.d.). Retrieved from https://www.ncbi.nlm.nih.gov/geo/
- Gentleman, R. (2008). R Programming for Bioinformatics. Chapman & Hall/CRC.
- Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8
- Margolin, A. A., et al. (2006). ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, S7. https://doi.org/10.1186/1471-2105-7-S1-S7
- Meyer, P. E., Lafitte, F., & Bontempi, G. (2008). minet: A R/Bioconductor package for mutual information based network inference. Bioinformatics, 24(3), 300-302. https://doi.org/10.1093/bioinformatics/btm554
- Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582. https://doi.org/10.1073/pnas.0601602103
- Quarto Documentation. (n.d.). Retrieved from https://quarto.org/docs/
- Stuart, J. M., Segal, E., Koller, D., & Kim, S. K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643), 249-255. https://doi.org/10.1126/science.1087447