Reactome analysis of Pseudo-time-series expression matrix FL vs GC

Log2 fold-change data from "FL_vs_GC_OSD-37OSD-38OSD-120OSD-217OSD-321_DGE.csv" were analysed using Plant Reactome (the data table is in the AWG repo).
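A minimal sketch of how a DGE table could be turned into text for a Reactome expression analysis. To the best of our understanding, that tool expects tab-delimited text with a header line starting with `#`, one identifier column, and numeric columns. The column names `gene_id` and `log2fc`, the helper `to_reactome_expression_text`, and the three example rows are hypothetical; the real headers of the DGE CSV are not shown in this document.

```python
import csv
import io


def to_reactome_expression_text(dge_rows, id_col, lfc_col):
    """Return tab-delimited text: a '#' header line, then 'ID<TAB>value' per gene."""
    lines = ["#Gene\tlog2FC"]
    for row in dge_rows:
        gene = row[id_col].strip()
        value = row[lfc_col].strip()
        if not gene or not value:
            continue  # skip rows without an identifier or a fold change
        lines.append(f"{gene}\t{float(value):.4f}")
    return "\n".join(lines) + "\n"


# Hypothetical stand-in for the DGE CSV (placeholder gene IDs, made-up values)
example_csv = "gene_id,log2fc\nGENE_A,1.25\nGENE_B,-0.5\nGENE_C,\n"
rows = list(csv.DictReader(io.StringIO(example_csv)))
submission = to_reactome_expression_text(rows, id_col="gene_id", lfc_col="log2fc")
print(submission)
```

Rows without a fold-change value are dropped rather than submitted as zero, which is a design choice of this sketch and not something the analysis above prescribes.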

Plant Reactome returned a detailed list of pathways, reactions, and entities involved in various biological processes for the plant species Arabidopsis thaliana. The columns of the results table are:

- reactome: Identifier for the pathway.
- Pathway name: Name of the biological pathway.
- #Entities found: The number of submitted entities found in this pathway.
- #Entities total: The total number of entities in the pathway.
- #Interactors found: The number of interactors found that interact with the entities.
- #Interactors total: The total number of interactors in the pathway.
- Entities ratio: The proportion of all Reactome entities for the species that belong to this pathway; it scales with #Entities total rather than with #Entities found.
- Entities pValue, Entities FDR: The statistical significance of the pathway hit and the corresponding false discovery rate (the p-value corrected for multiple testing).
- #Reactions found, #Reactions total: The number of biochemical reactions found versus the total number in the pathway.
- Reactions ratio: The proportion of all Reactome reactions for the species that belong to this pathway.
- col1: Unclear column, possibly additional data or a placeholder.
- Species identifier: Numerical identifier for Arabidopsis thaliana.
- Species name: The scientific name of the plant.
- Submitted entities found: List of submitted entities that were found in the pathway.
- Mapped entities: Entities that could be mapped to known entities in the database.
- Submitted entities hit interactor: Entities that interact with the submitted entities.
- Interacts with: Other entities that interact with the main entity.
- Found reaction identifiers: Identifiers for reactions that were found in the database.
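The summary further down refers to these columns by snake_case names such as 'x_entities_found' and 'entities_p_value', while the Python snippets use the original headers such as '#Entities found'. The sketch below shows one way the two naming schemes could be related. The helper `clean_column_name` is hypothetical and only mimics the pattern seen in this document; it is not the tool that produced the cleaned names.

```python
import re


def clean_column_name(name: str) -> str:
    """Convert a Reactome header such as '#Entities found' to snake_case."""
    name = name.strip().replace("#", "x ")              # '#Entities' -> 'x Entities'
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)    # 'pValue' -> 'p Value'
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)          # any other run -> '_'
    return name.strip("_").lower()


headers = ["#Entities found", "Entities pValue", "Entities FDR", "Reactions ratio"]
for header in headers:
    print(header, "->", clean_column_name(header))
```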

Dataset Overview

This dataset includes several columns describing the "entities" and "reactions" reported by Reactome. Most columns are strongly right-skewed; the 'entities_p_value' column is left-skewed. Below is a simplified summary of each column and of the relationships between columns.

General Patterns:

Right-Skewed Distributions: Most columns, except 'entities_p_value', show a pattern where there are a few very high values that pull the average up, while most of the data cluster at the lower end of the scale.
High Correlations: Several pairs of columns are highly correlated, meaning as the value in one column increases, the value in the other column tends to increase in a predictable way.
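The two patterns above can be checked with a few lines of code. The sketch below uses only the standard library and computes a population skewness (positive for a long right tail, negative for a long left tail) and a Pearson correlation. The helper names and the toy numbers are illustrative assumptions, not values from the Reactome results.

```python
import statistics as st


def skewness(values):
    """Population skewness: mean of the cubed standardised deviations."""
    mean = st.fmean(values)
    sd = st.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = st.fmean(x), st.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy columns (made up): a few large pathways dominate the counts
entities_found = [1, 1, 2, 2, 3, 4, 60]
entities_total = [3, 2, 5, 4, 8, 9, 150]
p_values = [1.0, 0.99, 0.98, 0.95, 0.9, 0.85, 0.3]

print("skew found:", round(skewness(entities_found), 2))
print("skew p-value:", round(skewness(p_values), 2))
print("r(found, total):", round(pearson_r(entities_found, entities_total), 3))
```

With a single dominant value, both the skewness and the correlation are driven largely by that one point, which is worth keeping in mind when reading the very high R values in the summary.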

Reactome Results Detailed Summary:

x_entities_found:
- Mean: 8.87, SD: 29.2 -> Highly correlated with 'x_entities_total', 'x_interactors_found', 'entities_ratio', 'x_reactions_found', and 'x_reactions_total' (all R > 0.85).
x_entities_total:
- Mean: 20.83, SD: 71.5 -> Close relationship especially with 'x_entities_found' and 'entities_ratio' (R = 0.99 and R = 1 respectively).
x_interactors_found:
- Mean: 34.98, SD: 92.25 -> Strongly related to 'x_interactors_total' (R=0.99) and has significant but slightly lower correlations with 'x_entities_found' and 'x_entities_total'.
x_interactors_total:
- Mean: 127.63, SD: 345.02 -> Very high correlation with 'x_interactors_found' (R=0.99).
entities_ratio:
- Mean: 0.02, SD: 0.06
- Perfect correlation with 'x_entities_total' (R=1) and very high with others like 'x_entities_found', 'x_reactions_found', and 'x_reactions_total'.
entities_p_value:
- Mean: 0.89, SD: 0.2 -> Left-skewed with a moderate correlation to 'entities_fdr' (R=0.52).
entities_fdr:
- Mean: 1, SD: 0 -> Symmetric distribution, directly correlated with 'entities_p_value'.
x_reactions_found and x_reactions_total:
- Means: 7.9 and 10.81, SDs: 29.62 and 41.01 -> Both showcase right-skewness with very high correlations with 'x_entities_found', 'entities_ratio', among others (R close to or equal to 1).
reactions_ratio:
- Mean: 0.01, SD: 0.06 -> Almost perfectly matches the pattern of correlation observed with 'x_reactions_found' and 'x

Selecting the top 20 pathways based on '#Entities found'

```python
import matplotlib.pyplot as plt

# `data` is the Reactome results table, already loaded as a pandas DataFrame.
# Select the top 20 pathways by '#Entities found'.
top_pathways_plot_data = data.sort_values(by='#Entities found', ascending=False).head(20)

# Create the bar plot
plt.figure(figsize=(10, 8))
plt.barh(top_pathways_plot_data['Pathway name'],
         top_pathways_plot_data['#Entities found'], color='skyblue')
plt.xlabel('Number of Entities Found')
plt.ylabel('Pathway Name')
plt.title('Top 20 Pathways by Number of Entities Found')
plt.gca().invert_yaxis()  # Invert y-axis to have the pathway with the most entities on top
plt.show()
```

Alternatively, rank the pathways and plot, for each one, the number of reactions found to change alongside the total number of reactions known to exist. This reveals a broad change in ~75 percent of metabolic genes.

```r
library(ggplot2)
library(dplyr)

# Assuming 'pathway_name' is a column in df that contains the names of the pathways.
# If the column has a different name, replace 'pathway_name' with the correct column name.
# Also assuming that "#Entities" refers to 'x_entities_found'.
# If it refers to a different column, replace 'x_entities_found' with the correct column name.

# Filter the top 20 pathways with the largest "#Entities"
top_pathways <- df %>%
  arrange(desc(x_entities_found)) %>%
  slice(1:20)

# Create a long format data frame for plotting with ggplot2
top_pathways_long <- reshape2::melt(
  top_pathways,
  id.vars = "pathway_name",
  measure.vars = c("x_reactions_found", "x_reactions_total")
)

# Create the bar chart for the top 20 pathways
ggplot(top_pathways_long, aes(x = reorder(pathway_name, value), y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Pathway Name", y = "Number of Reactions", fill = "Measure") +
  scale_fill_manual(values = c("x_reactions_found" = "blue", "x_reactions_total" = "red"))
```

```r
library(ggplot2)
library(dplyr)

# Same assumptions as above: 'pathway_name' holds the pathway names and
# "#Entities" refers to 'x_entities_found'. Replace either name if your columns differ.

# Filter the top 20 pathways with the largest "#Entities"
top_pathways <- df %>%
  arrange(desc(x_entities_found)) %>%
  slice(1:20)

# Normalize the reactions found and total by dividing by the maximum value in the subset
top_pathways$x_reactions_found_normalized <- top_pathways$x_reactions_found / max(top_pathways$x_reactions_found)
top_pathways$x_reactions_total_normalized <- top_pathways$x_reactions_total / max(top_pathways$x_reactions_total)

# Create a long format data frame for plotting with ggplot2
top_pathways_long_normalized <- reshape2::melt(
  top_pathways,
  id.vars = "pathway_name",
  measure.vars = c("x_reactions_found_normalized", "x_reactions_total_normalized")
)

# Create the normalized bar chart for the top 20 pathways
ggplot(top_pathways_long_normalized, aes(x = reorder(pathway_name, value), y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Pathway Name", y = "Normalized Number of Reactions", fill = "Measure") +
  scale_fill_manual(values = c("x_reactions_found_normalized" = "blue", "x_reactions_total_normalized" = "red"))
```
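The normalization in the R code divides each series by its own maximum, so the two sets of bars share a 0 to 1 scale but no longer show what share of each pathway's reactions was found. The Python sketch below shows that max-normalization next to a per-pathway fraction (found divided by total) as a complementary view. The function names and the three example counts are hypothetical and are not taken from the results.

```python
def max_normalise(values):
    """Divide every value by the largest one so the result lies in [0, 1]."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def fraction_found(found, total):
    """Per-pathway share of known reactions that were found (0 if total is 0)."""
    return [f / t if t else 0.0 for f, t in zip(found, total)]


# Made-up counts for three pathways
reactions_found = [30, 12, 3]
reactions_total = [40, 20, 4]

found_norm = max_normalise(reactions_found)
total_norm = max_normalise(reactions_total)
coverage = fraction_found(reactions_found, reactions_total)

for f, t, c in zip(found_norm, total_norm, coverage):
    print(f"found_norm={f:.2f} total_norm={t:.2f} coverage={c:.2f}")
```

In this toy example the smallest pathway has the lowest normalized bars yet the same coverage as the largest one, which is the kind of difference the two views can expose.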

Network graph visualisation

The network graph visualizes the connections between each "Pathway name" and the associated genes from the "Submitted entities found" field. The sizes of the pathway nodes are determined by the number of entities found, and the edges are colored on a gradient from red (lower entities ratio) to blue (higher entities ratio), with arrows at both ends to indicate bidirectional relationships.

This visualization provides a useful overview of how different genes are interconnected with specific metabolic pathways in Arabidopsis thaliana. Such a graph can help in identifying key pathways with numerous genetic interactions, which could be critical for deeper metabolic studies or genetic manipulation projects.

Correcting the handling of the 'Submitted entities found' field to manage non-string entries

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import networkx as nx

# `data` is the Reactome results table as a pandas DataFrame.
# A directed graph is assumed here so that an edge can be drawn in each direction.
G = nx.DiGraph()

for index, row in data.iterrows():
    pathway = row['Pathway name']
    entities_found = row['#Entities found']
    entities_ratio = row['Entities ratio']

    # Check if 'Submitted entities found' is a string; if not, continue to next iteration
    if not isinstance(row['Submitted entities found'], str):
        continue

    genes = row['Submitted entities found'].split(';')

    # Add the pathway node with size attribute
    G.add_node(pathway, size=entities_found * 10, type='pathway')  # scaled size for better visualization

    # Add gene nodes and edges
    for gene in genes:
        G.add_node(gene, type='gene')
        # Colors are mapped from red (low) to blue (high) based on entities ratio
        edge_color = mcolors.to_hex(plt.cm.RdBu(entities_ratio))
        G.add_edge(pathway, gene, color=edge_color, weight=2)
        G.add_edge(gene, pathway, color=edge_color, weight=2)

# Draw the corrected graph
pos = nx.spring_layout(G)  # positions for all nodes

# Draw nodes with scaling for pathway nodes
nx.draw_networkx_nodes(
    G, pos,
    node_size=[G.nodes[n].get('size', 100) for n in G.nodes],
    node_color=['skyblue' if G.nodes[n]['type'] == 'pathway' else 'lightgreen' for n in G.nodes])

# Draw edges, one color per edge
edges = list(G.edges(data='color'))
nx.draw_networkx_edges(G, pos, edgelist=[(u, v) for u, v, _ in edges],
                       arrows=True, arrowstyle='->', arrowsize=10,
                       edge_color=[color for _, _, color in edges], style='solid')

# Draw labels
nx.draw_networkx_labels(G, pos, font_size=8, font_family='sans-serif')

plt.title('Pathway-Gene Network')
plt.axis('off')  # Turn off the axis
plt.show()
```

```

import matplotlib.pyplot as plt
import networkx as nx

# Extract names of the top 20 pathways from the earlier bar plot data
top_pathways_names = set(top_pathways_plot_data['Pathway name'])

# Create a new graph
G_top_pathways = nx.Graph()

# Reuse the pathway_genes dictionary (pathway name -> set of gene IDs)
# and connect every pair of pathways that shares at least one gene
for pathway1, genes1 in pathway_genes.items():
    for pathway2, genes2 in pathway_genes.items():
        if pathway1 != pathway2 and not genes1.isdisjoint(genes2):
            G_top_pathways.add_edge(pathway1, pathway2)

# Draw the graph
pos_top = nx.spring_layout(G_top_pathways)  # positions for all nodes

# Draw nodes, only labeling the top 20 pathways
nx.draw_networkx_nodes(G_top_pathways, pos_top, node_size=700, node_color='skyblue')
nx.draw_networkx_labels(G_top_pathways, pos_top,
                        labels={n: n if n in top_pathways_names else '' for n in G_top_pathways.nodes},
                        font_size=9)

# Draw edges
nx.draw_networkx_edges(G_top_pathways, pos_top, edge_color='gray', alpha=0.5)

plt.title('Top 20 Pathways Connectivity via Shared Genes')
plt.axis('off')  # Turn off the axis
plt.show()

```
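The connectivity graph relies on a `pathway_genes` dictionary whose construction is not shown in this document. The sketch below is one plausible way to build it, assuming it maps each pathway name to the set of genes in its 'Submitted entities found' field and skips non-string entries as the earlier loop does. The helper names, the plain list-of-dicts input, and the placeholder pathway and gene names are all hypothetical.

```python
from itertools import combinations


def build_pathway_genes(rows):
    """Map each pathway name to the set of gene IDs submitted for it."""
    pathway_genes = {}
    for row in rows:
        submitted = row.get("Submitted entities found")
        if not isinstance(submitted, str):
            continue  # e.g. NaN for pathways without submitted entities
        genes = {g.strip() for g in submitted.split(";") if g.strip()}
        pathway_genes.setdefault(row["Pathway name"], set()).update(genes)
    return pathway_genes


def shared_gene_pairs(pathway_genes):
    """Return each unordered pathway pair that has at least one gene in common."""
    return [
        (p1, p2)
        for p1, p2 in combinations(sorted(pathway_genes), 2)
        if not pathway_genes[p1].isdisjoint(pathway_genes[p2])
    ]


# Placeholder rows standing in for the Reactome results table
rows = [
    {"Pathway name": "Pathway 1", "Submitted entities found": "GENE_A;GENE_B"},
    {"Pathway name": "Pathway 2", "Submitted entities found": "GENE_B;GENE_C"},
    {"Pathway name": "Pathway 3", "Submitted entities found": float("nan")},
    {"Pathway name": "Pathway 4", "Submitted entities found": "GENE_D"},
]
pathway_genes = build_pathway_genes(rows)
pairs = shared_gene_pairs(pathway_genes)
print(pairs)
```

Iterating over unordered pairs visits each pathway pair once, whereas the nested loops in the graph code visit every pair twice; for an undirected graph both approaches yield the same edges.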

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import networkx as nx

# Create a new graph with an enhanced layout to minimize label overlap
G_enhanced = nx.Graph()

# Add nodes and edges as before: connect pathways that share at least one gene
for pathway1, genes1 in pathway_genes.items():
    for pathway2, genes2 in pathway_genes.items():
        if pathway1 != pathway2 and not genes1.isdisjoint(genes2):
            G_enhanced.add_edge(pathway1, pathway2)

# Look up "#Reactions found" per pathway and get its range to set the color scale
reactions_counts = {row['Pathway name']: row['#Reactions found'] for index, row in data.iterrows()}
max_reactions = max(reactions_counts.values())
min_reactions = min(reactions_counts.values())
reactions_range = (max_reactions - min_reactions) or 1  # avoid division by zero

# Calculate colors for nodes
color_map = [
    mcolors.to_hex(plt.cm.Reds((reactions_counts.get(node, 0) - min_reactions) / reactions_range))
    for node in G_enhanced.nodes
]

# Improved layout using the Kamada-Kawai layout algorithm to minimize label overlap
pos_enhanced = nx.kamada_kawai_layout(G_enhanced)

# Draw nodes with color mapping based on "#Reactions found"
nx.draw_networkx_nodes(G_enhanced, pos_enhanced, node_size=700, node_color=color_map)

# Only label the top 20 pathways (top_pathways_names comes from the previous block)
nx.draw_networkx_labels(G_enhanced, pos_enhanced,
                        labels={n: n if n in top_pathways_names else '' for n in G_enhanced.nodes},
                        font_size=9)

# Draw edges
nx.draw_networkx_edges(G_enhanced, pos_enhanced, edge_color='gray', alpha=0.5)

plt.title('Enhanced Top 20 Pathways Connectivity via Shared Genes')
plt.axis('off')  # Turn off the axis
plt.show()
```

Selecting a single pathway shows all of its loci.

All loci included in this analysis were significantly differentially expressed based on the meta-analysis of the pseudo-time-series analysis (OSD-37, OSD-38, OSD-120, OSD-217, OSD-321). The Log2 fold change was uploaded to Reactome and the expression data were projected onto the Reactome maps. Some of the significantly differentially expressed pathways are below.

Here’s the scale bar for Reactome.

Plant TCA

Hormonal signalling systems were significantly different in flight.