Reactome analysis of Pseudo-time-series expression matrix FL vs GC
Log2 fold-change data from "FL_vs_GC_OSD-37OSD-38OSD-120OSD-217OSD-321_DGE.csv" were analysed using Plant Reactome (the data table is in the AWG repo).
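Preparing the log2 fold-change values for a Reactome expression analysis might look like the sketch below. It assumes the usual Reactome expression-input layout (a header line starting with `#`, identifiers in the first column, numeric values after it). The column names `gene_id`, `log2FoldChange` and `padj`, the 0.05 cutoff, and the example values are all hypothetical and are not taken from the DGE file.

```python
import csv
import io

# Made-up rows standing in for the DGE table; the column names
# 'gene_id', 'log2FoldChange' and 'padj' are hypothetical.
dge_csv = """gene_id,log2FoldChange,padj
AT1G01010,1.8,0.001
AT1G01020,-0.4,0.600
AT1G01030,-2.1,0.010
"""


def to_reactome_input(csv_text, alpha=0.05):
    """Return tab-separated text: a '#'-prefixed header, then one
    'identifier<TAB>log2FC' row per gene whose padj is below alpha."""
    lines = ["#gene_id\tlog2FC"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["padj"]) < alpha:  # hypothetical significance cutoff
            lines.append(f"{row['gene_id']}\t{row['log2FoldChange']}")
    return "\n".join(lines) + "\n"


reactome_input = to_reactome_input(dge_csv)
print(reactome_input)
```

Only the two rows with `padj` below the cutoff are kept, so the output has a header and two data lines.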
Reactome created a detailed list of pathways, reactions, and entities involved in various biological processes, specifically from the plant species Arabidopsis thaliana.
reactome: Identifier for the pathway.
Pathway name: Describes the biological pathway.
#Entities found: The number of entities found in this pathway.
#Entities total: The total number of entities in the pathway.
#Interactors found: The number of interactors found that interact with the entities.
#Interactors total: The total number of interactors in the pathway.
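Entities ratio: The proportion of all Reactome entities that belong to this pathway (entities total divided by all entities in Reactome), not the fraction of the pathway that was found.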
Entities pValue, Entities FDR: Statistical measures indicating the significance and the false discovery rate of the findings.
#Reactions found, #Reactions total: Number of biochemical reactions found versus the total considered.
Reactions ratio: The proportion of all Reactome reactions that belong to this pathway (reactions total divided by all reactions in Reactome).
col1: Unclear column, possibly additional data or a placeholder.
Species identifier: Numerical identifier for Arabidopsis thaliana.
Species name: The scientific name of the plant.
Submitted entities found: List of entities that were found based on the submitted query.
Mapped entities: Entities that could be mapped to known entities in the database.
Submitted entities hit interactor: Submitted entities that matched an interactor of a pathway entity rather than the entity itself.
Interacts with: Describes other entities that interact with the main entity.
Found reaction identifiers: Identifiers for reactions that were found in the database.
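To make the relationship between the "Entities pValue" and "Entities FDR" columns concrete, here is a minimal sketch of a Benjamini-Hochberg correction. That Reactome's FDR column uses this procedure is an assumption here, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five pathways
p_values = [0.01, 0.04, 0.03, 0.20, 0.90]
fdr = benjamini_hochberg(p_values)
for p, q in zip(p_values, fdr):
    print(f"p={p:.2f} -> FDR={q:.3f}")
```

An adjusted value is never smaller than its raw p-value, so a table dominated by large p-values will also show large FDR values.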
Dataset Overview
The Reactome result table includes several columns related to "entities" and "reactions". Each is strongly right-skewed, except for the 'entities_p_value' column, which is left-skewed. Below is a simplified summary of each column and of the relationships between columns.
General Patterns:
Right-Skewed Distributions: Most columns, except 'entities_p_value', show a pattern where there are a few very high values that pull the average up, while most of the data cluster at the lower end of the scale.
High Correlations: Several pairs of columns are highly correlated, meaning as the value in one column increases, the value in the other column tends to increase in a predictable way.
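The two patterns above can be checked with a few lines of code. This is a minimal stdlib sketch on made-up counts, not the computation used to produce the summary; `skewness` and `pearson_r` are illustrative helper names.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score (positive = right-skewed)."""
    mu, sigma = mean(values), pstdev(values)
    return mean(((v - mu) / sigma) ** 3 for v in values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Made-up counts: most pathways have few entities, one has many
entities_found = [1, 2, 2, 3, 4, 60]
entities_total = [3, 5, 4, 8, 9, 150]

print("skewness:", round(skewness(entities_found), 2))
print("Pearson R:", round(pearson_r(entities_found, entities_total), 3))
```

With pandas, `Series.skew()` and `DataFrame.corr()` give comparable figures; pandas applies a sample-size correction to skewness, so the numbers differ slightly.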
Reactome Results Detailed Summary:
x_entities_found:
Mean: 8.87, SD: 29.2 -> Highly correlated with 'x_entities_total', 'x_interactors_found', 'entities_ratio', 'x_reactions_found', and 'x_reactions_total' (all R > 0.85).
x_entities_total:
Mean: 20.83, SD: 71.5 -> Close relationship especially with 'x_entities_found' and 'entities_ratio' (R = 0.99 and R = 1 respectively).
x_interactors_found:
Mean: 34.98, SD: 92.25 -> Strongly related to 'x_interactors_total' (R=0.99) and has significant but slightly lower correlations with 'x_entities_found' and 'x_entities_total'.
x_interactors_total:
Mean: 127.63, SD: 345.02 -> Very high correlation with 'x_interactors_found' (R=0.99).
entities_ratio:
Mean: 0.02, SD: 0.06 -> Perfect correlation with 'x_entities_total' (R=1) and very high correlation with others such as 'x_entities_found', 'x_reactions_found', and 'x_reactions_total'.
entities_p_value:
Mean: 0.89, SD: 0.2 -> Left-skewed with a moderate correlation to 'entities_fdr' (R=0.52).
entities_fdr:
Mean: 1, SD: 0 -> Symmetric distribution with values at or near 1; moderately correlated with 'entities_p_value' (R=0.52).
x_reactions_found and x_reactions_total:
Means: 7.9 and 10.81, SDs: 29.62 and 41.01 -> Both are right-skewed, with very high correlations with 'x_entities_found' and 'entities_ratio', among others (R close to or equal to 1).
reactions_ratio:
Mean: 0.01, SD: 0.06 -> Almost perfectly matches the pattern of correlation observed with 'x_reactions_found' and 'x
Selecting the top 20 pathways based on '#Entities found'
```
import matplotlib.pyplot as plt

# `data` is assumed to be the Reactome result table, already loaded as a pandas DataFrame
top_pathways_plot_data = data.sort_values(by='#Entities found', ascending=False).head(20)

# Creating the bar plot
plt.figure(figsize=(10, 8))
plt.barh(top_pathways_plot_data['Pathway name'], top_pathways_plot_data['#Entities found'], color='skyblue')
plt.xlabel('Number of Entities Found')
plt.ylabel('Pathway Name')
plt.title('Top 20 Pathways by Number of Entities Found')
plt.gca().invert_yaxis()  # Invert y-axis to have the pathway with the most entities on top
plt.show()
```
Alternatively, rank and plot the number of reactions found to change against the total number of reactions known for each pathway. This reveals a broad change in ~75 percent of metabolic genes.
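A ranking of this kind could be computed as in the sketch below, which scores each pathway by the fraction of its known reactions that were found. The pathway names and counts are made up, the dictionary keys mirror the Reactome column names, and `rank_by_reaction_coverage` is an illustrative helper, not part of the original analysis.

```python
# Made-up pathway rows; keys mirror the Reactome column names
pathways = [
    {"Pathway name": "Pathway A", "#Reactions found": 6, "#Reactions total": 8},
    {"Pathway name": "Pathway B", "#Reactions found": 1, "#Reactions total": 10},
    {"Pathway name": "Pathway C", "#Reactions found": 9, "#Reactions total": 9},
]


def rank_by_reaction_coverage(rows):
    """Sort pathways by the fraction of their known reactions that were found."""
    scored = []
    for row in rows:
        total = row["#Reactions total"]
        coverage = row["#Reactions found"] / total if total else 0.0
        scored.append((row["Pathway name"], coverage))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_by_reaction_coverage(pathways)
for name, coverage in ranked:
    print(f"{name}: {coverage:.0%} of reactions found")
```

The resulting list can be passed to a horizontal bar plot in the same way as the top-20 entities plot.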
Network graph visualization
The network graph visualizes the connections between each "Pathway name" and the associated genes from the "Submitted entities found" field. The size of each pathway node is determined by the number of entities found, and the edges are colored on a gradient from red (lower entities ratio) to blue (higher entities ratio), with arrows at both ends to indicate bidirectional relationships.
This visualization provides a useful overview of how different genes are interconnected with specific metabolic pathways in Arabidopsis thaliana. Such a graph can help in identifying key pathways with numerous genetic interactions, which could be critical for deeper metabolic studies or genetic manipulation projects.
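```
import networkx as nx
import matplotlib.colors as mcolors

# Correcting the handling of the 'Submitted entities found' field to manage non-string entries
G = nx.Graph()
pathway_genes = {}  # pathway name -> set of genes, reused by the later graphs
max_ratio = data['Entities ratio'].max() or 1  # avoid dividing by zero
for index, row in data.iterrows():
    pathway = row['Pathway name']
    entities_found = row['#Entities found']
    entities_ratio = row['Entities ratio']
    submitted = row['Submitted entities found']
    # Non-string entries (e.g. NaN) yield no genes; the ';' separator is an assumption
    genes = set(submitted.split(';')) if isinstance(submitted, str) else set()
    pathway_genes[pathway] = genes
    G.add_node(pathway, type='pathway', size=entities_found * 20)  # scale factor is arbitrary
    for gene in genes:
        G.add_node(gene, type='gene')
        # RdBu maps low ratios to red and high ratios to blue
        G.add_edge(pathway, gene, color=mcolors.to_hex(plt.cm.RdBu(entities_ratio / max_ratio)))

# Drawing the corrected graph
pos = nx.spring_layout(G)  # positions for all nodes

# Draw nodes with scaling for pathway nodes
nx.draw_networkx_nodes(G, pos,
                       node_size=[G.nodes[n].get('size', 100) for n in G.nodes],
                       node_color=['skyblue' if G.nodes[n]['type'] == 'pathway' else 'lightgreen' for n in G.nodes])

# Draw edges with arrows at both ends
edges = list(G.edges())
nx.draw_networkx_edges(G, pos, edgelist=edges, arrows=True, arrowstyle='<->', arrowsize=10,
                       edge_color=[G.edges[e]['color'] for e in edges], style='solid')

# Draw labels
nx.draw_networkx_labels(G, pos, font_size=8, font_family='sans-serif')

plt.title('Pathway-Gene Network')
plt.axis('off')  # Turn off the axis
plt.show()
```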
```
# Extract names of the top 20 pathways from the earlier bar plot data
top_pathways_names = set(top_pathways_plot_data['Pathway name'])

# Creating a new graph
G_top_pathways = nx.Graph()

# Reuse the pathway_genes dictionary to add nodes and check for shared genes
for pathway1, genes1 in pathway_genes.items():
    for pathway2, genes2 in pathway_genes.items():
        if pathway1 != pathway2 and not genes1.isdisjoint(genes2):
            G_top_pathways.add_edge(pathway1, pathway2)

# Drawing the graph
pos_top = nx.spring_layout(G_top_pathways)  # positions for all nodes

# Draw nodes, only labeling the top 20 pathways
nx.draw_networkx_nodes(G_top_pathways, pos_top, node_size=700, node_color='skyblue')
nx.draw_networkx_labels(G_top_pathways, pos_top,
                        labels={n: n if n in top_pathways_names else '' for n in G_top_pathways.nodes},
                        font_size=9)

# Draw edges
nx.draw_networkx_edges(G_top_pathways, pos_top, edge_color='gray', alpha=0.5)

plt.title('Top 20 Pathways Connectivity via Shared Genes')
plt.axis('off')  # Turn off the axis
plt.show()
```
```
import matplotlib.colors as mcolors

# Getting the range for "#Reactions found" to set the color scale
min_reactions = data['#Reactions found'].min()
reactions_range = (data['#Reactions found'].max() - min_reactions) or 1  # avoid dividing by zero

# Creating a new graph with enhanced layout to minimize label overlap
G_enhanced = nx.Graph()

# Add nodes and edges as before
for pathway1, genes1 in pathway_genes.items():
    for pathway2, genes2 in pathway_genes.items():
        if pathway1 != pathway2 and not genes1.isdisjoint(genes2):
            G_enhanced.add_edge(pathway1, pathway2)

# Set node color based on "#Reactions found" (node size stays fixed)
reactions_counts = {row['Pathway name']: row['#Reactions found'] for index, row in data.iterrows()}

# Calculate colors for nodes
color_map = [mcolors.to_hex(plt.cm.Reds((reactions_counts.get(node, 0) - min_reactions) / reactions_range))
             for node in G_enhanced.nodes]

# Improved layout using the Kamada-Kawai layout algorithm to minimize label overlap
pos_enhanced = nx.kamada_kawai_layout(G_enhanced)

# Draw nodes with color mapping based on "#Reactions found"
nx.draw_networkx_nodes(G_enhanced, pos_enhanced, node_size=700, node_color=color_map)

# Only label the top 20 pathways
nx.draw_networkx_labels(G_enhanced, pos_enhanced,
                        labels={n: n if n in top_pathways_names else '' for n in G_enhanced.nodes},
                        font_size=9)

# Draw edges
nx.draw_networkx_edges(G_enhanced, pos_enhanced, edge_color='gray', alpha=0.5)

plt.title('Enhanced Top 20 Pathways Connectivity via Shared Genes')
plt.axis('off')  # Turn off the axis
plt.show()
```
Selecting a single pathway shows all of its loci.
All loci included in this analysis were significantly differentially expressed based on the meta-analysis of the pseudo-time-series datasets (OSD-37, OSD-38, OSD-120, OSD-217, OSD-321). The log2 fold-change values were uploaded to Reactome and the expression data were projected onto the Reactome maps. Some of the significantly differentially expressed pathways are shown below.
Here is the scale bar for Reactome.
Plant TCA
Hormonal signalling systems were significantly different in flight.