Problem: How can we find meaningful patterns in noisy microarray data and formulate hypotheses that explain the observations?
Solution: Search for statistically significant changes across appropriately defined groups of genes.
This is a more sensitive and robust approach than looking at independently defined “differentially expressed” genes.
[Figure: a group of entities (genes, metabolites, etc.) labeled G1, G3, with experimental measurements (expression, metabolomics, etc.) labeled v1.]
{v1, v2, v3}: the collection of measured values for all entities in the group.
[Figure: frequency vs. abs(log ratio), showing the distribution on the array (the sampling distribution).]
Example 1: the collection of observations is random; the group is insignificant.
Example 2: the observations are (overall) biased; the group is significant.
Compare the collection of the observed values V={v1, …, vn} to the set of all values S measured in the experiment.
The p-value, i.e. the probability of obtaining a collection at least as extreme as V if V were a random sample from S, quantifies the significance of the group of genes in the experiment.
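The comparison of V against S can be sketched with brute-force resampling. This is a minimal illustration, not the method used in the slides: the function name `resampling_p_value`, the choice of the mean as the group score, and the toy data are all assumptions.

```python
import random

def resampling_p_value(group_values, all_values, n_resamples=10000, seed=0):
    """Estimate how often a random sample from S scores at least as high as V.

    The score is the mean of the values (e.g. absolute log-ratios); this choice
    of statistic is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n = len(group_values)
    observed = sum(group_values) / n
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(all_values, n)  # draw n values without replacement
        if sum(sample) / n >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Toy data (hypothetical): S holds all values on the array, V one group's values.
S = [0.1 * i for i in range(100)]        # 0.0, 0.1, ..., 9.9
V_high = [9.5, 9.7, 9.9, 9.1, 8.8]       # biased toward large changes
V_typical = [1.2, 4.8, 7.5, 3.3, 6.1]    # looks like a random draw

p_high = resampling_p_value(V_high, S)
p_typical = resampling_p_value(V_typical, S)
```

Under these assumptions `p_high` should come out very small and `p_typical` large, mirroring Example 2 and Example 1 respectively.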
Distributions are generally non-Gaussian
Non-parametric representation (rank tests)
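As one possible instance of a rank test, the sketch below implements a one-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation. The slides do not say which rank test is used, so this choice is an assumption, as are the function name `rank_sum_p_value`, the comparison of the group against the remaining values rather than against all values, and the toy data.

```python
import math

def rank_sum_p_value(group_values, other_values):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tests whether group_values tend to rank higher than other_values.
    Ties receive the average rank; no tie correction is applied to the variance.
    """
    n1, n2 = len(group_values), len(other_values)
    pooled = sorted(
        [(v, 1) for v in group_values] + [(v, 0) for v in other_values],
        key=lambda pair: pair[0],
    )
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += average_rank * sum(flag for _, flag in pooled[i:j])
        i = j
    u = rank_sum - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u - 0.5) / sd_u  # continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability

# Toy data (hypothetical): absolute log-ratios for the rest of the array
# and for two groups of five entities each.
rest = list(range(1, 41))
group_high = [36.5, 37.5, 38.5, 39.5, 40.5]   # ranks concentrated at the top
group_mixed = [3.5, 12.5, 20.5, 28.5, 35.5]   # ranks spread across the list

p_high = rank_sum_p_value(group_high, rest)
p_mixed = rank_sum_p_value(group_mixed, rest)
```

Because only ranks enter the statistic, the result does not depend on the shape of the underlying distribution, which is the point of the non-parametric representation.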
[Figure: all observations, ordered (e.g. by absolute log-ratio), with the observations within a group (v1, …, vn) marked.]
Insignificant vs. significant group: in a significant group, the ranks of measurements from the group are overall high in the ordered list.
Target exhibiting large change
To get the distribution expected by chance, break all the links in the whole network and reconnect them at random.
Corollary: during random network rewiring, each regulator “sees” all the edges going into a particular target T. The probability of randomly reconnecting to T is proportional to indegree(T).
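The corollary can be checked numerically with a small simulation. This is a simplified sketch: rewiring is modeled by shuffling the target ends of all edges, duplicate edges are not filtered out, and the network, node names, and the helper `rewire_targets` are hypothetical.

```python
import random
from collections import Counter

def rewire_targets(edges, rng):
    """Randomly rewire a network by shuffling the target ends of all edges.

    Every regulator keeps its out-degree and every target keeps its in-degree.
    Duplicate edges are not filtered out in this simplified sketch.
    """
    regulators = [r for r, _ in edges]
    targets = [t for _, t in edges]
    rng.shuffle(targets)
    return list(zip(regulators, targets))

# Hypothetical toy network: T_hub has in-degree 6, T_a and T_b have in-degree 2.
edges = (
    [(f"R{i}", "T_hub") for i in range(6)]
    + [("R6", "T_a"), ("R7", "T_a"), ("R8", "T_b"), ("R9", "T_b")]
)

rng = random.Random(0)
n_trials = 20000
landed_on = Counter()
for _ in range(n_trials):
    rewired = rewire_targets(edges, rng)
    landed_on[rewired[0][1]] += 1  # where the edge of regulator R0 ends up

fractions = {t: landed_on[t] / n_trials for t in ("T_hub", "T_a", "T_b")}
```

With in-degrees 6, 2, and 2 out of 10 edges, the fractions should be close to 0.6, 0.2, and 0.2.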
Sampling approximation (instead of brute-force resampling): an increased probability of reconnecting to a “promiscuous” target T means an increased probability of observing the measurement associated with T downstream.
Proposition (used in NEA): instead of all measurements on the array, use an effective sampling distribution in which the measured value for each target T is replicated indegree(T) times.
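The proposition can be illustrated in a few lines. This is a sketch under assumptions: the function name `effective_sampling_distribution`, the dictionary-based inputs, and the toy values are hypothetical, and targets with no recorded in-degree are simply left out of the pool.

```python
def effective_sampling_distribution(measurements, indegree):
    """Replicate each target's measured value indegree(T) times.

    measurements: mapping target -> measured value (e.g. absolute log-ratio)
    indegree:     mapping target -> number of incoming edges in the network
    Targets missing from `indegree` contribute nothing (an assumption).
    """
    pool = []
    for target, value in measurements.items():
        pool.extend([value] * indegree.get(target, 0))
    return pool

# Hypothetical toy data.
measurements = {"T_hub": 2.5, "T_a": 0.3, "T_b": 0.1}
indegree = {"T_hub": 6, "T_a": 2, "T_b": 2}

pool = effective_sampling_distribution(measurements, indegree)
```

The resulting pool can then stand in for the set of all values S when a group of downstream targets is tested, so that heavily connected targets are weighted as they would be under random rewiring.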
© 2011 Ariadne. All Rights Reserved.
Sub-Network Enrichment Analysis:
as good as your knowledge network database
Higher p-value (less significant)
Sivachenko A, Yuryev A, Daraselia N, Mazo I. Molecular networks in microarray analysis. J Bioinform Comput Biol. 2007.