Supplementary material for the paper:

An Information and Combinatorial Theories-Based Supervised Learning Framework for Integrative Inference and Analysis of Genetic Regulatory Networks

Binhua Tang1, Xuechen Wu2, Su-Shing Chen3, Qing Jing4 and Bairong Shen5,§

1 Department of Bioinformatics, Tongji University, Shanghai, China

2 Institute of Protein Research, Tongji University, Shanghai, China

3 CAS-MPG Partner Institute of Computational Biology, Shanghai, China

4 Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, China

5 Center for Systems Biology, Soochow University, Suzhou, China

§Corresponding author: bairong.shen@suda.edu.cn

 

DS 1. The Synthetic Dataset from a Typical Mammalian Cell Cycle Pathway


Figure 1-A. The calculated mutual information matrix for the 36 gene pairs among the 9 genes of the mammalian G1/S cell cycle transition network. The axes are numbered GE No. 1 to GE No. 9, representing the species pRB, E2F1, CycDi, CycDa, AP-1, pRBp, pRBpp, CycEi and CycEa, respectively. The diagonal elements all equal one because mutual information is maximal when a variable is measured against itself. The matrix is also symmetric (i.e. I(X;Y) = I(Y;X)) and all of its elements are nonnegative (i.e. I(X;Y) ≥ 0).
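The three properties named in the caption (unit diagonal, symmetry, nonnegativity) can be checked on a small sketch. The estimator below is an assumption, not the method used for the figure: it bins each series into equal-width bins and normalizes the mutual information as I(X;Y) / sqrt(H(X) H(Y)), which is one common way to obtain a unit diagonal. The function names, the bin count and the random stand-in data are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def discretize(values, bins=4):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mi(x, y):
    """Assumed normalization: I(X;Y) / sqrt(H(X) * H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant series carries no information
    mi = hx + hy - entropy(list(zip(x, y)))  # I = H(X) + H(Y) - H(X,Y)
    return max(mi, 0.0) / math.sqrt(hx * hy)  # clip tiny negative round-off


def mi_matrix(series, bins=4):
    """Symmetric matrix of normalized mutual information between all series."""
    binned = [discretize(s, bins) for s in series]
    n = len(binned)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = normalized_mi(binned[i], binned[i])
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = normalized_mi(binned[i], binned[j])
    return m


random.seed(0)
# Hypothetical stand-in data: 9 series with 50 samples each (not DS1).
series = [[random.random() for _ in range(50)] for _ in range(9)]
matrix = mi_matrix(series)
n_pairs = len(list(combinations(range(9), 2)))  # unordered pairs among 9 genes
```

With 9 series, `combinations` enumerates the same 36 unordered pairs that the caption counts; only the upper triangle needs to be computed, and the lower triangle is filled by symmetry.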

 


Figure 1-B. The descending-sorted mutual information, correlation coefficients and corresponding P-values for all pairwise candidates of the mammalian cell cycle pathway. The upper subplot shows the descending-sorted mutual information of all 36 gene pairs, and the lower subplot shows the descending-sorted correlation coefficients and their P-values. As indicated by the vertical dotted line in the lower plot, a total of 16 pairs have P-values smaller than 0.05.

 


Figure 1-C. Associativity measure statistics for the APGs and QPGs groups in DS1, based on the MICORPS concepts defined in the methodology section.

 

Table 1. Authentic Pairwise Genes from Dataset 1

No.    1   2   3   4   5   6   7   8   9  10  11  12  13  14
R      2   2   2   2   3   3   4   8   6   3   1   5   4   7
C      6   4   5   3   4   5   5   9   9   9   3   6   6   9

* R: Row, C: Column
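The row and column indices in Table 1 can be read back as gene names. The sketch below assumes that R and C index the axes of the mutual information matrix in Figure 1-A, so that 1 to 9 follow the GE numbering given in that caption; the variable and function names are hypothetical.

```python
# Gene order taken from the GE No. 1 to GE No. 9 numbering in Figure 1-A.
GENES = ["pRB", "E2F1", "CycDi", "CycDa", "AP-1",
         "pRBp", "pRBpp", "CycEi", "CycEa"]

# Row (R) and column (C) indices of the 14 pairs listed in Table 1.
APG_ROWS = [2, 2, 2, 2, 3, 3, 4, 8, 6, 3, 1, 5, 4, 7]
APG_COLS = [6, 4, 5, 3, 4, 5, 5, 9, 9, 9, 3, 6, 6, 9]


def pair_names(rows, cols, genes=GENES):
    """Translate 1-based (row, column) indices into gene-name pairs."""
    return [(genes[r - 1], genes[c - 1]) for r, c in zip(rows, cols)]


apg_pairs = pair_names(APG_ROWS, APG_COLS)
for no, (gene_a, gene_b) in enumerate(apg_pairs, start=1):
    print(f"{no:2d}  {gene_a} - {gene_b}")
```

Under that assumption, pair No. 1 (R = 2, C = 6) is E2F1 with pRBp, and pair No. 8 (R = 8, C = 9) is CycEi with CycEa. The same function applies to the indices in Table 2.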

 

Table 2. Questionable Pairwise Genes from Dataset 1

No.    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
R      2   3   5   1   1   4   2   6   5   1   4   2   3   3   7
C      9   7   7   7   4   7   7   7   9   5   9   8   8   6   8

* R: Row, C: Column

 

 


 

DS 2. The Cell Cycle Microarray Dataset under the Elutriation Treatment


Figure 2-A. The calculated mutual information matrix for the 276 gene pairs among the 24 cell-cycle genes. The diagonal elements all equal one because mutual information is maximal when a variable is measured against itself. The mutual information matrix is nonnegative (I(X;Y) ≥ 0) and symmetric (I(X;Y) = I(Y;X)).

 


Figure 2-B. The descending-sorted mutual information, correlation coefficients and corresponding P-values for all pairwise candidates of the cell cycle regulatory network. As indicated by the vertical dotted line in the lower plot, a total of 105 pairs have P-values smaller than 0.05.

 


Figure 2-C. The calculated P-value statistical confidence areas for the 276 pairwise gene samples with respect to the correlation coefficient and mutual information. In the upper plot, the blue dots represent the sorted correlation coefficients of the pairwise gene samples, and the khaki area shows how the percentage of selected samples satisfying the confidence criterion varies. The confidence criterion is defined as the percentage of pair samples with P-values smaller than 0.05 among all currently selected samples. The lower plot shows the relationship between the confidence variation and the descending-sorted mutual information of the pair samples. The red dotted line shows the mutual information values, and the reseda area denotes the relative confidence statistics.
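The confidence criterion in this caption can be sketched as a running percentage over the ranked pairs. The code below is a minimal illustration under stated assumptions: pairs are ranked by a score (correlation coefficient or mutual information) in descending order, and the "currently selected samples" are taken to be the top k pairs for each k. The function name and the example values are hypothetical and are not taken from DS2.

```python
def running_confidence(p_values, alpha=0.05):
    """For each k, return the percentage of the top-k pairs with p < alpha.

    `p_values` must already be ordered by descending score.
    """
    confidences = []
    significant = 0
    for k, p in enumerate(p_values, start=1):
        if p < alpha:
            significant += 1
        confidences.append(100.0 * significant / k)
    return confidences


# Hypothetical (score, P-value) pairs for illustration only.
pairs = [(0.91, 0.001), (0.35, 0.20), (0.78, 0.01), (0.52, 0.04), (0.10, 0.70)]

# Rank by descending score, then track the confidence as more pairs are selected.
ranked = sorted(pairs, key=lambda sp: sp[0], reverse=True)
confidence = running_confidence([p for _, p in ranked])
print(confidence)  # [100.0, 100.0, 100.0, 75.0, 60.0]
```

In this toy example the confidence stays at 100% while only significant pairs are selected and drops once pairs with P-values of 0.05 or more enter the selection, which is the kind of variation the shaded areas in the figure describe.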

 


Figure 2-D. Three-dimensional graph of the authentic (APGs), questionable (QPGs) and unauthentic (UPGs) pairwise genes under different thresholds of mutual information and correlation coefficient. The P-value threshold is 0.05. In total, there are 276 pairs among the 24 genes of the cell cycle regulatory network. The horizontal axis represents the mutual information threshold, and the vertical axis represents the correlation coefficient threshold.

 


Figure 2-E. Associativity measure statistics for the APGs group in DS2, based on the MICORPS concepts defined in the methodology section.

 

 


Figure 2-F. Phase-shift statistics for the APGs group (83 gene pairs in total, sorted by descending mutual information), calculated based on the signal processing concepts defined above. Red (+1) represents a leading phase shift for the gene pair, black (-1) a lagging phase shift, and white indicates pairs without any phase shift under the specific gain thresholds.

 

DS 3. The Microarray Dataset of a p53 Pathway with Multiple Feedback Loops


Figure 3-A. Mutual information matrix for the triplicate MOLT4 microarray experiments, sampled under irradiation from 0 to 12 hours at 2-hour intervals.

 


Figure 3-B. The descending-sorted mutual information, correlation coefficients and corresponding P-values for all pairwise candidates of the multi-feedback p53 pathway. The mutual information values are distributed fairly evenly over the range from 0.3134 to 1, whereas only 10 candidates have Pearson correlation P-values below 0.05, as indicated by the vertical dashed line in the lower subplot.

 

 


Figure 3-C. The calculated P-value statistical confidence areas for the 120 pairwise samples, obtained via dynamic thresholding with respect to the correlation coefficient and mutual information. In the upper plot, the blue dots represent the sorted correlation coefficients of the pairwise gene samples, and the khaki area shows how the percentage of selected samples satisfying the confidence criterion varies. The confidence criterion is defined as the percentage of pair samples with P-values smaller than 0.05 among all currently selected samples. The lower plot shows the relationship between the confidence variation and the descending-sorted mutual information of the pair samples. The red dotted line shows the mutual information values, and the reseda area denotes the relative confidence statistics.

 


Figure 3-D. The calculated statistics of the authentic (APGs), questionable (QPGs) and unauthentic (UPGs) groups under different thresholds of mutual information and correlation coefficient. The P-value threshold is set to 0.8 to ensure enough candidates in the APGs group for network reconstruction.

 


Figure 3-E. Phase-shift statistics for the APGs group (55 gene pairs in total, sorted by descending mutual information), calculated based on the signal processing concepts defined above. Red (+1) represents a leading phase shift for the gene pair, black (-1) a lagging phase shift, and white indicates pairs without any phase shift under the specific gain thresholds.

 


Figure 3-F. Associativity measure statistics for the APGs group in DS3, based on the MICORPS concepts defined in the methodology section.

 


Figure 3-G. The genetic graph constructed with the gain threshold at 1. As depicted in the graph, #5 (cdk2) and #6 (Rb) are weakly connected nodes, whereas #3 (MDM2), #10 (β-catenin) and others are strongly connected under the current gain threshold.