Improving ramification detection of St. Nicolas House Analysis

A combination approach

Authors

Chen, S., Moris, C., Groth, D.

DOI:

https://doi.org/10.52905/hbph2024.1.81

Keywords:

St. Nicolas Analysis, snha, network reconstruction, R-squared gaining, linear model check, graph estimation

Abstract

The St. Nicolas House Analysis (SNHA) is a new graph estimation method for detecting extensive interactions among variables. It ranks absolute bivariate correlation coefficients in descending order, thereby creating hierarchic association chains. These chains characterize the dependence structures of interacting variables and can be visualized in a corresponding network graph as chains of end-to-end connected edges, each edge representing a direct relationship between the nodes it connects. An important advantage of this relatively new approach is that it produces fewer false positive edges resulting from indirect or transitive associations than would be expected with standard correlation or linear model-based approaches. Here we aim to improve the detection of ramifications in graphs by adding different data processing layers to SNHA. These layers comprise combinations of the extensions R-squared gaining (RSG) and linear model check (LMC).
SNHA combined with these extensions was benchmarked against default SNHA and other reference methods available for the programming language R. Overall, combinations of RSG, LMC and bootstrapping improve SNHA performance across different network types, albeit at the cost of longer computation time.
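
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the implementation of the snha R package: the function names rank_chain and chain_edges, the four-variable correlation matrix, and the rule of accepting a chain only when the ranking from its last node reproduces it in reverse order are all hypothetical choices made for this example. RSG, LMC and bootstrapping are not covered.

```python
def rank_chain(corr, names, start):
    """Order all other variables by descending |r| with `start` (hypothetical helper)."""
    i = names.index(start)
    others = [j for j in range(len(names)) if j != i]
    # Rank by absolute correlation, so the sign of the association is ignored.
    others.sort(key=lambda j: abs(corr[i][j]), reverse=True)
    return [start] + [names[j] for j in others]


def chain_edges(chain):
    """Turn a chain into edges between consecutive nodes only."""
    return [tuple(sorted(pair)) for pair in zip(chain, chain[1:])]


# Assumed toy correlation matrix for a true chain A - B - C - D.
# Neighbours correlate with |r| = 0.7; indirect pairs are weaker
# (0.49 and 0.343), as expected for transitive associations.
names = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.7, -0.49, -0.343],
    [0.7, 1.0, -0.7, -0.49],
    [-0.49, -0.7, 1.0, 0.7],
    [-0.343, -0.49, 0.7, 1.0],
]

forward = rank_chain(corr, names, "A")
backward = rank_chain(corr, names, "D")

# Assumption of this sketch: keep the chain only if ranking from the
# far end yields the same order reversed.
edges = chain_edges(forward) if forward == backward[::-1] else []

print(forward)   # ['A', 'B', 'C', 'D']
print(backward)  # ['D', 'C', 'B', 'A']
print(edges)     # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

In this toy case the indirect pairs A-C, B-D and A-D have non-zero correlations but do not become edges, because only consecutive chain members are connected. A plain correlation threshold of, say, 0.4 would have added A-C and B-D as false positive edges.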

Published

2024-07-08

How to Cite

Chen, S., Moris, C., & Groth, D. (2024). Improving ramification detection of St. Nicolas House Analysis: A combination approach. Human Biology and Public Health, 1. https://doi.org/10.52905/hbph2024.1.81

Issue

Section

International Student Summer School