Tutorial: Statistics for Human Biologists - Terminology and Univariate Statistics

Detlef Groth

doi:10.52905/hbph2025.2.113

Authors

Detlef Groth University of Potsdam, IBB, Bioinformatics, 14469 Potsdam, Germany

Keywords:

statistics, tutorial, descriptive statistics, inferential statistics, p-value, effect size, confidence interval, univariate statistics

Abstract

In the second part of this tutorial series, we will discuss the foundations of univariate statistics. We will define statistical terms such as p-value, confidence interval, and effect size. The latter two are often neglected, but they are more informative than the commonly overinterpreted p-value.

A p-value expresses the probability of observing a difference at least as large as the one found between a sample and a population parameter, or between two groups, assuming that both share the same distribution. In univariate statistics, this difference is between a sample and a known population value. The assumption of equal distributions is the null hypothesis, which may be rejected if the p-value is very small. Importantly, the p-value depends not only on the difference itself but also on the sample size: large samples can yield very small, highly significant p-values even when the effect size is trivial.
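The dependence of the p-value on sample size can be illustrated with a minimal Python sketch. It assumes a one-sample z-test with a known population standard deviation. All numbers (population mean 170 cm, standard deviation 7 cm, sample mean 170.5 cm) are hypothetical, and the function name is ours rather than part of any package.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_pvalue(sample_mean, pop_mean, pop_sd, n):
    """Two-sided p-value of a one-sample z-test.

    Assumption: the population standard deviation is known.
    """
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a z value at least this extreme in either tail.
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical example: the same 0.5 cm difference at growing sample sizes.
for n in (50, 500, 5000):
    p = one_sample_z_pvalue(170.5, 170.0, 7.0, n)
    print(f"n = {n:>5}: p = {p:.3g}")
```

The difference between sample and population mean is identical in every run; only the sample size changes, and with it the p-value.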

In contrast, a confidence interval provides a range of plausible values for the population parameter based on the sample, giving insight into precision beyond statistical significance. The effect size conveys the magnitude of an effect, which simplifies interpretation and makes it possible to assess the relevance of observed differences.
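As a rough illustration, a confidence interval for a mean could be computed as follows. This is a sketch using the normal approximation, which is an assumption on our part; for small samples an interval based on the t distribution would be somewhat wider. The height values and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci_normal(values, level=0.95):
    """Confidence interval for the mean using the normal approximation."""
    m = mean(values)
    standard_error = stdev(values) / sqrt(len(values))
    # Quantile of the standard normal distribution for the chosen level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * standard_error, m + z * standard_error


# Hypothetical body heights in cm.
heights = [168.2, 171.5, 165.0, 174.3, 169.8, 172.1,
           166.7, 170.4, 173.9, 167.5, 171.0, 169.2]

low, high = mean_ci_normal(heights)
print(f"mean = {mean(heights):.1f} cm, 95% CI = [{low:.1f}, {high:.1f}] cm")
```

A narrow interval indicates a precise estimate of the population mean; a wide interval signals that the sample says little about it, regardless of the p-value.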

In this review, we explain these terms in the context of univariate statistics, comparing variables of interest from sample data against a known population value such as a mean or proportion. Relevant measures include the mean, standard deviation, and coefficient of variation; relevant visualizations include histograms and barplots. Effect size measures include Cohen’s d for numerical data and Cohen’s w for categorical data.
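The measures named above can be sketched in a few lines of Python. The formulas are the common textbook definitions: Cohen's d for one sample as the difference between sample mean and population value divided by the sample standard deviation, and Cohen's w as the square root of the summed squared differences between observed and expected proportions, each divided by the expected proportion. The data and function names are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


def cohens_d_one_sample(values, pop_mean):
    """Cohen's d: distance of the sample mean from a known population
    mean, expressed in units of the sample standard deviation."""
    return (mean(values) - pop_mean) / stdev(values)


def cohens_w(observed_counts, expected_props):
    """Cohen's w: deviation of observed from expected proportions."""
    total = sum(observed_counts)
    return sqrt(sum((o / total - e) ** 2 / e
                    for o, e in zip(observed_counts, expected_props)))


# Hypothetical body heights in cm, compared with a population mean of 168 cm.
heights = [168.2, 171.5, 165.0, 174.3, 169.8, 172.1]
print(f"mean = {mean(heights):.1f}, sd = {stdev(heights):.1f}, "
      f"cv = {coefficient_of_variation(heights):.1f}%")
print(f"Cohen's d = {cohens_d_one_sample(heights, 168.0):.2f}")

# Hypothetical counts in two categories, compared with an expected 50:50 split.
print(f"Cohen's w = {cohens_w([60, 40], [0.5, 0.5]):.2f}")
```

Following Cohen (1988), values of d around 0.2, 0.5, and 0.8, and values of w around 0.1, 0.3, and 0.5, are conventionally read as small, medium, and large effects.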

References

Arntsen, S. H./Borch, K. B./Wilsgaard, T./Njølstad, I./Hansen, A. H. (2023). Time trends in body height according to educational level. A descriptive study from the Tromsø Study 1979–2016. PLOS ONE 18 (1), e0279965. https://doi.org/10.1371/journal.pone.0279965.

Bateman, D./Eaton, J. W./Hauberg, S./Wehbring, R. (2025). GNU Octave version 9.4.0 manual: a high-level interactive language for numerical computations 2025. Available online at https://www.gnu.org/software/octave/doc/interpreter/.

Chotai, J./Wiseman, R. (2005). Born lucky? The relationship between feeling lucky and month of birth. Personality and Individual Differences 39 (8), 1451–1460. https://doi.org/10.1016/j.paid.2005.06.012.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, Routledge.

Cohen, J. (1992). Quantitative methods in psychology: A Power Primer. Psychological Bulletin 112, 155–159. https://doi.org/10.1037/0033-2909.112.1.155.

Groth, D. (2024). Statistics for Human Biologists-Introduction. Human Biology and Public Health 2. https://doi.org/10.52905/hbph2024.2.92.

Groth, D./Schandler, L. (2025). SBI: package for the course Statistical Bioinformatics at the University of Potsdam 2025. Available online at https://github.com/mittelmark/sbi.

Mood, A. M./Graybill, F. A./Boes, D. C. (1974). Introduction to the Theory of Statistics. Columbus, McGraw-Hill.

Motulsky, H. (2014). Intuitive biostatistics: a nonmathematical guide to statistical thinking. New York, Oxford University Press.

Python Core Team (2025). Python: A dynamic, open source programming language 2025. Available online at https://www.python.org/.

R Core Team (2025). R: A Language and Environment for Statistical Computing. Vienna, Austria 2025. Available online at https://www.R-project.org.

Sil, A./Betkerur, J./Das, N. K. (2019). P-Value Demystified. Indian Dermatology Online Journal 10 (6), 745–750. https://doi.org/10.4103/idoj.idoj_368_19.