Navigating multiple comparison corrections in A/B Testing
Blog post from Statsig
A/B testing frequently runs into the multiple comparisons problem: the more hypothesis tests you run simultaneously, the higher the chance of at least one false positive. Statistical correction methods such as the Bonferroni correction, Dunnett's test, and the Benjamini-Hochberg (BH) procedure address this, each controlling a different error rate in a different way.

The Bonferroni correction is the simplest and most conservative. To keep the family-wise error rate (FWER) below a chosen level α, it tests each of the m hypotheses at the stricter threshold α/m, which reduces statistical power. Dunnett's test is more powerful than Bonferroni in its specific setting: it compares multiple treatment groups against a single shared control and accounts for the dependence between those comparisons. The BH procedure instead controls the false discovery rate (FDR) — the expected proportion of rejected hypotheses that are false positives — trading a looser guarantee on false positives for substantially more power, which makes it well suited to exploratory studies and scenarios with many tests.

Sequential testing raises a separate issue: repeatedly peeking at accumulating data inflates the Type I error rate, because every look is another chance to cross the significance threshold by luck. Methods such as alpha spending functions or the mixture Sequential Probability Ratio Test (mSPRT) manage this risk. Selecting the appropriate method depends on the research context, how costly false positives are, and how much statistical power needs to be preserved.
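To make the Bonferroni/BH contrast concrete, here is a minimal sketch of both procedures applied to the same list of p-values (the function names and the example p-values are illustrative, not from the post):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i iff p_i <= alpha / m; controls FWER at level alpha."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: sort p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest; controls FDR."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni(pvals))          # only p = 0.001 clears alpha/m = 0.00625
print(benjamini_hochberg(pvals))  # BH also keeps p = 0.008; more power
```

On these eight p-values, Bonferroni rejects one hypothesis while BH rejects two — the power gap the post describes.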
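An alpha spending function handles peeking by budgeting how much of the total α may be "spent" by each interim look. A common choice is an O'Brien-Fleming-style spending function, sketched below with the standard library only (the function name and the choice of spending function are illustrative; the post does not specify one):

```python
import math
from statistics import NormalDist

def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1)
    under an O'Brien-Fleming-style spending function:
        a(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
    Very little alpha is spent at early looks; a(1) = alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

# Alpha available at each of four equally spaced looks is the increment
# between consecutive cumulative values.
looks = [0.25, 0.5, 0.75, 1.0]
spent = [obf_alpha_spent(t) for t in looks]
increments = [b - a for a, b in zip([0.0] + spent, spent)]
print(increments)  # tiny budgets early, most of alpha reserved for the end
```

The per-look significance budget is the difference between consecutive cumulative values, which is why an O'Brien-Fleming design makes early stopping hard and preserves most of the α for the final analysis.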
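The mSPRT takes a different route: it tracks a mixture likelihood ratio that stays valid at every look, so you may peek as often as you like and reject whenever it crosses 1/α. A sketch for normal data with known variance, testing θ = 0 against a N(0, τ²) mixture of alternatives (the closed form below is the standard one for this Gaussian case; parameter defaults are illustrative):

```python
import math

def msprt_lr(n, xbar, sigma2=1.0, tau2=1.0):
    """Mixture likelihood ratio after n i.i.d. N(theta, sigma2) observations
    with sample mean xbar, for H0: theta = 0 vs. a N(0, tau2) mixture:
        LR_n = sqrt(sigma2 / (sigma2 + n*tau2))
               * exp(n^2 * tau2 * xbar^2 / (2*sigma2*(sigma2 + n*tau2)))"""
    pre = math.sqrt(sigma2 / (sigma2 + n * tau2))
    expo = (n ** 2 * tau2 * xbar ** 2) / (2 * sigma2 * (sigma2 + n * tau2))
    return pre * math.exp(expo)

def msprt_rejects(n, xbar, alpha=0.05, **kw):
    # Always-valid decision rule: reject H0 once the mixture LR reaches 1/alpha.
    # Checking this at every look still controls the Type I error at alpha.
    return msprt_lr(n, xbar, **kw) >= 1.0 / alpha

print(msprt_rejects(100, 0.5))  # large observed mean: LR far above 1/alpha
print(msprt_rejects(100, 0.0))  # null-like data: LR stays below 1
```

Because the rejection rule is valid at every sample size simultaneously, no separate correction for the number of looks is needed — this is what makes mSPRT attractive for continuously monitored experiments.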