Understanding (and reducing) variance and standard deviation
Blog post from Statsig
Uncertainty is a constant in data analysis, and standard deviation and variance are key tools for quantifying it. Standard deviation measures how far data points typically fall from the mean: a low standard deviation indicates points clustered near the mean, and a high one indicates wide dispersion. Variance is the square of the standard deviation. Combined with the central limit theorem and certain assumptions, standard deviation lets analysts assess probabilities and build confidence intervals, which are useful in polling, quality measurement, A/B testing, and risk assessment. Techniques such as filtering, winsorization, capping, CUPED, and thresholding reduce standard deviation, often by limiting the influence of outliers, and so improve the reliability of conclusions drawn from data. Winsorization and capping, for instance, can significantly reduce the effect of extreme values on variance and standard deviation, which narrows the confidence intervals around experiment results. These methods are often combined in practice, and together they can lead to clearer insights and more decisive outcomes when testing hypotheses or running randomized controlled trials.
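To make the effect of these techniques concrete, here is a minimal sketch in plain Python, not Statsig's implementation. It assumes a synthetic spend metric with a few injected extreme values, winsorizes it at the 1st and 99th percentiles, and then applies a simple CUPED-style adjustment using a pre-experiment covariate. The data, the percentile cutoffs, and the helper names `winsorize`, `cuped_adjust`, and `ci_half_width` are all illustrative assumptions.

```python
import random
import statistics


def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clamp values outside the given quantiles to the quantile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def cuped_adjust(metric, covariate):
    """Remove the part of the metric explained by a pre-experiment covariate."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / statistics.variance(covariate)
    # Centering the covariate keeps the mean of the adjusted metric unchanged.
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


def ci_half_width(values, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * statistics.stdev(values) / len(values) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: pre-experiment spend and a correlated in-experiment spend.
pre = [rng.gauss(50, 10) for _ in range(2000)]
post = [0.8 * x + rng.gauss(10, 5) for x in pre]

# Inject a handful of extreme values to mimic outliers.
for i in range(5):
    post[i] += 5000

raw_sd = statistics.stdev(post)
wins = winsorize(post)
wins_sd = statistics.stdev(wins)
cuped = cuped_adjust(wins, pre)
cuped_sd = statistics.stdev(cuped)

for label, data in [("raw", post), ("winsorized", wins), ("winsorized + CUPED", cuped)]:
    print(f"{label}: sd={statistics.stdev(data):.2f}, CI half-width={ci_half_width(data):.2f}")
```

Each step should shrink the standard deviation and therefore the confidence interval half-width. In a real A/B test the CUPED coefficient would typically be estimated on data pooled across test groups and the adjustment applied to each group; the single-sample version here only illustrates the variance reduction.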