Question 1

What's the difference between variance and standard deviation?

Accepted Answer

Variance measures the average squared deviation from the mean, expressed in squared units (e.g., dollars², cm²), which can be less intuitive to interpret. Standard deviation is simply the square root of variance and expresses variability in the same units as your original data, making it easier to understand. For example, if you're measuring heights in centimeters with a variance of 100 cm², the standard deviation is 10 cm—meaning heights typically vary by about 10 cm from the average. Both measure spread, but standard deviation is preferred for interpretation because it's in meaningful units. In statistical formulas and calculations, variance is often used because it has nice mathematical properties (e.g., variances add when combining independent random variables).
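
The height example above can be reproduced with Python's standard `statistics` module. This is only a sketch: the four heights are made-up values chosen so the variance comes out to 100 cm², and whether this tool divides by n (population) or n - 1 (sample) is not stated here, so both are shown.

```python
import math
import statistics

# Illustrative heights in centimeters (made-up data, not from the tool).
heights_cm = [160, 180, 160, 180]

# Population variance: average squared deviation from the mean, in cm².
variance_cm2 = statistics.pvariance(heights_cm)

# Population standard deviation: square root of the variance, in cm.
std_dev_cm = statistics.pstdev(heights_cm)

# Sample variance divides by n - 1 instead of n, so it comes out larger.
sample_variance_cm2 = statistics.variance(heights_cm)

print(f"population variance = {variance_cm2} cm^2")
print(f"population std dev  = {std_dev_cm} cm")
print(f"sample variance     = {sample_variance_cm2:.2f} cm^2")
```

Every height here sits exactly 10 cm from the mean of 170 cm, which is why the standard deviation reads directly as the typical distance from the average.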

Question 2

What does skewness indicate?

Accepted Answer

Skewness measures the asymmetry of a data distribution, indicating whether values are concentrated on one side with a tail extending in the opposite direction. Positive skewness (right-skewed) means the distribution has a longer tail on the right side, with most values clustered on the left and the mean typically greater than the median; this is common in income distributions, house prices, and response times. Negative skewness (left-skewed) means a longer left tail, with values clustered on the right and the mean typically less than the median; this is seen in test scores with a ceiling effect or age at retirement. Zero or near-zero skewness indicates a roughly symmetric distribution; in the normal distribution, mean = median = mode. As a rule of thumb, |skew| < 0.5 is considered fairly symmetric, 0.5-1.0 moderately skewed, and > 1.0 highly skewed. Understanding skewness helps choose appropriate statistical methods: for example, the median is often preferred over the mean for skewed data.
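
A minimal sketch of one common skewness definition follows, assuming the population (moment-based) form: the third central moment divided by the second central moment raised to the power 1.5. The tool may use a sample-adjusted variant, so treat the function names `skewness` and `describe_skew` and the data as illustrative only.

```python
import statistics

def skewness(values):
    """Population skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

def describe_skew(skew):
    """Apply the rule-of-thumb cutoffs quoted in the answer.

    How exact boundary values (0.5, 1.0) are classified is an assumption.
    """
    size = abs(skew)
    if size < 0.5:
        return "fairly symmetric"
    if size <= 1.0:
        return "moderately skewed"
    return "highly skewed"

symmetric = [1, 2, 3, 4, 5]
response_times = [1, 2, 2, 3, 3, 4, 20]  # one slow response forms a right tail

for name, data in [("symmetric", symmetric), ("response_times", response_times)]:
    s = skewness(data)
    print(name, round(s, 3), describe_skew(s),
          "mean:", statistics.mean(data), "median:", statistics.median(data))
```

In the right-skewed sample the single large value pulls the mean above the median, which is the pattern the answer describes.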

Question 3

What does kurtosis mean?

Accepted Answer

Kurtosis measures how heavy or light the tails of a distribution are compared to a normal distribution, indicating the likelihood of extreme values (outliers). A normal distribution has kurtosis = 3 (or excess kurtosis = 0). High kurtosis (> 3, or excess > 0) indicates 'heavy tails' with more extreme values than a normal distribution would produce; this is common in financial returns, where rare but extreme events occur. Such distributions often look sharply peaked as well, but kurtosis is driven by the tails rather than by the shape of the peak. Low kurtosis (< 3, or excess < 0) indicates 'light tails' with fewer outliers than normal. Platykurtic (low kurtosis) distributions produce few or no extreme values; the uniform distribution (kurtosis = 1.8) is a standard example. Leptokurtic (high kurtosis) distributions have long tails and more frequent outliers, which may call for robust statistical methods. In practice, kurtosis helps assess risk: high kurtosis in stock returns means more frequent crashes or booms than a normal model would predict.
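
The following sketch computes kurtosis as the fourth central moment divided by the squared second central moment, assuming the population (moment-based) definition. The tool may apply a sample correction, and the function name `kurtosis` and both datasets are made up for illustration.

```python
def kurtosis(values):
    """Population kurtosis: m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Evenly spread values: light tails, so kurtosis falls below 3.
flat = list(range(1, 101))

# Mostly identical values plus two extremes: heavy tails, kurtosis far above 3.
heavy_tailed = [0] * 98 + [-10, 10]

for name, data in [("flat", flat), ("heavy_tailed", heavy_tailed)]:
    k = kurtosis(data)
    print(f"{name}: kurtosis = {k:.2f}, excess kurtosis = {k - 3:.2f}")
```

Subtracting 3 converts kurtosis to excess kurtosis, so the flat data comes out negative and the heavy-tailed data strongly positive.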

Question 4

When should I use weighted statistics?

Accepted Answer

Use weighted statistics when different observations contribute unequally to the analysis—i.e., some data points have more importance, frequency, or reliability than others. Common scenarios include: (1) Grade Point Average (GPA): weight grades by credit hours (a 4-credit A counts more than a 1-credit A). (2) Survey data: weight responses by the number of people each respondent represents (e.g., demographic surveys where one respondent represents 1000 people in that group). (3) Investment portfolios: weight returns by the dollar amount invested in each asset. (4) Quality scores: weight ratings by reliability or confidence (e.g., expert opinions weighted higher than novice). (5) Grouped data: when you have frequency counts rather than raw values (e.g., 10 students scored 85, 15 scored 90). The weighted mean formula is Σ(w_i · x_i) / Σw_i, where w_i are the weights. Using regular (unweighted) statistics when weights matter can produce misleading averages that don't reflect the true center of your data.
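
The weighted mean formula above can be sketched in a few lines of Python. The helper name `weighted_mean` is hypothetical, the grouped scores come from scenario (5), and the GPA figures are made-up values in the spirit of scenario (1).

```python
def weighted_mean(values, weights):
    """Compute sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Grouped data: 10 students scored 85 and 15 students scored 90.
scores = [85, 90]
counts = [10, 15]
grouped_mean = weighted_mean(scores, counts)
unweighted_mean = sum(scores) / len(scores)

# GPA: a 4-credit A (4.0) and a 1-credit B (3.0), illustrative values.
grade_points = [4.0, 3.0]
credit_hours = [4, 1]
gpa = weighted_mean(grade_points, credit_hours)

print("grouped mean:", grouped_mean)
print("unweighted mean of the two score levels:", unweighted_mean)
print("GPA:", gpa)
```

The unweighted average of the two score levels ignores that more students scored 90, which is the kind of misleading result the answer warns about.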

Question 5

What's considered an outlier in this tool?

Accepted Answer

This tool uses two complementary methods to detect outliers: (1) Z-score method: A value is flagged as an outlier if it's more than 3 standard deviations away from the mean (|z| > 3). This works well for approximately normal distributions and identifies extreme values in terms of standard deviations. For example, if mean = 100, σ = 10, then values below 70 or above 130 are outliers. (2) IQR (Interquartile Range) method: A value is an outlier if it's below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 - Q1. This method is robust to non-normal distributions and doesn't assume any particular shape. The tool reports outliers detected by either method. Not all outliers are errors—some represent genuine extreme cases (e.g., a billionaire in income data, a genius in IQ scores). Investigate outliers to determine if they're measurement errors, data entry mistakes, or valid extreme observations that deserve special attention or separate analysis.
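
Both rules can be sketched with Python's `statistics` module. Two details here are assumptions, since the answer does not specify them: the z-score uses the population standard deviation, and the quartiles use the "inclusive" interpolation method. The function names and the data are illustrative.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [x for x in values if abs(x - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    # assumption: "inclusive" quartiles; other methods shift the fences slightly
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

data = [48, 49, 50, 50, 51, 52, 49, 50, 51, 50,
        48, 52, 50, 49, 51, 50, 50, 49, 56, 120]

by_z = z_score_outliers(data)
by_iqr = iqr_outliers(data)
either = sorted(set(by_z) | set(by_iqr))  # flagged by at least one method

print("z-score method:", by_z)
print("IQR method:    ", by_iqr)
print("either method: ", either)
```

In this sample the extreme value 120 inflates the standard deviation enough that 56 passes the z-score test, while the IQR fences still flag it. That difference is one reason to report outliers found by either method.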

Question 6

How do I interpret Cohen's d?

Accepted Answer

Cohen's d is a standardized effect size that measures the difference between two group means in units of standard deviation: d = (mean₁ - mean₂) / pooled_σ. It's 'standardized' because it's unit-free, making it comparable across different measurements and studies. Interpretation guidelines: |d| < 0.2 = negligible difference (groups are nearly identical), 0.2-0.5 = small effect (noticeable but subtle difference), 0.5-0.8 = medium effect (moderate practical significance), > 0.8 = large effect (substantial difference that's obvious in practice). For example, d = 0.5 means the groups differ by half a standard deviation, a medium effect where, assuming normal distributions with equal spread, about 69% of one group scores above the mean of the other group. Cohen's d is used to assess practical significance beyond statistical significance: a statistically significant p-value with d = 0.1 may not be meaningful in practice, while d = 0.9 points to a large difference, although an estimate from a small sample is imprecise. These thresholds are conventions rather than strict rules; in clinical trials, for instance, d > 0.5 is often taken to indicate clinically significant improvement, but what counts as meaningful depends on the field.
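
A sketch of the calculation follows, assuming the common pooled standard deviation built from the two sample variances (n - 1 denominators); the tool's exact pooling formula is not stated here. The function names, the handling of exact threshold values, and the two groups are illustrative.

```python
import math
import statistics

def cohens_d(group1, group2):
    """(mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance, n - 1 denominator
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd

def effect_label(d):
    """Apply the guideline thresholds; boundary handling is an assumption."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]

d = cohens_d(treatment, control)
print(f"d = {d:.3f} ({effect_label(d)})")

# Share of one normal group above the other group's mean when d = 0.5.
share_above = statistics.NormalDist().cdf(0.5)
print(f"share above the other mean at d = 0.5: {share_above:.1%}")
```

The last two lines check the 69% figure quoted above by evaluating the standard normal CDF at 0.5.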

Metric	Meaning & Interpretation
Mean (μ)	Central average of the dataset. Sensitive to outliers. Use when data is symmetric without extreme values. Represents the "center of mass" of the distribution.
Median	Middle value when sorted. Robust to outliers. Preferred for skewed distributions (income, house prices, response times). 50th percentile of the data.
Mode	Most frequent value(s). Can be multiple (bimodal, multimodal) or none (all unique). Useful for categorical data and identifying common responses or patterns.
Variance (σ²)	Average squared distance from the mean. Units are squared (less intuitive). Measures spread; higher variance = more dispersed data. Used in statistical tests and formulas.
Std Dev (σ)	Typical distance from mean (√variance). Same units as data. ~68% within ±1σ, ~95% within ±2σ for normal distributions. Smaller σ = more consistent, larger σ = more variable.
Skewness	Direction and degree of asymmetry. Positive (right-skewed): tail on right, mean > median. Negative (left-skewed): tail on left, mean < median. Zero: symmetric distribution.
Kurtosis	Tail heaviness. > 3 = heavy tails, more outliers than normal. < 3 = light tails, fewer outliers. = 3 = normal distribution (excess kurtosis = 0).
Outliers	Extreme values beyond expected range. Detected via z-score (±3σ) or IQR (Q1-1.5×IQR, Q3+1.5×IQR). Investigate for errors or genuine extremes requiring special treatment.
Cohen's d	Standardized effect size between two datasets: (mean₁ - mean₂) / pooled σ. |d| < 0.2 = negligible, 0.2-0.5 = small, 0.5-0.8 = medium, > 0.8 = large difference.
Weighted Mean	Average adjusted by weights: Σ(w_i·x_i) / Σw_i. Use when observations have different importance (GPA by credits, survey by sample size).

Descriptive Statistics Calculator