Descriptive Statistics Calculator
Compute mean, median, mode, variance, standard deviation, skewness, kurtosis, and outliers. Compare datasets or calculate weighted statistics with visual charts.
Descriptive statistics summarize and describe the key characteristics of a dataset, providing insights into its center (central tendency), spread (variability), and shape (distribution). These fundamental measures are essential for understanding data before applying deeper inferential or predictive techniques in business analytics, scientific research, education, and decision-making.
Outliers are data points that significantly differ from the majority of observations. They can indicate measurement errors, data entry mistakes, or genuine extreme values that deserve special attention. Common detection methods include the z-score rule (values more than 3 standard deviations from the mean) and the IQR rule (values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR), both sketched below.
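As a rough illustration of those two rules, here is a minimal Python sketch using only the standard library; the function name `find_outliers` is ours for illustration and is not the calculator's internal implementation.

```python
# Minimal sketch of the z-score and IQR outlier rules described above.
import statistics

def find_outliers(data):
    mean = statistics.fmean(data)
    stdev = statistics.stdev(data)  # sample standard deviation

    # Z-score rule: flag values more than 3 standard deviations from the mean.
    z_outliers = [x for x in data if abs((x - mean) / stdev) > 3]

    # IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    iqr_outliers = [x for x in data
                    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

    return sorted(set(z_outliers) | set(iqr_outliers))

print(find_outliers([10, 12, 15, 20, 25, 90]))  # [90], flagged by the IQR rule
```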
This calculator provides comprehensive statistical analysis with three modes to suit different needs: single-dataset analysis, two-dataset comparison, and weighted statistics. Choose the mode that fits your data, enter your values, and click Calculate.
Values can be separated by commas (e.g., 10, 12, 15, 20, 25) or by spaces (e.g., 85 90 78 92 88). Pro Tip: When comparing datasets, always examine both the numerical effect size (Cohen's d) and the visual distributions (box plots, histograms). A large effect size can still come with substantially overlapping distributions, and a small effect size can still be statistically significant in large samples, so read the number and the charts together.
The calculator presents results in multiple formats to aid interpretation. Here's how to read each metric:
| Metric | Meaning & Interpretation |
|---|---|
| Mean (μ) | Central average of the dataset. Sensitive to outliers. Use when data is symmetric without extreme values. Represents the "center of mass" of the distribution. |
| Median | Middle value when sorted. Robust to outliers. Preferred for skewed distributions (income, house prices, response times). 50th percentile of the data. |
| Mode | Most frequent value(s). Can be multiple (bimodal, multimodal) or none (all unique). Useful for categorical data and identifying common responses or patterns. |
| Variance (σ²) | Average squared distance from the mean. Units are squared (less intuitive). Measures spread; higher variance = more dispersed data. Used in statistical tests and formulas. |
| Std Dev (σ) | Typical distance from mean (√variance). Same units as data. ~68% within ±1σ, ~95% within ±2σ for normal distributions. Smaller σ = more consistent, larger σ = more variable. |
| Skewness | Direction and degree of asymmetry. Positive (right-skewed): tail on right, mean > median. Negative (left-skewed): tail on left, mean < median. Zero: symmetric distribution. |
| Kurtosis | Tail heaviness. > 3 = heavy tails, more outliers than normal. < 3 = light tails, fewer outliers. = 3 = normal distribution (excess kurtosis = 0). |
| Outliers | Extreme values beyond expected range. Detected via z-score (±3σ) or IQR (Q1-1.5×IQR, Q3+1.5×IQR). Investigate for errors or genuine extremes requiring special treatment. |
| Cohen's d | Standardized effect size between two datasets: (mean₁ - mean₂) / pooled σ. \|d\| < 0.2 = negligible, 0.2-0.5 = small, 0.5-0.8 = medium, > 0.8 = large difference. |
| Weighted Mean | Average adjusted by weights: Σ(wᵢ·xᵢ) / Σwᵢ. Use when observations have different importance (GPA by credits, survey by sample size). |
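For readers who want to reproduce the table's core metrics programmatically, here is a minimal Python sketch. It assumes SciPy is installed for skewness and kurtosis; the calculator's own formulas may differ in detail (for example, sample versus population variance).

```python
# Minimal sketch of the metrics in the table above.
import statistics
from scipy import stats

data = [10, 12, 15, 15, 20, 25]

print("mean:    ", statistics.fmean(data))
print("median:  ", statistics.median(data))
print("mode(s): ", statistics.multimode(data))          # all tied most-frequent values
print("variance:", statistics.pvariance(data))          # population variance (sigma^2)
print("std dev: ", statistics.pstdev(data))             # population std dev (sigma)
print("skewness:", stats.skew(data))                    # 0 = symmetric
print("kurtosis:", stats.kurtosis(data, fisher=False))  # 3 = normal distribution
```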
Visual Guides: Histograms show the distribution shape and frequency of values. Box plots display median, quartiles, and outliers at a glance. Comparison charts overlay two datasets for direct visual comparison of centers and spreads.
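As a rough offline equivalent of those charts, the following sketch assumes matplotlib is installed; the calculator renders its own charts in the browser, so this is only an approximation.

```python
# Minimal sketch of a histogram and box plot for one dataset.
import matplotlib.pyplot as plt

data = [85, 90, 78, 92, 88, 75, 95, 83, 89, 91]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=5)         # distribution shape and frequency of values
ax1.set_title("Histogram")
ax2.boxplot(data, vert=False)  # median, quartiles, and outliers at a glance
ax2.set_title("Box plot")
plt.tight_layout()
plt.show()
```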
• Data Quality: Descriptive statistics summarize the data you provide—they cannot detect data entry errors, measurement errors, or sampling bias. "Garbage in, garbage out" applies rigorously.
• Sample vs. Population: Results reflect your dataset, which may be a sample from a larger population. Sample statistics are estimates of population parameters and carry inherent uncertainty not captured by descriptive measures alone.
• Outlier Sensitivity: The mean and standard deviation are sensitive to outliers. A single extreme value can dramatically shift these statistics. Always examine data visually and consider robust alternatives (median, IQR) when outliers are present.
• Distribution Shape: Summary statistics can mask important distributional features. Two datasets with identical mean, median, and standard deviation can have completely different shapes. Always visualize your data.
Important Note: This calculator is strictly for educational and informational purposes only. It does not provide professional data analysis, research validation, or statistical consulting. Descriptive statistics are a starting point—they describe but do not explain patterns in data. Results should be verified using professional statistical software (R, Python pandas, SAS, SPSS, Excel) for any research, business, or academic applications. Always consult qualified data analysts or statisticians for important analytical decisions, especially when statistical summaries inform medical research, business strategy, policy decisions, or scientific conclusions.
The statistical formulas and concepts used in this calculator are based on established statistical theory and authoritative academic sources.
Common questions about variance, standard deviation, skewness, kurtosis, weighted statistics, outliers, and effect sizes.
Variance measures the average squared deviation from the mean, expressed in squared units (e.g., dollars², cm²), which can be less intuitive to interpret. Standard deviation is simply the square root of variance and expresses variability in the same units as your original data, making it easier to understand. For example, if you're measuring heights in centimeters with a variance of 100 cm², the standard deviation is 10 cm—meaning heights typically vary by about 10 cm from the average. Both measure spread, but standard deviation is preferred for interpretation because it's in meaningful units. In statistical formulas and calculations, variance is often used because it has nice mathematical properties (e.g., variances add when combining independent random variables).
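A quick numerical check of the heights example, using Python's standard library; the height values are illustrative, chosen so the population variance comes out to exactly 100 cm².

```python
# Variance is in squared units (cm^2); std dev is its square root, in cm.
import statistics

heights_cm = [160, 160, 180, 180]            # mean = 170 cm
print(statistics.pvariance(heights_cm))      # 100.0 (cm^2)
print(statistics.pstdev(heights_cm))         # 10.0  (cm) = sqrt(variance)
```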
Skewness measures the asymmetry of a data distribution, indicating whether values are concentrated on one side with a tail extending in the opposite direction. Positive skewness (right-skewed) means the distribution has a longer tail on the right side, with most values clustered on the left and the mean greater than the median—common in income distributions, house prices, and response times. Negative skewness (left-skewed) has a longer left tail with values clustered on the right and mean less than median—seen in test scores with a ceiling effect or age at retirement. Zero or near-zero skewness indicates a symmetric distribution like the normal distribution, where mean ≈ median ≈ mode. Skewness values |skew| < 0.5 are considered fairly symmetric, 0.5-1.0 are moderately skewed, and > 1.0 are highly skewed. Understanding skewness helps choose appropriate statistical methods—for example, the median is often preferred over the mean for skewed data.
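A small sketch, assuming SciPy is installed, showing that right-skewed data gives positive skewness with mean greater than median; the income figures are invented for illustration.

```python
# Right-skewed data: one long right tail pulls the mean above the median.
import statistics
from scipy import stats

incomes = [30, 32, 35, 36, 38, 40, 42, 45, 120]     # long right tail
print("skewness:", round(stats.skew(incomes), 2))   # positive (right-skewed)
print("mean:  ", round(statistics.fmean(incomes), 2))  # pulled up by the tail
print("median:", statistics.median(incomes))           # robust to the tail
```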
Kurtosis measures how heavy or light the tails of a distribution are compared to a normal distribution, indicating the likelihood of extreme values (outliers). A normal distribution has kurtosis = 3 (or excess kurtosis = 0). High kurtosis (> 3, or excess > 0) indicates 'heavy tails' with more extreme values and a sharper peak than normal—common in financial returns, where rare but extreme events occur. Low kurtosis (< 3, or excess < 0) indicates 'light tails' with fewer outliers and a flatter distribution than normal. Platykurtic (low kurtosis) distributions have values clustered near the mean with few extremes, like uniform distributions. Leptokurtic (high kurtosis) distributions have long tails and many outliers, requiring robust statistical methods. In practice, kurtosis helps assess risk—high kurtosis in stock returns means more frequent crashes or booms than a normal model would predict.
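A small sketch, assuming SciPy and NumPy are installed. Note that scipy.stats.kurtosis reports excess kurtosis by default (normal = 0); passing fisher=False matches the convention used above (normal = 3).

```python
# Heavy-tailed data (Student t, df=3) vs. normal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal = rng.normal(size=10_000)            # normal tails
heavy = rng.standard_t(df=3, size=10_000)   # heavy tails, more extremes

print(stats.kurtosis(normal, fisher=False))  # ~3 (normal)
print(stats.kurtosis(heavy, fisher=False))   # noticeably > 3 (leptokurtic)
```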
Use weighted statistics when different observations contribute unequally to the analysis—i.e., some data points have more importance, frequency, or reliability than others. Common scenarios include: (1) Grade Point Average (GPA): weight grades by credit hours (a 4-credit A counts more than a 1-credit A). (2) Survey data: weight responses by the number of people each respondent represents (e.g., demographic surveys where one respondent represents 1000 people in that group). (3) Investment portfolios: weight returns by the dollar amount invested in each asset. (4) Quality scores: weight ratings by reliability or confidence (e.g., expert opinions weighted higher than novice). (5) Grouped data: when you have frequency counts rather than raw values (e.g., 10 students scored 85, 15 scored 90). The weighted mean formula is Σ(w_i · x_i) / Σw_i, where w_i are the weights. Using regular (unweighted) statistics when weights matter can produce misleading averages that don't reflect the true center of your data.
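A worked GPA example of the weighted-mean formula Σ(w_i · x_i) / Σw_i, in plain Python; the grades and credit hours are invented for illustration.

```python
# Weighted mean: credit hours act as the weights w_i.
grades = [4.0, 3.0, 3.7]   # A, B, A-
credits = [4, 3, 1]        # credit hours (weights)

weighted_gpa = sum(w * x for w, x in zip(credits, grades)) / sum(credits)
unweighted = sum(grades) / len(grades)
print(round(weighted_gpa, 3))  # 3.588 -- the 4-credit A dominates
print(round(unweighted, 3))    # 3.567 -- ignores credit hours
```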
This tool uses two complementary methods to detect outliers: (1) Z-score method: A value is flagged as an outlier if it's more than 3 standard deviations away from the mean (|z| > 3). This works well for approximately normal distributions and identifies extreme values in terms of standard deviations. For example, if mean = 100, σ = 10, then values below 70 or above 130 are outliers. (2) IQR (Interquartile Range) method: A value is an outlier if it's below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 - Q1. This method is robust to non-normal distributions and doesn't assume any particular shape. The tool reports outliers detected by either method. Not all outliers are errors—some represent genuine extreme cases (e.g., a billionaire in income data, a genius in IQ scores). Investigate outliers to determine if they're measurement errors, data entry mistakes, or valid extreme observations that deserve special attention or separate analysis.
Cohen's d is a standardized effect size that measures the difference between two group means in units of standard deviation: d = (mean₁ - mean₂) / pooled_σ. It's 'standardized' because it's unit-free, making it comparable across different measurements and studies. Interpretation guidelines: |d| < 0.2 = negligible difference (groups are nearly identical), 0.2-0.5 = small effect (noticeable but subtle difference), 0.5-0.8 = medium effect (moderate practical significance), > 0.8 = large effect (substantial difference that's obvious in practice). For example, d = 0.5 means the groups differ by half a standard deviation—a medium effect where about 69% of one group scores above the mean of the other group. Cohen's d is used to assess practical significance beyond statistical significance—a statistically significant p-value with d = 0.1 may not be meaningful in practice, while d = 0.9 indicates a large real-world difference even if the sample is small. In clinical trials, d > 0.5 often indicates clinically significant improvement.
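A minimal sketch of Cohen's d as defined above, using the pooled sample standard deviation (one common convention among several); the sample values are invented for illustration.

```python
# Cohen's d: standardized difference between two group means.
import statistics

def cohens_d(a, b):
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    # Pooled standard deviation weights each group's variance by its df.
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd

group1 = [85, 90, 78, 92, 88]
group2 = [80, 84, 75, 86, 82]
print(round(cohens_d(group1, group2), 2))  # ~1.07 -> large effect
```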
Explore other statistical tools to complement your descriptive analysis
Convert z-scores to p-values and vice versa. Calculate critical values for one-tailed and two-tailed hypothesis tests with shaded graphs.
Calculate normal PDF/CDF, convert z ↔ x, and find one- or two-tailed probabilities with interactive bell curve visualization.
Calculate Pearson and Spearman correlation coefficients to measure linear and monotonic relationships between variables.
Perform linear, multiple, or polynomial regression online. Get regression equations, coefficients, R², residual plots, and model insights.
Compute confidence intervals for means (Z/t), proportions, and differences. See standard error, critical value, and margin of error.
Calculate probabilities for discrete and continuous distributions including binomial, Poisson, normal, and exponential with detailed results.
Access essential scientific calculators including stats quick calc, molarity, and percentage calculators for everyday calculations.
Enter your data and click Calculate to see mean, median, mode, standard deviation, skewness, kurtosis, and more