Plan Sample Size for Proportion Studies
Plan how many observations you need when working with proportions. Design sample sizes for single-proportion confidence intervals (margin of error) or for two-proportion comparisons (power-based). This is an educational tool, not for clinical or regulatory use.
Pick Goal: Margin of Error or Power
A sample size calculator for proportions serves two distinct purposes. The first is precision-driven: you want a confidence interval narrow enough to inform decisions, so you specify a target margin of error—say, plus or minus 3 percentage points. The formula then returns the n needed to hit that precision at your chosen confidence level.
The second purpose is power-driven: you plan to compare two proportions and need enough observations to detect a meaningful difference. Here you specify baseline and alternative proportions, alpha, and target power—commonly 80%. The calculator returns n per group required to achieve that detection probability.
These two modes answer different questions. Margin-of-error mode asks "how precisely can I estimate a single proportion?" Power mode asks "can I reliably detect a difference between two groups?" Confusing them leads to mismatched designs—a survey budget optimized for precision may lack power for subgroup comparisons.
Before opening the calculator, clarify your research goal. Will you report a single estimate with an interval, or will you test a hypothesis comparing groups? That choice determines which inputs the tool requires and how to interpret the output.
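To make the margin-of-error mode concrete, here is a minimal sketch of the standard formula n = z² · p(1 − p) / ME², assuming a 95% confidence level (z ≈ 1.96) and the worst-case p = 0.5. The function name is illustrative, not part of any particular tool:

```python
import math

def n_for_margin_of_error(margin, p=0.5, z=1.959964):
    """Sample size for a single-proportion CI: n = z^2 * p(1-p) / ME^2, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Worst-case p = 0.5 at 95% confidence:
print(n_for_margin_of_error(0.03))  # ±3 percentage points -> 1068
print(n_for_margin_of_error(0.05))  # ±5 percentage points -> 385
```

Note that n is always rounded up to the next whole observation, so the achieved margin of error is slightly tighter than requested.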
Baseline Proportion Guess (Worst-Case p=0.5)
The variance of a sample proportion depends on p itself: variance equals p(1 − p) divided by n. The product p(1 − p) is maximized at p = 0.5, where it equals 0.25. At p = 0.1 it drops to 0.09, and by symmetry it is the same 0.09 at p = 0.9. Higher variance means you need more observations to achieve a given precision.
If you have no prior estimate of the true proportion, using p = 0.5 in your sample size calculation guarantees the worst-case scenario. The resulting n will be large enough regardless of where the actual proportion lands. This conservative approach is standard for polling and general surveys.
But if you have reliable prior data—say, historical defect rates around 2%—you can plug in p = 0.02. The required sample shrinks dramatically because variance is so much smaller. Just be sure the prior estimate applies to the current population and measurement method; otherwise, you risk undersizing.
Sensitivity analysis helps hedge uncertainty. Calculate n at p = 0.3, 0.5, and 0.7, then pick a sample size that covers the plausible range. This transparency reassures funders that the study is robust to estimation error.
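The sensitivity analysis above can be sketched with the same margin-of-error formula, sweeping the planning proportion across a plausible range (95% confidence assumed; the function name is illustrative):

```python
import math

def n_for_margin_of_error(margin, p, z=1.959964):
    """Sample size for a single-proportion CI: n = z^2 * p(1-p) / ME^2, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Required n for a ±3% margin at 95% confidence, across plausible p values:
for p in (0.3, 0.5, 0.7):
    print(f"p = {p}: n = {n_for_margin_of_error(0.03, p)}")
# p = 0.3 and p = 0.7 give the same n (897) by symmetry; p = 0.5 is the worst case (1068).
```

Picking the largest n in the sweep (here, the p = 0.5 value) covers the whole range.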
Finite Population Adjustment (Optional)
Standard sample size formulas assume an infinite population—sampling with replacement or from a pool so large that each draw doesn't noticeably deplete it. When your population is finite and your sample represents a sizable fraction, variance shrinks because you're capturing more of the whole.
The finite population correction (FPC) is (N − n) / (N − 1), where N is the population size and n is the sample. If n is a tiny fraction of N, FPC is close to 1 and barely changes anything. But if you're surveying 500 employees out of a 600-person company, FPC reduces the required n noticeably.
Some calculators let you enter N and apply FPC automatically. Others provide the infinite-population n and leave you to adjust manually. The formula is n_adjusted = n / (1 + (n − 1) / N). This gives a smaller sample that still achieves the same margin of error within the bounded population.
Use FPC when sampling more than about 5–10% of the population. Below that threshold, the adjustment is negligible. Above it, ignoring FPC wastes resources on unnecessary observations—helpful knowledge when budgets are tight.
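The manual adjustment described above is a one-line helper (the function name is hypothetical):

```python
import math

def fpc_adjust(n_infinite, population_size):
    """Apply the finite population correction: n_adj = n / (1 + (n - 1) / N)."""
    return math.ceil(n_infinite / (1 + (n_infinite - 1) / population_size))

# 385 respondents needed under the infinite-population formula,
# but the whole population is only 600 people:
print(fpc_adjust(385, 600))        # -> 235, a substantial saving
print(fpc_adjust(385, 1_000_000))  # -> 385, FPC is negligible here
```

This mirrors the 5–10% rule of thumb: the correction only bites when n is a sizable fraction of N.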
Two-Proportion Comparisons: Inputs That Matter
When planning a study to compare two proportions—control versus treatment, A versus B—the calculator needs both the baseline proportion (p1) and the alternative proportion (p2) you hope to detect. The difference |p2 − p1| is the effect size. Smaller differences require larger samples.
Alpha sets the tolerated false-positive rate, typically 0.05. Power sets the true-positive rate, commonly 0.80 or higher. Together they fix the critical value and how far apart the null and alternative sampling distributions must sit, which is what drives the required n. Lowering alpha or raising power both inflate sample size; there's no free lunch.
One-tailed tests assume you only care about improvements, not decrements. This boosts power but blinds you to harmful effects. Two-tailed tests detect either direction at the cost of slightly larger n. Most regulatory and scientific contexts prefer two-tailed unless directionality is justified a priori.
Pooled variance formulas assume both groups share the same underlying rate under the null; unpooled formulas allow a different variance in each arm. Results differ slightly: because p(1 − p) is concave, the pooled estimate p̄(1 − p̄) is usually a bit larger, making the pooled formula the slightly more conservative choice when the true proportions diverge substantially.
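A common pooled-variance approximation for the sample size per group is n = (z_α/2 · √(2p̄(1 − p̄)) + z_β · √(p₁(1 − p₁) + p₂(1 − p₂)))² / (p₁ − p₂)². A sketch, assuming a two-sided α = 0.05 and 80% power (the z constants below are the corresponding standard normal quantiles):

```python
import math

Z_ALPHA_2 = 1.959964  # two-sided alpha = 0.05
Z_BETA = 0.841621     # target power = 0.80

def n_per_group(p1, p2, z_a=Z_ALPHA_2, z_b=Z_BETA):
    """Pooled-variance sample size per group for a two-proportion z-test."""
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Detecting a lift from 10% to 15% at alpha = 0.05, power = 0.80:
print(n_per_group(0.10, 0.15))  # -> 686 per group
```

Shrinking the effect from 5 points to 1 point (0.10 vs 0.11) pushes n per group into the tens of thousands, which is why small lifts are so expensive to detect.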
Budget vs Precision Tradeoffs
Precision comes at a price. Halving the margin of error roughly quadruples the required sample because n scales with 1/ME². A survey needing ±5% precision at 95% confidence with p = 0.5 requires about 385 respondents. Tightening to ±2.5% jumps to roughly 1,537. Budget constraints often force compromise.
For hypothesis testing, detecting small effects demands large samples. A 1-percentage-point lift in conversion rate might need tens of thousands of users per arm. If your traffic can't support that volume, either accept lower power or reframe the question around a larger minimum detectable effect.
Cost per observation varies by context. Online surveys are cheap; clinical visits are expensive. Power analysis paired with cost modeling reveals the optimal balance. Sometimes it's better to accept wider intervals and save budget for follow-up studies than to over-invest in a single underpowered experiment.
Presenting tradeoff tables to stakeholders helps manage expectations. Show what precision or power is achievable at different budget levels. This transparency prevents disappointment when results come back with wider-than-hoped intervals or non-significant p-values.
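A tradeoff table like the one described above takes only a few lines to generate; the cost-per-response figure below is invented purely for illustration:

```python
import math

def n_for_margin_of_error(margin, p=0.5, z=1.959964):
    """Sample size for a single-proportion CI at 95% confidence, worst-case p."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

COST_PER_RESPONSE = 4.00  # hypothetical cost, for illustration only

print(f"{'margin':>8} {'n':>6} {'cost':>11}")
for me in (0.05, 0.04, 0.03, 0.025, 0.02):
    n = n_for_margin_of_error(me)
    print(f"{me:>8.3f} {n:>6} ${n * COST_PER_RESPONSE:>10,.2f}")
```

The table makes the 1/ME² scaling visible at a glance: tightening from ±5% to ±2.5% roughly quadruples both n and cost.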
Sample Size Questions, Answered
Why does p = 0.5 give the largest sample size?
Because variance p(1 − p) is maximized at 0.5. Proportions near 0 or 1 have lower variance, so you need fewer observations for the same precision. Using 0.5 when uncertain guarantees enough data regardless of the true proportion.
How do I account for dropouts or non-response?
Inflate the calculated n by the expected attrition rate. If you expect 20% non-response, divide n by 0.80. This ensures your final analyzable sample meets the target. Ignoring attrition leads to underpowered studies.
What's the difference between pooled and unpooled variance?
Pooled variance uses the average of p1 and p2, assuming equal rates under the null. Unpooled uses each group's own variance. Pooled is simpler and often slightly larger; unpooled is more flexible when true proportions differ noticeably.
Should I always use the finite population correction?
Only when sampling more than about 5–10% of the population. Below that, FPC is negligible. Above it, ignoring FPC wastes resources. If your population is effectively infinite—like all internet users—skip the correction.
Can I use this calculator for cluster-randomized trials?
Not directly. Cluster designs inflate variance by an intraclass correlation factor. You'd need to multiply the simple n by the design effect. This calculator assumes simple random sampling; consult specialized tools for clustered data.
Limitations & Assumptions
• Normal Approximation: Sample size formulas assume the normal approximation to the binomial is valid (np and n(1−p) both exceeding about 5). For extreme proportions near 0 or 1, exact methods may yield more accurate but larger sample sizes.
• Simple Random Sampling: Formulas assume simple random sampling with independent observations. For stratified, clustered, or multistage designs, standard formulas underestimate required sample sizes—design effects must be incorporated.
• No Attrition Adjustment: Calculated n assumes all observations are usable. Real studies experience non-response and dropout that reduce effective sample size. Inflate n accordingly before data collection.
• Effect Size Sensitivity: For two-sample tests, sample size is highly sensitive to the assumed difference between proportions. Small changes in effect size dramatically alter required n. Conduct sensitivity analyses.
Important Note: This calculator is for educational and informational purposes only. It demonstrates how sample size planning works mathematically, not for clinical trial design, regulatory submissions, or high-stakes research. Professional applications require proper consideration of dropout rates, interim analyses, multiple comparisons, and regulatory requirements. For real studies, use dedicated software (G*Power, PASS, nQuery, R packages) and consult with qualified biostatisticians.
Sources & References
The mathematical formulas and sample size concepts used in this calculator are based on established statistical theory and authoritative academic sources:
- NIST/SEMATECH e-Handbook: Sample Sizes Required - Authoritative reference from the National Institute of Standards and Technology.
- Penn State STAT 500: Sample Size for Proportions - University course material on sample size planning.
- Cochran (1977): Sampling Techniques - Classic textbook on sample size formulas and survey methodology.
- Statistics How To: Sample Size Guide - Practical explanations of sample size calculations.
- OpenStax Statistics: Population Proportion - Free textbook chapter on proportion sample size.
Frequently Asked Questions
Common questions about sample size for proportions: confidence interval precision, margin of error, power analysis, the planning proportion, and how to use this calculator for homework and study design practice.
What is a planning proportion and why do I need it?
The planning proportion (p*) is your best guess about what the true proportion might be. It's used to estimate the variance p*(1 − p*) needed for sample size calculations. If you have prior data or expert knowledge, use that estimate. If not, using 0.5 gives the most conservative (largest) sample size because variance is maximized at p = 0.5.
Why does assuming p = 0.5 give the largest required sample size?
The variance term p*(1 − p*) reaches its maximum value of 0.25 when p* = 0.5. Proportions near 0 or 1 have smaller variance (e.g., 0.1 × 0.9 = 0.09). Since sample size is proportional to variance, p* = 0.5 requires the most observations. This is the 'worst-case' or 'conservative' approach that guarantees adequate precision regardless of the actual proportion.
What is the difference between margin of error and power?
Margin of error applies to confidence intervals—it's the half-width of the interval (e.g., ±5%). Power applies to hypothesis tests—it's the probability of correctly rejecting a false null hypothesis (detecting a real effect). CI design focuses on precision (how narrow your interval is), while test design focuses on the ability to detect differences.
How does the effect size affect the required sample size for two-proportion tests?
Smaller effect sizes (differences between p₁ and p₂) require larger sample sizes to detect. If you're looking for a 1% difference, you'll need many more observations than if looking for a 10% difference. The sample size roughly scales with 1/(effect size)², so halving the effect size quadruples the required n.
What is the difference between pooled and unpooled variance approximations?
The pooled approach uses the average proportion p̄ = (p₁ + p₂)/2 to estimate variance under the null hypothesis (assuming equal proportions). The unpooled approach uses each group's expected variance separately. Pooled is traditional for significance tests; unpooled may be more appropriate when planning for specific alternative proportions. Results are usually similar.
Why can't I use this calculator alone for clinical trials?
Clinical trial sample size calculations require much more sophisticated analysis: multiple endpoints, interim analyses, adaptive designs, adjustment for dropout, multiple comparisons corrections, regulatory considerations, and careful effect size estimation from pilot studies. This tool uses simplified normal approximations suitable for learning and rough planning, not regulatory submissions.
What assumptions does this calculator make?
This calculator assumes: (1) Simple random sampling, (2) Independent observations (Bernoulli trials), (3) Normal approximation to the binomial distribution, (4) For two-sample tests: equal group sizes and independent groups. These approximations work well for moderate sample sizes and proportions not too close to 0 or 1, but may be inaccurate for small samples or extreme proportions.
How should I interpret the 'achieved power' for two-sample tests?
Since sample size must be a whole number, we round up the calculated n. The 'achieved power' shows the approximate power you'll actually get with this rounded sample size—it will typically be slightly higher than your target power. This is an approximation based on the same formula used for the calculation.
Related Math & Statistics Tools
Confidence Interval for Proportions
Compute CIs for a single proportion using Wald and Wilson methods
Hypothesis Test Power Calculator
Compute power or sample size for z/t tests on means
Z-Score & P-Value Calculator
Convert between z-scores, raw values, and p-values
Normal Distribution Calculator
Compute probabilities and quantiles for the normal distribution
Correlation Significance Calculator
Test whether a correlation is statistically significant