Plan Sample Size for Proportion Studies
Plan how many observations you need when working with proportions. Design sample sizes for single-proportion confidence intervals (margin of error) or for two-proportion comparisons (power-based). This is an educational tool, not for clinical or regulatory use.
Sample size is a planning question, answered before data collection rather than after. You set a precision target (margin of error for a single-proportion CI) or a detection target (power against a specific difference for a two-proportion test), and the math returns the n that hits it. The page handles both modes plus the inverse: given the n you're stuck with, what precision or power does it actually buy.
For a single-proportion CI, n = z²·p(1−p) / E². Without a prior estimate of p, plug in 0.5: that maximizes p(1−p) at 0.25 and gives the worst-case (largest) n, so the margin of error will be no wider than your target whatever the true proportion turns out to be. For two-proportion tests, the formula uses both proportions plus α and target power; ballpark p's come from pilot data or domain knowledge, but pilot effect sizes are biased upward by selection. Anchor on a "smallest difference of interest" if you can. If the required n is a sizable fraction of your population, apply the finite-population correction; without it the calculator returns a larger n than you actually need.
Pick Goal: Margin of Error or Power
A sample size calculator for proportions serves two distinct purposes. The first is precision-driven: you want a confidence interval narrow enough to inform decisions, so you specify a target margin of error—say, plus or minus 3 percentage points. The formula then returns the n needed to hit that precision at your chosen confidence level.
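As a rough illustration of margin-of-error mode, the sketch below applies the normal-approximation formula n = z²·p(1−p)/E². The function name `n_for_margin` and its defaults are hypothetical, chosen for this example; they are not taken from the calculator itself.

```python
import math
from statistics import NormalDist

def n_for_margin(margin, confidence=0.95, p=0.5):
    """Observations needed so a single-proportion CI has half-width <= margin."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional observation cannot be collected.
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(n_for_margin(0.03))  # 1068 (the unrounded value is about 1067.1)
print(n_for_margin(0.05))  # 385
```

Rounding up rather than to the nearest integer keeps the achieved margin at or below the target.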
The second purpose is power-driven: you plan to compare two proportions and need enough observations to detect a meaningful difference. Here you specify baseline and alternative proportions, alpha, and target power—commonly 80%. The calculator returns n per group required to achieve that detection probability.
These two modes answer different questions. Margin-of-error mode asks "how precisely can I estimate a single proportion?" Power mode asks "can I reliably detect a difference between two groups?" Confusing them leads to mismatched designs—a survey budget optimized for precision may lack power for subgroup comparisons.
Before opening the calculator, clarify your research goal. Will you report a single estimate with an interval, or will you test a hypothesis comparing groups? That choice determines which inputs the tool requires and how to interpret the output.
Baseline Proportion Guess (Worst-Case p=0.5)
The variance of a sample proportion depends on p itself: it equals p(1 − p) / n. The product p(1 − p) is maximized when p equals 0.5, yielding 0.25. At p = 0.1 it drops to 0.09; at p = 0.9, it's the same 0.09 by symmetry. Higher variance means you need more observations to achieve a given precision.
If you have no prior estimate of the true proportion, using p = 0.5 in your sample size calculation covers the worst case. The resulting n will be large enough regardless of where the actual proportion lands. This conservative approach is standard for polling and general surveys.
But if you have reliable prior data—say, historical defect rates around 2%—you can plug in p = 0.02. The required sample shrinks dramatically because variance is so much smaller. Just be sure the prior estimate applies to the current population and measurement method; otherwise, you risk undersizing.
Sensitivity analysis helps hedge uncertainty. Calculate n at p = 0.3, 0.5, and 0.7, then pick a sample size that covers the plausible range. This transparency reassures funders that the study is robust to estimation error.
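One way to run that sensitivity check is sketched below. The ±5-point margin, the list of planning proportions, and the helper name `n_for_margin` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def n_for_margin(margin, p, confidence=0.95):
    """Normal-approximation n for a single-proportion CI."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical planning values: a +/-5 point margin over a plausible range of p.
plausible_p = [0.3, 0.5, 0.7]
n_by_p = {p: n_for_margin(0.05, p) for p in plausible_p}
n_planned = max(n_by_p.values())  # size for the most demanding plausible p

for p, n in n_by_p.items():
    print(f"p = {p:.1f} -> n = {n}")
print("plan for n =", n_planned)
```

Taking the maximum over the plausible range is the conservative choice; if 0.5 lies inside the range, it will always be the binding case.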
Finite Population Adjustment (Optional)
Standard sample size formulas assume an infinite population—sampling with replacement or from a pool so large that each draw doesn't noticeably deplete it. When your population is finite and your sample represents a sizable fraction, variance shrinks because you're capturing more of the whole.
The finite population correction (FPC) multiplies the variance by (N − n) / (N − 1), where N is the population size and n is the sample size. If n is a tiny fraction of N, the FPC is close to 1 and barely changes anything. But if you're surveying 500 employees out of a 600-person company, the FPC reduces the required n noticeably.
Some calculators let you enter N and apply FPC automatically. Others provide the infinite-population n and leave you to adjust manually. The formula is n_adjusted = n / (1 + (n − 1) / N). This gives a smaller sample that still achieves the same margin of error within the bounded population.
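The manual adjustment can be sketched as follows. The helper name `fpc_adjust` and the three population sizes are illustrative assumptions; the starting n of 385 corresponds to ±5 points at 95% confidence with p = 0.5.

```python
import math

def fpc_adjust(n_infinite, population):
    """Shrink an infinite-population n for a finite population of size N."""
    return math.ceil(n_infinite / (1 + (n_infinite - 1) / population))

n0 = 385  # infinite-population n, before any correction
for N in (600, 5_000, 200_000_000):
    print(f"N = {N:>11,}: n = {fpc_adjust(n0, N)}")
# N =         600: n = 235
# N =       5,000: n = 358
# N = 200,000,000: n = 385
```

The output shows the pattern described above: a large saving when the sample would be most of the population, and essentially none for a national-scale population.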
Use FPC when sampling more than about 5–10% of the population. Below that threshold, the adjustment is negligible. Above it, ignoring FPC wastes resources on unnecessary observations—helpful knowledge when budgets are tight.
Two-Proportion Comparisons: Inputs That Matter
When planning a study to compare two proportions—control versus treatment, A versus B—the calculator needs both the baseline proportion (p1) and the alternative proportion (p2) you hope to detect. The difference |p2 − p1| is the effect size. Smaller differences require larger samples.
Alpha sets the false-positive rate, typically 0.05. Power sets the true-positive rate, commonly 0.80 or higher. Both enter the formula through their normal critical values: under the normal approximation, n is proportional to the squared sum of the critical value for alpha and the critical value for power. Lowering alpha or raising power both inflate sample size; there's no free lunch.
One-tailed tests assume you only care about improvements, not decrements. This boosts power but blinds you to harmful effects. Two-tailed tests detect either direction at the cost of a larger n (about 27% more at α = 0.05 and 80% power under the normal approximation). Most regulatory and scientific contexts prefer two-tailed unless directionality is justified a priori.
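To make the roles of these inputs concrete, here is a minimal sketch using the unpooled normal approximation, n = (z_alpha + z_power)² · [p1(1−p1) + p2(1−p2)] / (p1 − p2)². The function name and the 50% vs 55% example are assumptions for illustration; other variants (pooled, arcsine) can differ from this by a few observations.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for comparing two proportions (unpooled normal approximation)."""
    norm = NormalDist()
    # A two-sided test splits alpha across both tails.
    z_alpha = norm.inv_cdf(1 - alpha / 2) if two_sided else norm.inv_cdf(1 - alpha)
    z_power = norm.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2)

two = n_per_group(0.50, 0.55)
one = n_per_group(0.50, 0.55, two_sided=False)
print(two, one, round(two / one, 2))  # 1562 1231 1.27
```

Raising `power` or lowering `alpha` in the call increases the result, which is the trade-off described above.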
Pooled variance formulas assume both groups share the same underlying rate under the null. Unpooled formulas use each arm's own variance. Results differ slightly: with equal group sizes the pooled variance is never smaller than the unpooled one, so the pooled formula returns the slightly larger (more conservative) n, and the gap widens as the true proportions diverge.
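The two variants can be compared side by side. The sketch below assumes equal group sizes and the common textbook forms of each formula; the function names are hypothetical and this is not necessarily the exact expression the calculator uses.

```python
import math
from statistics import NormalDist

def _critical_values(alpha, power):
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2), norm.inv_cdf(power)

def n_unpooled(p1, p2, alpha=0.05, power=0.80):
    z_a, z_b = _critical_values(alpha, power)
    var_alt = p1 * (1 - p1) + p2 * (1 - p2)      # each arm's own variance
    return math.ceil((z_a + z_b) ** 2 * var_alt / (p1 - p2) ** 2)

def n_pooled(p1, p2, alpha=0.05, power=0.80):
    z_a, z_b = _critical_values(alpha, power)
    p_bar = (p1 + p2) / 2
    var_null = 2 * p_bar * (1 - p_bar)           # pooled variance under H0
    var_alt = p1 * (1 - p1) + p2 * (1 - p2)      # arm-specific variance under H1
    numerator = (z_a * math.sqrt(var_null) + z_b * math.sqrt(var_alt)) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

for p1, p2 in [(0.50, 0.55), (0.10, 0.30)]:
    print(p1, p2, n_unpooled(p1, p2), n_pooled(p1, p2))
# 0.5 0.55 1562 1565
# 0.1 0.3 59 62
```

In both examples the pooled figure is a little higher, and the relative gap is larger when the proportions are further apart.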
Budget vs Precision Tradeoffs
Precision comes at a price. Halving the margin of error roughly quadruples the required sample because n scales with 1/ME². A survey needing ±5% precision at 95% confidence with p = 0.5 requires about 385 respondents. Tightening to ±2.5% jumps to roughly 1,537. Budget constraints often force compromise.
For hypothesis testing, detecting small effects demands large samples. A 1-percentage-point lift in conversion rate might need tens of thousands of users per arm. If your traffic can't support that volume, either accept lower power or reframe the question around a larger minimum detectable effect.
Cost per observation varies by context. Online surveys are cheap; clinical visits are expensive. Power analysis paired with cost modeling reveals the optimal balance. Sometimes it's better to accept wider intervals and save budget for follow-up studies than to spend the entire budget on a single experiment.
Presenting tradeoff tables to stakeholders helps manage expectations. Show what precision or power is achievable at different budget levels. This transparency prevents disappointment when results come back with wider-than-hoped intervals or non-significant p-values.
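A tradeoff table of this kind can be generated with the inverse calculations: given n, what margin or power results. The sketch below uses normal approximations; the function names, the list of candidate sample sizes, and the 50% vs 55% comparison are assumptions for illustration.

```python
import math
from statistics import NormalDist

NORM = NormalDist()

def margin_for_n(n, confidence=0.95, p=0.5):
    """Half-width of a single-proportion CI achieved with n observations."""
    z = NORM.inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

def power_for_n(n_per_group, p1, p2, alpha=0.05):
    """Approximate two-sided power with n per group (unpooled normal approximation)."""
    z_alpha = NORM.inv_cdf(1 - alpha / 2)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NORM.cdf(abs(p1 - p2) / se - z_alpha)

# Hypothetical sample sizes that different budget levels could afford.
for n in (250, 500, 1000, 2000):
    print(f"n = {n:>5}: margin = +/-{margin_for_n(n):.3f}, "
          f"power(50% vs 55%) = {power_for_n(n, 0.50, 0.55):.2f}")
```

Multiplying each n by a cost per observation would turn the first column into a budget figure.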
Sizing the study: practical FAQ
Why does p = 0.5 give the largest sample size?
Because variance p(1 − p) is maximized at 0.5. Proportions near 0 or 1 have lower variance, so you need fewer observations for the same precision. Using 0.5 when uncertain guarantees enough data regardless of the true proportion.
How do I account for dropouts or non-response?
Inflate the calculated n by the expected attrition rate. If you expect 20% non-response, divide n by 0.80. This ensures your final analyzable sample meets the target. Ignoring attrition leads to underpowered studies.
What's the difference between pooled and unpooled variance?
Pooled variance uses the average of p1 and p2, assuming equal rates under the null. Unpooled uses each group's own variance. Pooled is simpler and gives a slightly larger n; unpooled tracks the variance under the alternative more closely when the true proportions differ noticeably.
Should I always use the finite population correction?
Only when sampling more than about 5–10% of the population. Below that, FPC is negligible. Above it, ignoring FPC wastes resources. If your population is effectively infinite—like all internet users—skip the correction.
Can I use this calculator for cluster-randomized trials?
Not directly. Cluster designs inflate variance through the intraclass correlation (ICC). You'd need to multiply the simple n by the design effect, which for clusters of average size m is 1 + (m − 1) × ICC. This calculator assumes simple random sampling; consult specialized tools for clustered data.
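As a rough sketch of that adjustment, the code below uses the standard equal-cluster-size design effect, 1 + (m − 1) × ICC. The cluster size of 20 and ICC of 0.05 are made-up planning values, and the function names are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters with intraclass correlation icc."""
    return 1 + (cluster_size - 1) * icc

def n_clustered(n_srs, cluster_size, icc):
    """Inflate a simple-random-sampling n by the design effect."""
    return math.ceil(n_srs * design_effect(cluster_size, icc))

print(n_clustered(385, cluster_size=20, icc=0.05))  # 751
```

Even a small ICC nearly doubles the requirement here, which is why cluster designs need their own planning step.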
Limitations of these sample size formulas
Worst-case p: for a single-proportion CI, plugging p = 0.5 gives the largest n. Using a smaller assumed p shrinks the required sample but you lose the worst-case guarantee. Be sure your prior estimate of p is solid before relying on it.
Effect-size sensitivity: for two-proportion comparisons, n is highly sensitive to the assumed |p₁ − p₂|. A pilot estimate that's too optimistic gives n that won't actually detect the real (smaller) effect. Anchor on a "smallest difference of interest" if you can.
No attrition adjustment: the formula assumes all observations are usable. Real studies lose participants. Inflate n by your expected dropout rate or risk under-recruiting.
Simple random sampling: assumed. Complex sampling designs (clusters, strata, multi-stage) need a design effect adjustment that this calculator doesn't apply.
Note: PASS is the standard commercial tool for clinical trial sizing. R's pwr.2p.test (and pwr.2p2n.test for unequal n) gives the same numbers as this page. The pitfall already mentioned on the power calculator page applies here too: don't plug a noisy pilot effect size into a sample-size formula and expect it to detect the real effect.
Sources & References
The mathematical formulas and sample size concepts used in this calculator are based on established statistical theory and authoritative academic sources:
- NIST/SEMATECH e-Handbook: Sample Sizes Required - Authoritative reference from the National Institute of Standards and Technology.
- Penn State STAT 500: Sample Size for Proportions - University course material on sample size planning.
- Cochran (1977): Sampling Techniques - Classic textbook on sample size formulas and survey methodology.
- Statistics How To: Sample Size Guide - Practical explanations of sample size calculations.
- OpenStax Statistics: Population Proportion - Free textbook chapter on proportion sample size.
Sizing a proportion study: working questions
How do I calculate sample size for a survey or proportion study?
For a single-proportion CI: n = z² · p(1 − p) / E², where z is the critical value (1.96 for 95%), p is the planning proportion (use 0.5 if unknown), and E is the desired margin of error. Example: 95% confidence, ±3 percentage points, p = 0.5 → n = (1.96)² · 0.25 / 0.03² ≈ 1067.1, which rounds up to 1068. That's the textbook "about 1000 respondents for ±3 points" rule of thumb. For two-proportion comparisons, the formula uses both p₁ and p₂ along with α and target power.
Why does p = 0.5 give the largest required sample size?
The variance of a Bernoulli is p(1 − p), which is maximized at p = 0.5 with value 0.25. At p = 0.1 or 0.9, variance is 0.09. Since required n is proportional to variance for a fixed margin of error, p = 0.5 is the worst case. Plugging in 0.5 guarantees your sample is large enough regardless of the true proportion. If you have prior data suggesting p is far from 0.5, you can use that and get a smaller required n, but you lose the worst-case guarantee.
How does margin of error change with sample size?
Inversely with √n. Halving the margin of error takes quadrupling n. To go from ±3 percentage points to ±1.5 points (with 95% confidence and p = 0.5), you'd need n ≈ 4268 instead of 1067. This is why political polls almost always cluster at n = 1000-1200; ±3 points is the floor where sample-size cost meets precision tolerance for most surveys. Going to ±1 point requires n in the 9600 range, which is only worth it for very high-stakes decisions.
Two-proportion vs one-proportion sample size, what's the difference?
One-proportion sizes a single CI to a target precision: n = z² · p(1 − p) / E². Two-proportion sizes a hypothesis test to detect a difference between groups with target power. The two-proportion formula needs α, target power (typically 0.80), and both p₁ and p₂. R's pwr::pwr.2p.test() and pwr::pwr.2p2n.test() (for unequal n) compute this. Detecting a difference of 5 percentage points (e.g., 50% vs 55%) at 80% power, α = 0.05 needs roughly n = 1565 per group.
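R's pwr.2p.test takes Cohen's effect size h (an arcsine-transformed difference) as its input. The Python sketch below reproduces the 50% vs 55% figure under that approach, using the approximation n ≈ 2·((z_alpha + z_power)/h)², which drops a negligible opposite-tail term. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def cohen_h(p1, p2):
    """Cohen's effect size h for two proportions (arcsine transformation)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def n_per_group_arcsine(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test, based on Cohen's h."""
    norm = NormalDist()
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return math.ceil(2 * (z_sum / cohen_h(p1, p2)) ** 2)

print(round(cohen_h(0.55, 0.50), 4))     # 0.1002
print(n_per_group_arcsine(0.55, 0.50))   # 1565
```

A direct normal approximation on the raw difference gives a figure within a few observations of this one, so small discrepancies between tools are expected.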
What target power should I use for sample size planning?
0.80 is the convention, set informally by Cohen and codified into most journal guidelines and grant requirements. 0.90 if the cost of missing a true effect is high (regulatory submissions, safety studies). Below 0.80 is generally considered underpowered. The trade-off: power = 0.90 vs 0.80 increases n by roughly 35% for the same effect size. For exploratory studies, 0.80 is standard. For confirmatory or stakes-heavy studies, 0.90 is the safer choice.
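The size of that trade-off follows from the critical values alone, since n is proportional to (z_alpha + z_power)² under the normal approximation. A minimal check, with a hypothetical helper name and a two-sided α of 0.05 assumed:

```python
from statistics import NormalDist

def n_multiplier(power_new, power_old=0.80, alpha=0.05):
    """Factor by which n grows when target power moves from power_old to power_new."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    ratio = (z_alpha + norm.inv_cdf(power_new)) / (z_alpha + norm.inv_cdf(power_old))
    return ratio ** 2

print(round(n_multiplier(0.90), 2))  # 1.34
```

The multiplier does not depend on the effect size, so the same uplift of roughly one third applies to any two-proportion design at this α.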
How do I account for dropout or non-response?
Inflate the calculated n by your expected loss rate. If the formula gives n = 1000 and you expect 20% dropout, recruit n / (1 − 0.20) = 1250 to end up with about 1000 usable observations. Long studies often have 30-40% dropout, so the inflation factor matters. Survey response rates are typically much lower (10-30% for cold outreach), which drives the difference between sample size sent and sample size received. Plan for this upfront; under-recruitment is harder to fix once data collection has started.
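The inflation step is a one-liner; the sketch below wraps it with a basic input check. The function name `recruit_target` and the 35% loss rate in the second call are illustrative assumptions.

```python
import math

def recruit_target(n_required, loss_rate):
    """Number to recruit so about n_required remain after the expected loss."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    # Small tolerance guards against floating-point noise pushing ceil up by one.
    return math.ceil(n_required / (1 - loss_rate) - 1e-9)

print(recruit_target(1000, 0.20))  # 1250
print(recruit_target(1000, 0.35))  # 1539
```

Dividing by (1 − rate) rather than multiplying by (1 + rate) matters: at 20% loss, multiplying would give 1200 recruits and only about 960 usable observations.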
When does the finite-population correction matter?
When your sample is more than ~5% of the total population. The correction multiplies the variance by (N − n)/(N − 1), shrinking the required sample size. For a town of 5,000 voters where you're polling 500 (10% of population), the FPC reduces the effective variance noticeably. For national polls drawing from 200 million eligible voters, the FPC is essentially 1 and ignored. Most calculators (including this page) apply the correction when N is provided.
What if I overestimated the effect size?
You'll be underpowered for the real (smaller) effect, and the study will likely fail to reject H₀. This is the most common failure mode in sample-size planning, because pilot effect sizes are upward-biased by selection (you only proceed when the pilot looks promising). Defenses: anchor on a domain-driven smallest effect of interest, not a pilot estimate. Run a sensitivity analysis showing required n across a range of plausible effect sizes. Build in a safety margin by shrinking the assumed difference between p₁ and p₂ by 20%. Pre-register the plan.