
Visualize Bayesian Updating From Prior to Posterior

Visualize how a Beta prior distribution updates to a posterior after observing successes and failures. See the shift in probability estimates and credible intervals.

Last Updated: February 13, 2026

Bayesian updating shows how beliefs shift as evidence accumulates—a core skill for A/B testing, clinical trials, and any setting where decisions hinge on uncertain probabilities. A product manager ran an A/B test with 45 conversions out of 500 visitors and wanted to know the true conversion rate. She started with a weak prior Beta(2, 2) and updated with the data, getting a posterior Beta(47, 457). The common mistake is treating the prior as an afterthought—choosing Beta(1, 1) always "to be objective" can waste useful historical information, while picking an overly strong prior overwhelms scarce data. When interpreting results, read the posterior mean as a point estimate and the credible interval as the range where the true rate probably lies, given your assumptions.

Set the Prior (Beta Parameters) With Intuition

The Beta distribution lives on [0, 1], making it a natural choice for probabilities. Its two parameters, α and β, shape the curve. Think of α as "pseudo-successes" and β as "pseudo-failures" baked into your initial belief. Beta(1, 1) is uniform—every probability is equally likely before you see data.

Higher α + β means a tighter distribution. Beta(2, 2) is gently mounded around 0.5, representing mild uncertainty. Beta(50, 50) is sharply peaked at 0.5, representing strong confidence that the probability is near 50%. The sum α + β acts like sample size—larger sums indicate firmer beliefs that take more data to move.

To encode historical knowledge, translate past data into α and β. If last quarter's conversion rate was 8% across 200 trials, a reasonable informative prior is Beta(16, 184), with mean 16/200 = 0.08. This anchors your analysis without ignoring what you already know.

Common priors:

• Beta(1, 1) — uniform, no preference

• Beta(0.5, 0.5) — Jeffreys, invariant under reparameterization

• Beta(2, 2) — weak, centered at 0.5
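The pseudo-count reading of α and β can be sketched in a few lines of plain Python. `prior_summary` is a hypothetical helper, not part of any library; it just reports the mean α/(α+β) and the "effective sample size" α+β discussed above.

```python
# Sketch: summarize a Beta prior by its mean and its pseudo-count
# "effective sample size" (alpha + beta).

def prior_summary(alpha: float, beta: float) -> dict:
    """Mean and pseudo-sample-size of a Beta(alpha, beta) prior."""
    return {
        "mean": alpha / (alpha + beta),
        "effective_n": alpha + beta,  # larger -> harder for data to move
    }

# Historical prior from the text: 8% over 200 trials -> Beta(16, 184)
print(prior_summary(16, 184))  # mean 0.08, effective_n 200
print(prior_summary(1, 1))     # uniform: mean 0.5, effective_n 2
print(prior_summary(50, 50))   # strong: mean 0.5, effective_n 100
```

Comparing `effective_n` against your planned sample size is a quick gut check on whether the prior or the data will dominate.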

Add Evidence: Successes and Failures

The Beta-Binomial conjugate pair makes updating trivial. If your prior is Beta(α, β) and you observe s successes and f failures, the posterior is Beta(α + s, β + f). No integrals, no numerical approximations—just add the counts. This property is why the Beta prior is so popular for binary outcomes.

Sequential updating works identically: update after each batch of data, or wait until all data arrives—the final posterior is the same. Only total counts matter, not the order. This lets you monitor results in real time without penalty.

With little data, the posterior clings to the prior. With lots of data, the posterior shifts toward the observed rate and sharpens. Twenty observations barely move a Beta(100, 100) prior, but decisively reshape a Beta(1, 1) prior.

Example: Prior Beta(5, 5), observe 30 successes and 20 failures → Posterior Beta(35, 25). New mean = 35/60 ≈ 0.583, up from prior mean of 0.5.
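Because the update is just count addition, it fits in a one-line function. This sketch reproduces the worked example above and also demonstrates the order-independence of sequential updating (the `update` function name is illustrative, not a library API):

```python
# Beta-Binomial conjugate update: posterior = Beta(alpha + s, beta + f).

def update(alpha: float, beta: float, successes: int, failures: int):
    """Return posterior Beta parameters after observing the counts."""
    return alpha + successes, beta + failures

# Worked example from the text: Beta(5, 5) + 30 successes, 20 failures
a, b = update(5, 5, 30, 20)
print(a, b, a / (a + b))  # 35 25 0.5833...

# Sequential updating in two batches gives the same posterior:
a1, b1 = update(5, 5, 12, 8)      # first batch
a1, b1 = update(a1, b1, 18, 12)   # second batch (totals: 30 and 20)
assert (a1, b1) == (a, b)
```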

Posterior Mean and Credible Interval Readout

The posterior mean is (α + s) / (α + s + β + f), a weighted blend of your prior belief and the observed rate. When α + β is small relative to s + f, the data dominates. When α + β is large, the prior anchors the estimate.

A 95% credible interval marks the region where the true probability lies with 95% posterior probability. Unlike frequentist confidence intervals, this is a direct probability statement given your prior and data—not a long-run coverage guarantee.

As data accumulates, the credible interval shrinks. With 10 observations, you might have [0.25, 0.75]. With 1,000 observations, that narrows to something like [0.48, 0.52]. More data means more certainty.

Credible interval from inverse CDF:

Lower = F⁻¹(0.025), Upper = F⁻¹(0.975)

where F is the Beta CDF with posterior parameters
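With a statistics library the inverse CDF is available directly (for example, SciPy's `scipy.stats.beta.ppf`). As a dependency-free sketch, the same equal-tailed interval can be approximated by sorting Monte Carlo draws from the posterior, using the standard library's `random.betavariate`:

```python
import random

def credible_interval(alpha, beta, level=0.95, draws=100_000, seed=0):
    """Approximate an equal-tailed credible interval for Beta(alpha, beta)
    via empirical quantiles of Monte Carlo draws (not an exact inverse CDF)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi

# Posterior from the earlier example: Beta(35, 25)
lo, hi = credible_interval(35, 25)
print(f"95% credible interval ~ [{lo:.3f}, {hi:.3f}]")
```

The Monte Carlo answer fluctuates in the third decimal place; for production-grade precision, prefer the exact inverse CDF.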

How Strong Is the Prior? Sensitivity Toggle

Sensitivity analysis tests how much your conclusions depend on the prior. Run the same data through Beta(1, 1), Beta(5, 5), and Beta(50, 50). If all three posteriors agree, your inference is robust. If they diverge, you need more data or a better-justified prior.

A weak prior (low α + β) lets the data speak. A strong prior (high α + β) resists change. Neither is inherently right—the choice depends on how much you trust your prior information versus the new observations.

When sample size vastly exceeds prior strength, different priors converge to the same posterior. At n = 1,000 observations, Beta(1, 1) and Beta(10, 10) yield nearly identical results. At n = 10, they differ noticeably.
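A sensitivity toggle is easy to script. This sketch pushes the same data through the three priors listed earlier and prints the posterior means; the counts (600/400 and 6/4) are made-up illustrations of a large and a small sample:

```python
# Prior-sensitivity check: same data, several priors, compare posterior means.

def posterior_mean(alpha, beta, s, f):
    return (alpha + s) / (alpha + s + beta + f)

priors = [(1, 1), (5, 5), (50, 50)]

# Large sample: 600 successes in 1000 trials -> priors barely matter
print([round(posterior_mean(a, b, 600, 400), 3) for a, b in priors])
# [0.6, 0.599, 0.591]

# Small sample: 6 successes in 10 trials -> priors matter a lot
print([round(posterior_mean(a, b, 6, 4), 3) for a, b in priors])
# [0.583, 0.55, 0.509]
```

If the large-sample row is this tight in your own analysis, the inference is robust to the prior; a spread like the small-sample row says you need more data or a better-justified prior.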

Tip: Document your prior choice and rationale. Reviewers and stakeholders should understand why you used Beta(α, β) and what it represents in real-world terms.

Limits: When Beta-Binomial Isn't Appropriate

The model assumes independent, identically distributed Bernoulli trials with constant success probability. If your success rate drifts over time, trials are clustered, or outcomes depend on covariates, the simple Beta-Binomial breaks down.

Non-binary outcomes need different conjugate pairs. For counts without an upper bound, use Poisson-Gamma. For continuous measurements, use Normal-Normal. The Beta-Binomial is purpose-built for yes/no data.

Hierarchical models handle groups with different underlying rates. If you're comparing multiple variants or segments, each with its own probability, a hierarchical Beta-Binomial pools information across groups rather than treating them as one homogeneous population.

Check assumptions: Independence, constant probability, binary outcomes. Violations require more sophisticated models (logistic regression, mixed effects, time-series).

Bayes Visualizer Questions

Why does my posterior look so similar to my prior?

Your prior is stronger than your data. If α + β = 100 and you observe 20 trials, the prior contributes five times as much weight as the data. Collect more observations, or use a weaker prior if you lack solid historical justification.

Can I use Bayesian updating for continuous outcomes?

Yes, but not with the Beta-Binomial. For continuous data, the Normal-Normal or Normal-Gamma conjugate pairs apply. Each likelihood-prior pair has its own update rules. The principle—posterior ∝ prior × likelihood—remains the same.

How do I compare two variants in an A/B test?

Compute separate posteriors for each variant. Then sample from both posteriors and count how often variant A exceeds variant B. The fraction of times A > B is P(A better than B). Alternatively, compute the posterior on the difference or ratio directly.
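The sample-and-count approach can be sketched with the standard library alone. The posterior parameters below are hypothetical (variant A reuses the Beta(47, 457) posterior from the introduction; variant B's counts are invented for illustration):

```python
import random

def prob_a_beats_b(a_params, b_params, draws=100_000, seed=0):
    """Monte Carlo estimate of P(theta_A > theta_B) from two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*a_params) > rng.betavariate(*b_params)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical posteriors: A = Beta(47, 457), B = Beta(40, 464)
print(prob_a_beats_b((47, 457), (40, 464)))  # a probability near 0.78
```

A common decision rule is to ship A only when this probability clears a preset threshold, such as 0.95.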

What if my prior is wrong?

With enough data, the prior washes out. A "wrong" prior slows convergence but doesn't block it. Sensitivity analysis helps you see how much the prior affects your conclusions at your current sample size.

Why is the credible interval asymmetric?

If α and β differ, the Beta distribution is skewed. A posterior Beta(10, 90) has mean 0.1 and is right-skewed, so the interval is tighter on the low end and longer on the high end. Symmetry appears only when α ≈ β.

Limitations & Assumptions

• Beta-Binomial Only: This tool handles binary (yes/no) outcomes with constant probability. Count data, continuous measurements, or varying rates require different models.

• Independence Required: Trials must be independent. Clustered or correlated observations violate the model's assumptions and can produce misleading posteriors.

• Prior Matters: With small samples, different priors yield different conclusions. Document your prior choice and run sensitivity checks before drawing firm conclusions.

• Numerical Precision: Inverse CDF calculations use iterative search. For extreme parameter values, precision may be limited to approximately 1e-8.

Disclaimer: This calculator demonstrates Bayesian updating for educational purposes. For production A/B testing, clinical trials, or regulatory submissions, use validated statistical software and consult qualified statisticians.


Frequently Asked Questions

Common questions about Bayesian updating, prior and posterior distributions, Beta distribution, credible intervals, conjugate priors, Bayes' theorem, and how to use this visualizer for homework and statistics practice.

What is the difference between prior and posterior?

The prior represents your belief about a parameter before seeing any data. It encodes your initial uncertainty. The posterior is your updated belief after incorporating observed data through Bayes' theorem. As you observe more data, the posterior becomes more concentrated around the true value and less influenced by the prior.

Why do we use the Beta distribution for probabilities?

The Beta distribution is defined on the interval [0, 1], making it perfect for modeling probabilities. It's also flexible — by adjusting α and β, you can create distributions that are uniform, symmetric, skewed left or right, concentrated or spread out. Most importantly, Beta is a 'conjugate prior' to the Binomial likelihood, meaning the posterior is also a Beta distribution, which makes the math elegant and computation simple.

What does a 95% credible interval mean?

A Bayesian credible interval is a direct probability statement about the parameter. A 95% credible interval means: 'Given the prior and observed data, there is a 95% probability that the true parameter θ lies within this interval.' This is more intuitive than frequentist confidence intervals, which describe long-run coverage properties of the procedure, not the probability that any specific interval contains the parameter.

How is a credible interval different from a confidence interval?

A frequentist confidence interval says: 'If we repeated this experiment many times, 95% of the intervals we construct would contain the true parameter.' A Bayesian credible interval says: 'Given our model and data, there's a 95% probability the parameter is in this range.' The Bayesian interpretation is often more aligned with what people intuitively want to know.

What prior should I use if I have no prior information?

A common choice for an 'uninformative' or 'diffuse' prior is Beta(1, 1), which is the uniform distribution — it assigns equal probability to all values between 0 and 1. However, there's debate about what truly constitutes an uninformative prior. Some statisticians prefer Beta(0.5, 0.5) (Jeffreys prior) as it's invariant under certain transformations.

Can I use this tool to make real decisions?

This tool is designed for educational purposes to help you understand Bayesian concepts. For real-world decisions in medicine, business, engineering, or other domains, you should use validated statistical software, consider all relevant factors, consult domain experts, and understand the limitations of your model and data.

What happens if I have a strong prior and little data?

With a strong (concentrated) prior and little data, your posterior will be heavily influenced by your prior beliefs. The posterior will shift somewhat toward the data, but won't move far from the prior. This is why prior choice is important — a strong prior can 'overwhelm' small amounts of data. With more data, the likelihood dominates and the posterior becomes less sensitive to the prior.

How do I interpret the prior/posterior mean and mode?

The mean is the expected value — if you averaged many draws from the distribution, you'd get this value. For Beta(α, β), mean = α/(α+β). The mode is the most probable value (the peak of the distribution). For Beta, mode = (α-1)/(α+β-2), but only when α > 1 and β > 1. For inference, the mean is often more stable, while the mode can be useful for point estimation.
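The two formulas translate directly into code. This sketch guards the mode with the α > 1, β > 1 condition noted above (the function names are illustrative):

```python
# Mean and mode of Beta(alpha, beta), per the formulas quoted above.

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_mode(alpha, beta):
    if alpha > 1 and beta > 1:
        return (alpha - 1) / (alpha + beta - 2)
    raise ValueError("mode formula requires alpha > 1 and beta > 1")

print(beta_mean(35, 25))  # 35/60 ~ 0.583
print(beta_mode(35, 25))  # 34/58 ~ 0.586
```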

Can I do Bayesian updating sequentially?

Yes! One of the elegant properties of Bayesian inference is that you can update sequentially. After observing some data, your posterior becomes your new prior for the next batch of data. For the Beta-Binomial model, just keep adding successes to α and failures to β. The final posterior is the same whether you update all at once or in batches — only the total counts matter.

What assumptions does this model make?

The Beta-Binomial model assumes: (1) Each trial is independent of others, (2) The probability of success θ is constant across trials, (3) There are only two outcomes per trial (success/failure). If your data violates these assumptions — for example, if success rates trend over time or vary by segment — this simple model may not be appropriate.
