
Convert Z-Scores to P-Values and Critical Z

Convert X to Z and find p-values, or get critical Z from p. Supports standard and custom normal distributions with shaded graph.

Last Updated: February 13, 2026

The conversion runs both ways. Given a z, return the p-value (one-tailed left, one-tailed right, or two-tailed); given a p, return the critical z. The same Φ does both directions. The inverse just runs Newton's method on Φ(z) − p = 0. For two-tailed work, mind the factor of two: the critical z for α = 0.05 two-tailed is 1.96 (Φ⁻¹(0.975)), not 1.645 (Φ⁻¹(0.95), which is the one-tailed value).
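The Newton iteration mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the calculator's actual code; the function names, the starting point z = 0, and the tolerance are assumptions made for the example.

```python
import math

def phi_cdf(z: float) -> float:
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi_pdf(z: float) -> float:
    """Standard normal density, which is the derivative of phi_cdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def inverse_cdf_newton(p: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Solve phi_cdf(z) - p = 0 by Newton's method (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = 0.0  # assumed starting point; the iteration approaches the root from this side
    for _ in range(max_iter):
        step = (phi_cdf(z) - p) / phi_pdf(z)
        z -= step
        if abs(step) < tol:
            break
    return z

# Two-tailed alpha = 0.05 uses the 97.5th percentile; one-tailed uses the 95th.
print(round(inverse_cdf_newton(0.975), 4))  # two-tailed critical z
print(round(inverse_cdf_newton(0.95), 4))   # one-tailed critical z
```

Very extreme p (far into either tail) may need more iterations or a better starting guess than this sketch provides.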

Use this for the standard hypothesis-testing workflow. You have a test statistic and need a p, or you have an α and need the rejection cutoff. Critical values match NIST tables to six decimals (z₀.₀₂₅ = 1.959964, z₀.₀₀₅ = 2.575829). The one mistake worth flagging in advance: don't change tail direction after seeing the data. Switching from two-tailed to one-tailed post-hoc halves your p-value and lands you in p-hacking territory, the kind of move that gets papers retracted. Pre-register the tail before computing.

One-Tailed vs Two-Tailed: Choose the Right Tail

Z-to-p conversions require specifying tail direction. Selecting the wrong tail doubles or halves your p-value and can flip your hypothesis test conclusion. Before computing, match your alternative hypothesis to a tail type.

Left-Tailed: P(Z ≤ z)

Use left-tailed when your alternative hypothesis claims the parameter is less than the null value. "The new process reduces defect rate" or "Treatment lowers blood pressure" requires a left-tail test. The p-value equals the cumulative probability from negative infinity up to your z-score.

Right-Tailed: P(Z ≥ z)

Use right-tailed when your alternative claims the parameter is greater than the null value. "The drug increases survival time" or "Marketing campaign raised conversion rate" needs a right-tail test. The p-value equals 1 − CDF(z), the probability above your z-score.

Two-Tailed: P(|Z| ≥ |z|)

Use two-tailed when your alternative states "not equal to" without specifying direction. "The mean differs from the claimed value" or "There is a difference" requires a two-tail test. The p-value sums both extreme tails: 2 × (1 − Φ(|z|)). Two-tailed tests are more conservative: the same |z| produces twice the p-value of the one-tailed test taken in the direction of z.
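The three tail definitions can be written as one small function. This is a sketch using Python's standard-library statistics.NormalDist; the function name p_value and the tail labels "left", "right", and "two" are illustrative choices, not the calculator's interface.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value(z: float, tail: str) -> float:
    """Return the p-value for a z-score under the chosen tail (hypothetical helper)."""
    if tail == "left":    # P(Z <= z)
        return STD_NORMAL.cdf(z)
    if tail == "right":   # P(Z >= z) = 1 - CDF(z)
        return 1.0 - STD_NORMAL.cdf(z)
    if tail == "two":     # P(|Z| >= |z|) = 2 * (1 - CDF(|z|))
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, round(p_value(1.96, tail), 4))
```

Running it for z = 1.96 shows the doubling directly: the two-tailed value is twice the right-tailed one.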

Common mistake: Switching tail direction after seeing results. Your hypothesis determines the tail before data collection. Changing it post-hoc inflates false positive rates and undermines statistical validity.

Critical Z for Any Alpha (Replace the Z-Table)

Critical z-values define rejection regions for hypothesis tests. Instead of hunting through printed z-tables, enter your significance level α and let the calculator return the exact cutoff.

Common Critical Values

Confidence / α | Two-Tailed z | One-Tailed z
90% / α = 0.10 | ±1.645 | 1.282
95% / α = 0.05 | ±1.960 | 1.645
99% / α = 0.01 | ±2.576 | 2.326
99.9% / α = 0.001 | ±3.291 | 3.090

Using p → Z Mode

Enter your target p-value (e.g., 0.05) and select tail type. The calculator inverts the CDF to return the corresponding z. For two-tailed at α = 0.05, you get ±1.96. For one-tailed right at α = 0.05, you get +1.645. These values define rejection thresholds: reject the null if your test statistic lies beyond the critical z in the direction of the alternative (for the two-tailed case, if |z| > 1.96).
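As a rough check on the table of common critical values, the p → Z inversion can be reproduced with the standard library's NormalDist.inv_cdf. The helper name critical_z and its tail labels are assumptions for this sketch.

```python
from statistics import NormalDist

def critical_z(alpha: float, tail: str) -> float:
    """Critical z for significance level alpha (hypothetical helper).

    For tail == "two" the positive cutoff is returned; the rejection
    region is |z| beyond it, i.e. the pair of values +/- the result.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    std_normal = NormalDist()
    if tail == "two":
        return std_normal.inv_cdf(1.0 - alpha / 2.0)
    if tail == "right":
        return std_normal.inv_cdf(1.0 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")

for alpha in (0.10, 0.05, 0.01, 0.001):
    two = critical_z(alpha, "two")
    one = critical_z(alpha, "right")
    print(f"alpha={alpha}: two-tailed ±{two:.3f}, one-tailed {one:.3f}")
```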

Non-Standard Alpha Levels

Not all tests use 0.05. Medical trials often use α = 0.01 for safety margins. Physics discovery claims require 5σ (p < 2.87 × 10⁻⁷). Exploratory research might use α = 0.10. Enter any α to get the corresponding critical z without interpolating tables.

Convert z ↔ p With Correct Rounding

Z-scores and p-values are two sides of the same coin. Converting between them requires the standard normal CDF (Φ) and its inverse. The calculator handles both directions with high precision.

Z → p: Finding Probability from Z-Score

Enter a z-score and select tail type. The calculator computes Φ(z) for left-tail, 1 − Φ(z) for right-tail, or 2 × (1 − Φ(|z|)) for two-tail. Results display to several decimal places—more precision than any printed table provides.

p → Z: Finding Z-Score from Probability

Enter a p-value and select tail type. The calculator inverts the CDF to find the z-score where the cumulative (or tail) probability equals your input. This is essential for constructing confidence intervals or finding rejection boundaries.

X → z → p: Full Conversion Chain

If you have a raw score x with known μ and σ, use X → z → p mode. The calculator first standardizes using z = (x − μ) / σ, then computes the tail probability. This connects raw measurements to statistical significance in one step.
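The chain can be sketched as a single function that standardizes first and then looks up the tail area. The name x_to_p and the example numbers (μ = 100, σ = 15, x = 130) are made up for illustration.

```python
from statistics import NormalDist

def x_to_p(x: float, mu: float, sigma: float, tail: str = "two") -> tuple[float, float]:
    """Standardize x, then return (z, p) for the chosen tail (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                      # step 1: z = (x - mu) / sigma
    cdf = NormalDist().cdf                    # step 2: standard normal CDF
    if tail == "left":
        p = cdf(z)
    elif tail == "right":
        p = 1.0 - cdf(z)
    elif tail == "two":
        p = 2.0 * (1.0 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p

# Illustrative values only: a score of 130 on a scale with mean 100 and SD 15.
z, p = x_to_p(130, mu=100, sigma=15, tail="right")
print(z, round(p, 5))
```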

Precision note: Results are accurate to many decimal places, matching R, Python SciPy, and professional statistical software. For extremely small p-values (below 10⁻¹⁰), minor rounding may appear.
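One common source of rounding in the far tail is computing 1 − Φ(z) when Φ(z) is already indistinguishable from 1 in floating point. The sketch below contrasts that with the complementary error function, which evaluates the tail directly. Whether the calculator itself works this way is not stated in the text above; this only illustrates the numerical issue.

```python
import math
from statistics import NormalDist

def upper_tail_naive(z: float) -> float:
    """1 - CDF(z): loses all precision once CDF(z) rounds to 1.0."""
    return 1.0 - NormalDist().cdf(z)

def upper_tail_erfc(z: float) -> float:
    """Upper-tail probability via erfc, which stays accurate for large z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (2.0, 6.0, 10.0):
    print(z, upper_tail_naive(z), upper_tail_erfc(z))
```

For moderate z the two agree closely; by z = 10 the subtraction form collapses to zero while the erfc form still returns a small positive probability.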

Tail Shading Preview: What Your p-Value Represents

The interactive bell curve chart visualizes exactly which area corresponds to your p-value. Shading confirms that your tail selection matches your hypothesis and helps catch input errors before you act on results.

Left-Tail Shading

The region from negative infinity up to your z-score fills with color. The shaded area equals the cumulative probability P(Z ≤ z). More negative z-scores produce smaller shaded areas, consistent with smaller left-tail p-values for more extreme observations.

Right-Tail Shading

The region from your z-score out to positive infinity fills. The shaded area equals P(Z ≥ z) = 1 − Φ(z). Larger z-scores produce smaller shaded areas—consistent with smaller p-values for more extreme observations.

Two-Tail Shading

Both extreme tails shade simultaneously: below −|z| and above +|z|. The total shaded area equals the two-tailed p-value. The center of the curve (the "acceptance region") remains unshaded, representing outcomes consistent with the null hypothesis.

Between-Bounds Shading

When computing interval probability P(a ≤ Z ≤ b), only the region between your lower and upper bounds shades. The rest of the curve stays clear. This mode answers "what fraction falls within this range" questions directly.
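The interval probability is the difference of two CDF values, P(a ≤ Z ≤ b) = Φ(b) − Φ(a). A minimal sketch, with prob_between as an illustrative name and optional mu and sigma for a custom normal distribution:

```python
from statistics import NormalDist

def prob_between(a: float, b: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(a <= X <= b) for X ~ Normal(mu, sigma) (hypothetical helper)."""
    if a > b:
        raise ValueError("lower bound a must not exceed upper bound b")
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

print(round(prob_between(-1.96, 1.96), 4))  # central area for the 95% bounds
print(round(prob_between(-1.0, 1.0), 4))    # area within one standard deviation
```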

Decision Rule in Plain English (p vs Alpha)

Hypothesis testing compares your computed p-value against a pre-chosen significance level α. The decision rule is straightforward: if p ≤ α, reject the null; if p > α, fail to reject.

Rejection Interpretation

When p ≤ α, data at least this extreme would be unlikely if the null hypothesis were true, unlikely enough that you treat the null as untenable. You "reject H₀" in favor of the alternative. This does not prove the alternative true, but indicates sufficient evidence against the null at your chosen significance level.

Failure to Reject

When p > α, the data are consistent with the null hypothesis—not proof that the null is true, but insufficient evidence to reject it. "Fail to reject H₀" is the correct phrasing; "accept H₀" overstates the conclusion.

Practical vs Statistical Significance

A small p-value indicates statistical significance—the effect is unlikely to be noise. It says nothing about effect size or practical importance. A tiny difference can be "significant" with large samples, yet irrelevant in practice. Always report effect sizes alongside p-values.

P-value misreading: A p-value of 0.03 does not mean "3% chance the null is true." It means "if the null were true, data this extreme would occur 3% of the time." The distinction matters for proper interpretation.

Z-to-P Quick Checks

What does z = 0 mean?

Z = 0 means the observation equals the mean exactly. Left-tail p = 0.5, right-tail p = 0.5, two-tail p = 1.0. No deviation from the null, no evidence against it.

Why is my two-tailed p-value double the one-tailed?

Two-tailed tests count extremity in both directions. If the right-tail p is 0.025, the two-tailed p is 0.05 because you also count equally extreme negative z. This makes two-tailed tests more conservative.

When should I use the t-distribution instead?

Use t when the population σ is unknown and estimated from sample data, especially with n < 30. The t-distribution has heavier tails, producing larger p-values and wider confidence intervals to account for estimation uncertainty. As sample size grows, t converges to z.

How do I interpret z = 3 or higher?

Z = 3 means the observation lies 3 standard deviations from the mean. Only about 0.27% of values under a normal distribution fall beyond ±3σ. Values this extreme provide strong evidence against the null. In quality control, 3σ events are rare; in physics, 5σ (z ≈ 5) is the threshold for "discovery."

Can I use z-tests for proportions?

Yes—for large samples, the sampling distribution of a proportion is approximately normal. Compute z = (p̂ − p₀) / √(p₀(1−p₀)/n), then use this calculator to find the p-value. This is the basis of one-proportion z-tests.
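The one-proportion formula can be sketched as follows. The function name and the example counts (116 successes out of 200 against p₀ = 0.5) are invented for illustration, and the two-tailed p-value is an assumed default.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, two-tailed p) for H0: p = p0 (hypothetical helper)."""
    if n <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("need n > 0 and 0 < p0 < 1")
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)  # uses p0, not p_hat
    z = (p_hat - p0) / standard_error
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Illustrative data only.
z, p = one_proportion_z_test(successes=116, n=200, p0=0.5)
print(round(z, 3), round(p, 4))
```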

What if my data are not normal?

Z-based inference assumes normality. For non-normal data, consider transformations (log, square root), non-parametric tests (Wilcoxon, Mann-Whitney), or bootstrapping. The Central Limit Theorem justifies z-tests for large samples even with non-normal populations, but check assumptions first.

Worked examples: z to p in both directions

Six problems: forward (z to p) and inverse (p to z), with one-tailed and two-tailed forms. Numbers match scipy.stats.norm to four decimals.

Problem. z = 2.44. Find the two-tailed p-value.

Two-tailed p = 2 · (1 − Φ(|z|)). With Φ(2.44) ≈ 0.99266: p = 2 · (1 − 0.99266) = 2 · 0.00734 ≈ 0.01469.

The factor of 2 accounts for both tails being equally extreme under H₀. For a one-sided alternative, drop the factor and report the upper-tail p = 0.00734 directly. The same z = 2.44 supports a stronger conclusion in a one-sided test, which is one reason pre-registering the direction matters.

Answer: two-tailed p ≈ 0.0147. Below α = 0.05 (reject H₀ at the 5% level), above α = 0.01 (don't reject at 1%).

Problem. z = 2.43685. Find the two-tailed p-value.

Same formula, more precision in the input. Φ(2.43685) ≈ 0.99259. p = 2 · (1 − 0.99259) = 2 · 0.00741 ≈ 0.01482.

Compared to the previous example, the z value differs by 0.00315 and p shifts from 0.01469 to 0.01482, a change of about 0.00013 (roughly 2 · φ(z) · Δz). In the tail, the standard normal density φ(z) is small (here φ(2.44) ≈ 0.0203), so p moves only slightly under small z perturbations. Even so, that sensitivity matters when you're reproducing a paper's reported p to the published precision.

Answer: two-tailed p ≈ 0.0148.

Problem. The two-tailed p-value is 0.0147. What |z| produced it?

Invert the two-tailed formula. p = 2 · (1 − Φ(|z|)) gives 1 − Φ(|z|) = p/2 = 0.00735, so Φ(|z|) = 0.99265 and |z| = Φ⁻¹(0.99265) ≈ 2.4397, which rounds to 2.44.

In code: scipy.stats.norm.ppf(1 − 0.0147/2) ≈ 2.4397. Same calculation, same answer. The sign comes from the direction of the test statistic, not from p alone, since a two-tailed p doesn't carry sign information.

Answer: |z| ≈ 2.44.
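The forward and inverse two-tailed conversions from the problems above can be checked against each other. This sketch uses the standard library's NormalDist as a stand-in for scipy.stats.norm; the two function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_tailed_p(z: float) -> float:
    """Forward direction: p = 2 * (1 - CDF(|z|))."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def abs_z_from_two_tailed_p(p: float) -> float:
    """Inverse direction: |z| = inverse CDF of (1 - p/2). The sign is not recoverable."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)

p = two_tailed_p(2.44)
print(round(p, 4))                                 # forward conversion
print(round(abs_z_from_two_tailed_p(0.0147), 2))   # inverse from the rounded p
print(round(abs_z_from_two_tailed_p(p), 6))        # round trip from the unrounded p
```

Inverting the rounded p recovers z only to about two decimals, while inverting the unrounded p returns the original 2.44.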

Problem. z = 1.96. Find the p-value (one-tailed and two-tailed).

Φ(1.96) = 0.97500 to four decimals. One-tailed (upper): p = 1 − Φ(1.96) = 0.02500. Two-tailed: p = 2 · 0.02500 = 0.05000.

This is the canonical reason 1.96 shows up as the critical value for a 95% two-sided confidence interval and an α = 0.05 two-tailed test. Sometimes rounded to 2 in undergraduate texts, but 1.96 (to two decimals) is the version reported in journals. The full-precision value is 1.959963984540054.

Answer: one-tailed p = 0.0250, two-tailed p = 0.0500.

Problem. z = 1.645. Find the one-tailed p-value.

Φ(1.645) ≈ 0.95002. One-tailed (upper): p = 1 − 0.95002 ≈ 0.04998, conventionally written 0.05.

Critical value for an α = 0.05 one-tailed test or a 90% two-sided CI. Easy to confuse with 1.96 (the two-tailed analog at α = 0.05). Quick mnemonic: 1.645 is the "half" one-sided cousin of 1.96, and 90% two-sided uses ±1.645 because each tail holds 5%. The four-decimal value 1.6449 is what scipy.stats.norm.ppf(0.95) returns; the rounding to 1.645 is convention, not loss of precision.

Answer: one-tailed p ≈ 0.05.

Problem. A test statistic gives z = -1.32. Find the one-tailed (lower) p-value.

Lower-tail p is just Φ(z). Φ(-1.32) = 1 − Φ(1.32) ≈ 1 − 0.9066 = 0.0934. The two-tailed analog would be 2 · 0.0934 ≈ 0.1868.

For a negative z and a two-sided test, you can also compute the two-tailed p as 2 · Φ(z) directly without the absolute value, since Φ(-1.32) = 1 − Φ(1.32). Both routes give the same answer. The sign of z matters only for one-sided tests, where the alternative specifies a direction.

Practical note: software typically reports the two-tailed p by default. If your test is one-sided, divide by 2, but only after checking that z falls in the alternative's tail. If z is on the "wrong" side (here, z = +1.32 against a left-sided alternative), the one-sided p is 1 − 0.0934 = 0.9066, not 0.0934, since the evidence against H₀ in that direction is essentially nonexistent.

Answer: one-tailed (lower) p ≈ 0.0934. Above 0.05, so don't reject H₀ at the standard 5% level for a left-sided test.
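The "divide by 2, but check the side first" rule from the practical note can be sketched as a small helper. The function name and the labels "less" and "greater" for the alternative are assumptions for this example.

```python
def one_sided_from_two_sided(p_two: float, z: float, alternative: str) -> float:
    """Convert a two-tailed p to a one-sided p, respecting the sign of z.

    alternative is "less" (left-sided) or "greater" (right-sided).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two / 2.0
    # z counts as evidence for the alternative only if it lies in that tail.
    in_alternative_tail = z < 0 if alternative == "less" else z > 0
    return half if in_alternative_tail else 1.0 - half

# Values from the worked problem: two-tailed p of 0.1868 at |z| = 1.32.
print(one_sided_from_two_sided(0.1868, z=-1.32, alternative="less"))  # z in the tested tail
print(one_sided_from_two_sided(0.1868, z=+1.32, alternative="less"))  # z on the wrong side
```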

Limitations of z-based inference

Known σ: z-tests assume the population standard deviation is known. If you estimated σ from sample data, you want the t-distribution, especially for n below 30.

Underlying normality: required. Severe skew or heavy tails distort tail probabilities. Verify with a Q-Q plot or Shapiro-Wilk before reporting a p.

P-value semantics: a p-value is the probability of data this extreme under H₀, not the probability that H₀ is true. The distinction matters when communicating results.

Note: The pitfall to flag specifically for this tool: don't switch tail direction after seeing the data. Picking one-tailed because a two-tailed test "didn't make it" halves your p-value and lands you in p-hacking territory. Pre-register the tail. The ASA Statement on p-values (2016) is the standard reference for what p-values do and don't tell you. For implementation, scipy.stats.norm.cdf and R's pnorm match NIST tables to at least 6 decimals.


Z to p, p to z: working questions

Why does z = 2.44 give a two-tailed p of 0.0147?

For two-tailed: p = 2 · (1 − Φ(|z|)). Φ(2.44) ≈ 0.99266. So 1 − Φ(2.44) ≈ 0.00734, and doubling gives p ≈ 0.0147. The intuition: 2.44 standard deviations above the mean leaves about 0.73% in the upper tail, and a two-tailed test counts equally extreme values in the lower tail too, hence the doubling. R's 2*(1-pnorm(2.44)) and scipy.stats.norm.sf(2.44)*2 both return about 0.01469, which rounds to the same 0.0147.

How do I convert z to p by hand without a table?

For practical work, memorize a few anchors. z = 1.0 gives one-tail p ≈ 0.1587. z = 1.645 gives 0.05. z = 1.96 gives 0.025. z = 2.326 gives 0.01. z = 3.0 gives 0.00135. Interpolate linearly between anchors for rough work. For real precision, the polynomial approximation in Abramowitz & Stegun 26.2.17 keeps the absolute error below 7.5 × 10⁻⁸: it multiplies the density φ(z) by a fifth-degree polynomial in t = 1/(1 + 0.2316419·z). Easier: any scientific calculator, Excel (NORM.S.DIST(z, TRUE)), R, or Python does the conversion in milliseconds and beats any by-hand method.
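For reference, here is a sketch of the Abramowitz & Stegun 26.2.17 approximation, which evaluates the upper tail as φ(z) times a fifth-degree polynomial in t = 1/(1 + 0.2316419·z) with a stated absolute error below 7.5 × 10⁻⁸. The constants are those published with the formula; the function name and the symmetry handling for negative z are choices made for this example.

```python
import math

# Constants from Abramowitz & Stegun, formula 26.2.17.
P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_cdf_as(z: float) -> float:
    """Approximate standard normal CDF via A&S 26.2.17."""
    x = abs(z)
    t = 1.0 / (1.0 + P * x)
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5.
    poly = 0.0
    for b in reversed(B):
        poly = t * (b + poly)
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    upper_tail = density * poly
    # The formula is stated for x >= 0; use symmetry for negative z.
    return 1.0 - upper_tail if z >= 0 else upper_tail

print(round(1.0 - phi_cdf_as(1.0), 4))   # one-tail p at z = 1.0
print(round(1.0 - phi_cdf_as(1.96), 4))  # one-tail p at z = 1.96
```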

When I invert a two-tailed p back to z, is the answer unique?

For one-tailed conversions, yes. Each p in (0, 1) corresponds to a unique z. For two-tailed, the answer comes back as ±|z| because both signs give the same |z| and therefore the same two-tailed p. Inverting p = 0.05 gives z = ±1.96, not just 1.96. If your hypothesis is directional (one-tailed), you need to commit to a sign before computing the critical value. The page returns both critical values for two-tailed inversion.

What's the difference between left-tail, right-tail, and two-tailed p-values?

Left-tail is P(Z ≤ z), the p you want when the alternative hypothesis claims the parameter is below the null. Right-tail flips that: P(Z ≥ z) = 1 − Φ(z), used when the alternative says above. Two-tailed sums both extremes: P(|Z| ≥ |z|) = 2 · (1 − Φ(|z|)). Use it for non-directional alternatives. The two-tailed test is more conservative (larger p for the same |z|), which protects against the temptation to pick the more favorable tail post-hoc.

When should I use a t-distribution instead of z?

Whenever you estimated σ from sample data with small n. The t-distribution has heavier tails than the normal, so critical values are larger and p-values larger for the same test statistic. At df = 10, t₀.₀₂₅ = 2.228 versus z = 1.96; the gap shrinks to 2.042 at df = 30 and to a rounding error by df = 120. Rule of thumb: switch to t whenever you're estimating σ. With large samples (n > 30 or so) the difference vanishes, but using t is never wrong.

What is the empirical rule and how does it relate to z?

The 68-95-99.7 rule for the standard normal. About 68% of the distribution falls within |z| < 1, about 95% within |z| < 2, about 99.7% within |z| < 3. The corresponding two-sided tail probabilities are about 31.7%, 4.55%, and 0.27% respectively. It's a quick mental check: if your z is around 2, you're at the edge of "unusual." At 3, genuinely rare. Beyond 5, "discovery threshold" in physics (one-tailed p ≈ 2.87 × 10⁻⁷).
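The rule is easy to verify numerically. A short sketch with the standard library, where within_k_sigma is an illustrative name:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def within_k_sigma(k: float) -> float:
    """P(|Z| < k) for a standard normal Z."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    inside = within_k_sigma(k)
    print(f"|z| < {k}: inside {inside:.4%}, outside {1 - inside:.4%}")
```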

How do I read the critical values 1.645, 1.96, 2.326, 2.576?

1.645 is the one-tailed 5% critical z, the 95th percentile. For a two-tailed test at α = 0.05, you want 1.96 (the 97.5th percentile). At α = 0.01: one-tailed cutoff 2.326 (99th percentile), two-tailed cutoff 2.576 (99.5th percentile). For 95% confidence intervals on a mean, you use ±1.96 standard errors. NIST/SEMATECH e-Handbook §1.3.6.7.1 lists the long-form values to six decimals.

Does a low p mean my hypothesis is true?

No. The p-value is P(data this extreme or more | H₀ true), not P(H₀ true | data). A p of 0.03 means "if the null held, we'd see data this extreme 3% of the time," not "there's a 3% chance the null is true." Posterior probabilities of hypotheses live in a Bayesian framework with explicit priors. Treat p as evidence against H₀ at the chosen significance level, not as a measure of how likely H₀ is. The ASA Statement on p-values (2016) is the canonical reference on this distinction.
