Logistic Regression Probability Curve Visualizer
Visualize the S-shaped sigmoid curve of a logistic regression model. Set the intercept and slope parameters, define an x-range, and see how predicted probabilities change across the feature space.
Configure Your Logistic Regression Model
Set the intercept and slope parameters to visualize the S-shaped probability curve. See how different coefficients affect the predicted probabilities across your chosen x-range.
Quick Tips:
1. Choose a preset or enter custom model parameters
2. Set the x-range to focus on your region of interest
3. Adjust the decision threshold (default: 0.5)
4. View the probability curve and key model insights
Positive Slope:
- Probability increases with x
- Higher x = more likely positive
- S-curve rises left to right
Negative Slope:
- Probability decreases with x
- Higher x = less likely positive
- S-curve falls left to right
What the Sigmoid Curve Actually Tells You
You trained a logistic model to predict whether a lead will convert. The output for a specific lead is 0.73. That number is not a score or a rank — it is a probability estimated by a sigmoid function that squashes the linear predictor into the 0–1 range. A logistic regression probability curve visualizer plots that S-shaped mapping across the full input range so you can see exactly where the transition from “almost certainly no” to “almost certainly yes” happens, and how steeply it occurs.
The mistake that confuses most first-time users: reading the linear predictor (log-odds) as though it were the probability. A log-odds value of 2.0 does not mean “200% chance” — it maps to a probability of about 0.88 through the sigmoid. The curve visualizer makes this distinction concrete by showing both the straight line (log-odds) and the S-curve (probability) side by side.
How the Confusion Matrix Reads Out From the Curve
The probability curve does not classify anything on its own — you need a threshold to convert probabilities into yes/no decisions. Draw a horizontal line at threshold = 0.5 across the S-curve. Every data point whose predicted probability falls above that line is classified positive; every point below is negative. The resulting counts of correct and incorrect decisions fill the four cells of the confusion matrix: TP, FP, TN, FN.
Sliding the threshold line up or down shifts where the S-curve crosses it, which changes the x-value boundary between the two classes. Move the threshold from 0.5 to 0.3 and you classify more observations as positive — TP goes up, but so does FP. Move it to 0.7 and FP drops, but FN rises because you are now demanding higher confidence before predicting positive. The curve visualizer lets you see this mechanically: the threshold line intersects the S-curve at a single x-value, and everything on one side gets one label.
This connection matters because stakeholders often ask “why did the model miss that case?” The answer is usually that the predicted probability was just below the threshold — visible on the curve as a point sitting barely under the horizontal line. Adjusting the threshold by a small amount would have caught it, at the cost of more false positives elsewhere.
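The threshold mechanics above can be sketched in a few lines of plain Python. The probabilities and labels below are made-up illustrations, not output from a real model:

```python
def confusion_counts(probs, labels, threshold=0.5):
    """Classify each probability against the threshold and tally TP/FP/TN/FN."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical predicted probabilities and true labels.
probs  = [0.10, 0.35, 0.48, 0.52, 0.70, 0.95]
labels = [0,    0,    1,    0,    1,    1]

print(confusion_counts(probs, labels, threshold=0.5))
# Lowering the threshold classifies more points positive: TP and FP both rise.
print(confusion_counts(probs, labels, threshold=0.3))
```

Note the point at 0.48 with true label 1: at threshold 0.5 it is a false negative sitting just under the line, exactly the "why did the model miss that case?" situation described above.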
Tuning the Decision Threshold for Precision or Recall
The default threshold of 0.5 treats false positives and false negatives as equally costly. In most real problems they are not. A disease screening tool should minimize missed cases (high recall), even if that means more healthy patients get follow-up tests (lower precision). A fraud alert system that pages an analyst at 2 a.m. should avoid false alarms (high precision), accepting that some low-confidence fraud slips through (lower recall).
To find the right threshold, compute precision and recall at several candidate values — 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 — and build a table. Each row is a different confusion matrix. The operating point that best matches your cost structure is the one to deploy. There is no formula for this choice; it requires knowing the dollar cost (or human cost) of each error type.
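A minimal sketch of building that table in plain Python, using hypothetical probabilities and labels rather than a fitted model:

```python
def precision_recall_at(probs, labels, threshold):
    """Precision and recall for a single candidate threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Hypothetical validation-set probabilities and true labels.
probs  = [0.15, 0.25, 0.40, 0.55, 0.65, 0.80, 0.90]
labels = [0,    0,    1,    0,    1,    1,    1]

print("thr  precision  recall")
for thr in [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]:
    p, r = precision_recall_at(probs, labels, thr)
    print(f"{thr:.1f}  {p:9.2f}  {r:6.2f}")
```

Each printed row corresponds to one confusion matrix; scan down the table and pick the row whose precision/recall trade-off matches your error costs.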
A common trap: optimizing the threshold on the training set. The sigmoid curve fit to training data is slightly overconfident, so a threshold tuned there will underperform on new data. Always tune on a held-out validation set, and report final metrics on a separate test set that was never used for any decision.
Odds Ratio vs. Probability — Two Views of the Same Model
The logistic model operates in three linked scales. The linear predictor z = β₀ + β₁x lives on the real line (−∞ to +∞). Exponentiate it and you get odds: e^z = P/(1−P), always positive. Apply the sigmoid and you get probability: P = 1/(1+e^−z), bounded 0–1. Each scale answers a different question.
Odds ratios are how researchers report logistic regression results in journals: “each additional year of experience multiplies the odds of promotion by 1.15” means e^β₁ = 1.15. But decision-makers think in probabilities: “this candidate has a 68% chance of promotion.” The curve visualizer bridges the gap — you set the coefficients, and it shows the probability at any x-value, while the underlying odds ratio explains how much each unit of x shifts the odds.
One subtlety: the odds ratio is constant across all x-values (each unit increase in x multiplies odds by the same factor), but the probability change per unit of x is not constant. Near the midpoint of the S-curve, a 1-unit increase in x produces the largest probability jump. Near the tails, the same 1-unit increase barely moves the probability because the sigmoid flattens. This is why reporting “a 5-percentage-point increase in probability” requires specifying the baseline — the same coefficient produces different probability shifts at different starting points.
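This constant-odds-ratio, non-constant-probability behavior is easy to demonstrate numerically. A sketch with illustrative coefficients (β₀ = −3.0, β₁ = 0.5, chosen only for the example):

```python
import math

def prob(x, b0, b1):
    """Predicted probability from the single-feature logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1 = -3.0, 0.5   # illustrative coefficients; midpoint at x = 6

# The odds ratio per unit of x is constant everywhere on the curve.
print(round(math.exp(b1), 3))   # ≈ 1.649

# The probability change per unit of x is NOT constant.
for x in [0, 6, 12]:
    delta = prob(x + 1, b0, b1) - prob(x, b0, b1)
    print(f"x={x:2d}: P={prob(x, b0, b1):.3f}, +1 unit changes P by {delta:+.3f}")
```

The one-unit jump is largest at the midpoint (x = 6 here, close to the β₁/4 bound) and shrinks toward the tails, which is exactly why a probability shift only makes sense with a stated baseline.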
Logistic Curve Interpretation Mistakes
The slope β₁ is 0.5. Does that mean a 50% increase in probability per unit of x?
No. β₁ is in log-odds units, not probability units. A slope of 0.5 means each unit of x adds 0.5 to the log-odds — the equivalent probability change depends on where you are on the curve. At the midpoint it is roughly β₁/4 ≈ 12.5 percentage points; near the tails it is much smaller.
I set a negative intercept and the curve starts below 0.5. Is the model biased?
Not necessarily. A negative intercept (β₀ < 0) means the baseline log-odds at x = 0 are negative, so the probability at x = 0 is below 50%. This is expected when the positive class is uncommon at the reference point. The midpoint of the curve simply shifts to x = −β₀/β₁, which may be far from zero.
Two models have different curves but similar AUC. Which is better?
AUC measures overall ranking ability, not calibration. One model might assign well-separated probabilities (steep curve) while the other assigns probabilities bunched near 0.5 (flat curve). If you need well-calibrated probabilities for risk scoring, the steeper curve is more useful even if AUC is identical. Check calibration plots alongside AUC.
Can I use this single-feature curve for a multi-feature model?
Only as a partial-effect visualisation. In a multi-feature model, the curve for one feature holds all other features at fixed values (often their means). Changing those held-out values shifts and reshapes the curve. The visualiser shows one slice of a higher-dimensional surface, not the full model.
Sigmoid, Log-Odds, and Probability Equations
Three equations define the logistic model:
- Linear predictor (log-odds): z = β₀ + β₁x
- Odds: e^z = P/(1−P)
- Probability (sigmoid): P = 1/(1+e^−z)
Units note: β₀ and β₁ are in log-odds units. x can be in any measurement unit — the sigmoid converts everything to a dimensionless probability. The maximum slope of the probability curve occurs at the midpoint and equals β₁/4.
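The three scales, and the β₁/4 maximum-slope property, can be checked together in a short script. The coefficients here (β₀ = −1.0, β₁ = 0.8) are arbitrary illustrations:

```python
import math

b0, b1 = -1.0, 0.8   # hypothetical coefficients, in log-odds units

def linear_predictor(x):          # log-odds: z = b0 + b1*x
    return b0 + b1 * x

def odds(x):                      # odds: e^z = P/(1-P)
    return math.exp(linear_predictor(x))

def probability(x):               # sigmoid: P = 1/(1+e^-z)
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))

# The three scales agree: odds derived from P match e^z.
x = 2.0
p = probability(x)
assert abs(odds(x) - p / (1 - p)) < 1e-9

# Max slope of the probability curve occurs at the midpoint x = -b0/b1
# and equals b1/4 (checked here by central finite difference).
mid = -b0 / b1
eps = 1e-6
numeric_slope = (probability(mid + eps) - probability(mid - eps)) / (2 * eps)
print(round(numeric_slope, 4), b1 / 4)
```

The finite-difference slope at the midpoint matches β₁/4 = 0.2, confirming the units note above.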
Lead-Scoring Model With Conversion Probability Curve
Scenario: A SaaS company models trial-to-paid conversion using a single feature: number of actions taken during the 14-day trial. The fitted logistic model has β₀ = −3.0 and β₁ = 0.12. Decision threshold is 0.5.
Step 1 — Midpoint.
Midpoint = −(−3.0) / 0.12 = 25 actions. At 25 actions the predicted conversion probability is exactly 50%. Below 25 the model predicts “will not convert”; above 25 it predicts “will convert.”
Step 2 — Probability at key points.
At 10 actions: z = −3.0 + 0.12×10 = −1.8, P = 1/(1+e^1.8) ≈ 0.14 (14%). At 40 actions: z = −3.0 + 0.12×40 = 1.8, P ≈ 0.86 (86%). The curve rises steeply between roughly 15 and 35 actions and flattens outside that range.
Step 3 — Odds ratio interpretation.
e^β₁ = e^0.12 ≈ 1.127. Each additional trial action multiplies the odds of conversion by about 1.13 — a 13% increase in odds per action, regardless of starting point. But the probability increase per action is largest near 25 actions (about 0.12/4 = 3 percentage points) and smaller near the extremes.
Step 4 — Business use.
The sales team targets leads with 15–25 actions — the steepest part of the curve where a nudge (webinar invite, feature walkthrough) could push the probability above the threshold. Leads below 10 actions are too cold; leads above 35 are likely converting on their own.
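The four steps above can be reproduced in a few lines of plain Python. The lead names and action counts in the final step are invented for illustration:

```python
import math

# Fitted coefficients from the lead-scoring scenario.
b0, b1 = -3.0, 0.12

def p_convert(actions):
    """Predicted trial-to-paid conversion probability."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * actions)))

# Step 1: midpoint where P = 0.5.
midpoint = -b0 / b1
print(round(midpoint, 2))            # 25.0 actions

# Step 2: probabilities at key points.
print(round(p_convert(10), 2))       # ≈ 0.14
print(round(p_convert(40), 2))       # ≈ 0.86

# Step 3: odds ratio per additional action.
print(round(math.exp(b1), 3))        # ≈ 1.127

# Step 4: flag leads on the steep part of the curve (15-25 actions).
leads = {"A": 8, "B": 18, "C": 24, "D": 41}  # hypothetical leads
nudge = [name for name, acts in leads.items() if 15 <= acts <= 25]
print(nudge)                         # ['B', 'C']
```

Leads B and C sit on the steep segment where a nudge moves the probability most; A is too cold and D is likely converting anyway.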
Sources
CMU Statistics — Logistic Regression: Sigmoid derivation, log-odds interpretation, and odds ratio properties.
scikit-learn — Logistic Regression: Implementation details, threshold tuning, and probability calibration methods.
NCBI — Interpreting Odds Ratios in Logistic Regression: Relationship between odds ratios, log-odds, and probability in applied research.
Penn State STAT 504 — Logistic Regression Model: Coefficient interpretation, curve shape, and threshold selection for binary outcomes.
Frequently Asked Questions
What do the intercept and slope parameters mean?
The intercept (β₀) is the log-odds when x = 0. It shifts the sigmoid curve left or right: with a positive slope, a larger intercept shifts the curve left (higher probability at lower x values), while a more negative intercept shifts it right. The slope (β₁) determines how steeply probability changes as x increases. A larger absolute slope means a sharper transition between low and high probability regions. Positive slopes create increasing probability curves; negative slopes create decreasing curves.
How do I interpret the probability output?
The probability P(y=1|x) represents the model's confidence that an observation belongs to the positive class given its x value. For example, P = 0.8 means the model predicts an 80% chance of the positive outcome. To make a binary prediction, compare this probability to your decision threshold (typically 0.5): if P > threshold, predict positive.
What is the midpoint and why does it matter?
The midpoint is the x value where probability equals exactly 0.5—where the model is maximally uncertain. It's calculated as x = −β₀/β₁ (when slope ≠ 0). This is the decision boundary in simple logistic regression: below this x, the model predicts negative (if the slope is positive); above it, the model predicts positive.
Why does my curve look almost flat?
A nearly flat curve occurs when the slope magnitude is very small (close to 0). This means x has little effect on the predicted probability—the model doesn't distinguish well between different x values. In practice, this might indicate that x is not a useful predictor for your outcome, or that you're viewing a narrow x-range where changes are subtle.
How should I choose the decision threshold?
The default threshold of 0.5 treats false positives and false negatives equally. In practice, adjust based on your costs: if missing a positive case is very costly (e.g., disease screening), lower the threshold. If false positives are costly (e.g., expensive interventions), raise it. Use ROC curves and domain knowledge to find the optimal threshold for your application.
Can I use logistic regression with multiple features?
Yes! Real-world logistic regression typically uses multiple features: P(y=1|X) = 1/(1+e^-(β₀ + β₁x₁ + β₂x₂ + ...)). This visualizer shows single-feature models for educational purposes. With multiple features, the decision boundary becomes a hyperplane rather than a single point, and visualization requires dimensionality reduction or partial dependence plots.
What's the relationship between log-odds and probability?
Log-odds (also called logit) is log(P/(1-P)), where P is probability. The linear predictor β₀ + β₁x directly gives log-odds, not probability. The sigmoid function converts log-odds to probability. Log-odds can be any real number (−∞ to +∞), while probability is bounded (0 to 1). Each unit increase in x changes log-odds by β₁.
How is this different from linear regression?
Linear regression predicts continuous values and can produce any output; logistic regression predicts probabilities bounded between 0 and 1. Linear regression minimizes squared error; logistic regression maximizes likelihood. Linear regression assumes normally distributed errors; logistic regression models binary outcomes. Use linear for 'how much?' questions, logistic for 'which category?' questions.
Does this tool train models from data?
No. This tool only visualizes a probability curve given user-provided coefficients (intercept and slope). It does NOT train models from data, estimate coefficients, evaluate model accuracy, or handle multiple features. Real logistic regression requires training data, coefficient estimation (maximum likelihood estimation), and model evaluation.
Is this tool suitable for medical diagnosis or credit decisions?
No. This is an educational visualization tool, not a validated predictive model. It does NOT provide medical diagnosis, credit scoring, or regulatory risk assessment. Real-world applications require proper model training, validation, calibration, regulatory compliance, and professional oversight. Always consult qualified professionals and use validated systems for medical, financial, or legal decisions.
Related Data Science Tools
Confusion Matrix Calculator
Analyze classification model performance with precision, recall, and F1 scores.
Correlation Matrix Visualizer
Upload a CSV and visualize correlations between numeric columns with a heatmap.
Feature Scaling & Normalization Helper
Apply Z-Score or Min-Max scaling to numeric features for ML preprocessing.
Dose-Response EC50 Estimator
Fit a 4-parameter logistic model to dose-response data and estimate EC50.
A/B Test Significance Calculator
Calculate statistical significance and lift for A/B test results.
Smoothing & Moving Average Calculator
Apply SMA, EMA, and WMA to time series data for trend analysis.