
Logistic Regression Predictions With Sigmoid Intuition

Set coefficients and input features to see how logistic regression converts them into probabilities and classifications. This is a demonstration tool for learning — not for real-world decisions.

Last Updated: February 13, 2026

Logistic regression models P(y = 1 | x), a probability in [0, 1], by passing a linear predictor through the sigmoid. The model is logit(p) = ln(p / (1 − p)) = β₀ + β₁x₁ + ⋯ + βₖxₖ, which inverts to p = 1 / (1 + e^−(β₀ + β·x)). The output is a probability, not a class label. Classification comes from picking a threshold (default 0.5) and predicting 1 when p ≥ threshold.

The interpretation trap: a coefficient β = 0.5 does not mean "probability rises by 0.5 per unit of x." It means log-odds rise by 0.5, equivalent to multiplying the odds by e^0.5 ≈ 1.65. The probability change depends on where you sit on the sigmoid: small near the tails, largest near p = 0.5. For inference, report odds ratios e^β. For prediction, report p evaluated at concrete x. And for imbalanced classes, 0.5 is rarely the right threshold. Pick it from the ROC curve based on the cost ratio of false positives to false negatives in your application.

Logit to Probability: The Sigmoid Explained

Logistic regression first computes a linear predictor z = β₀ + β₁x₁ + β₂x₂ + …, where the βs are coefficients and the xs are feature values. z can take any real value, from negative infinity to positive infinity; it is the log-odds of the positive class.

The sigmoid function σ(z) = 1 / (1 + e^(−z)) transforms z into a probability between 0 and 1. Large positive z yields probability near 1, large negative z yields probability near 0, and z = 0 yields exactly 0.5. The S-shaped curve is steepest at the midpoint. Its slope is σ(z)(1 − σ(z)), which peaks at 0.25 when z = 0, so small changes in z produce the largest probability shifts when the probability is near 0.5.

Understanding the sigmoid matters because it shows why coefficient effects are non-linear in probability terms. A one-unit increase in a feature always adds that feature's coefficient β to z (the linear predictor), but the resulting change in probability depends on where you sit on the curve.

p = 1 / (1 + e^(−z))

where z = β₀ + β₁x₁ + β₂x₂ + … (the log-odds)
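A minimal Python sketch of the formula above, using only the standard library. The names `linear_predictor` and `sigmoid` are illustrative, not taken from any library. The loop applies the same +0.5 step in z at several starting points to show that the probability change is largest near z = 0.

```python
import math

def linear_predictor(intercept, coefs, features):
    """z = b0 + b1*x1 + ... + bk*xk, the log-odds."""
    return intercept + sum(b * x for b, x in zip(coefs, features))

def sigmoid(z):
    """Map log-odds z to a probability in (0, 1).

    Branching on the sign of z keeps math.exp from overflowing
    for large-magnitude inputs.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# The same +0.5 step in z moves p by different amounts along the curve.
for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    delta_p = sigmoid(z + 0.5) - sigmoid(z)
    print(f"z = {z:+.1f}: p = {sigmoid(z):.3f}, change in p for +0.5 in z = {delta_p:.3f}")
```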

Enter Coefficients and See Class Probabilities

Provide the intercept (β₀) and coefficients for each feature. In real applications, these come from training a model on labeled data via maximum likelihood estimation. Here, you enter them manually to explore how changes affect predicted probabilities.

Then input feature values (x₁, x₂, …). The tool computes z, applies the sigmoid, and reports the probability of class 1. The probability of class 0 is simply 1 − p.

Experiment by varying feature values while holding coefficients fixed. Watch how probability responds—a large positive coefficient on a feature makes probability climb steeply as that feature increases; a negative coefficient makes probability fall.
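The steps this tool performs (compute z, apply the sigmoid, report P(y = 1) and 1 − p, then classify at a threshold) can be sketched in plain Python. The function name `predict` and all coefficient values are hypothetical, chosen only for illustration.

```python
import math

def predict(intercept, coefs, features, threshold=0.5):
    """Return (z, P(y=1), P(y=0), predicted class) for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    # Sign-branching form of the sigmoid avoids overflow in math.exp for large |z|.
    if z >= 0:
        p1 = 1.0 / (1.0 + math.exp(-z))
    else:
        p1 = math.exp(z) / (1.0 + math.exp(z))
    label = 1 if p1 >= threshold else 0
    return z, p1, 1.0 - p1, label

# Hypothetical coefficients: intercept -3.0, then 0.8 for x1 and -0.5 for x2.
intercept, coefs = -3.0, [0.8, -0.5]
for x1 in (0.0, 2.0, 4.0, 6.0):
    z, p1, p0, label = predict(intercept, coefs, [x1, 1.0])  # x2 held at 1.0
    print(f"x1 = {x1}: z = {z:+.2f}, P(y=1) = {p1:.3f}, P(y=0) = {p0:.3f}, class {label}")
```

With x2 fixed, the positive coefficient on x1 pushes the probability up as x1 grows, and the class flips to 1 once p crosses the threshold.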

Threshold Tuning: Precision vs Recall Intuition

A classification threshold converts probability into a binary decision. The default 0.5 means "classify as positive if p ≥ 0.5." But 0.5 isn't always optimal—it depends on the cost of errors.

Lower the threshold (e.g., 0.3) when missing positives is expensive. Medical screening for a serious disease might use a low threshold to catch more true cases, accepting more false alarms. Raise the threshold (e.g., 0.7) when false positives are costly—spam filtering on critical inboxes, for instance, where mislabeling a legitimate email hurts more than letting some spam through.

Threshold selection trades off precision (of the cases labeled positive, how many truly are) against recall (of all actual positives, how many were caught). Moving the threshold doesn't change the underlying probability estimate, only the decision boundary.

Practical tip: In production, analysts use ROC curves or precision-recall curves to pick a threshold that balances business objectives. This tool lets you see how threshold shifts change the classification for a single observation.
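To see the trade-off across many observations rather than one, here is a small sketch that computes precision and recall at three thresholds. The probabilities and labels are made up for illustration; `precision_recall` is a hypothetical helper, not a library function.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting 1 for p >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Made-up predicted probabilities and true labels, for illustration only.
probs  = [0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.10, 0.05]
labels = [1,    1,    0,    1,    1,    0,    0,    1,    0,    0]

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall(probs, labels, t)
    print(f"threshold {t:.1f}: precision {p:.2f}, recall {r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 pushes precision up (0.67, 0.75, 1.00) while recall falls (0.80, 0.60, 0.40).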

Odds Ratios: What a Coefficient Means

Odds are p / (1 − p). If the probability of success is 0.8, the odds are 0.8 / 0.2 = 4 (four-to-one in favor). The coefficient β in logistic regression tells you how much the log-odds change for a one-unit increase in x, holding the other features constant.

The odds ratio OR = e^β translates that to a multiplicative effect on odds. If β = 0.5, OR ≈ 1.65, meaning odds increase by 65% per unit increase in x. If β = −0.3, OR ≈ 0.74, meaning odds multiply by 0.74 (a 26% decrease).

Odds ratios are often easier to communicate than raw coefficients. "Each additional year of experience multiplies the odds of promotion by 1.3" is more intuitive than "the coefficient on experience is 0.26."

OR = e^β

OR > 1 → increased odds | OR < 1 → decreased odds | OR = 1 → no effect
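A short Python check of the odds-ratio arithmetic, reproducing the β = 0.5 and β = −0.3 examples. The second loop uses a hypothetical intercept of −1 to show that a one-unit step multiplies the odds by e^β at any starting point, even though the probability change differs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds p / (1 - p) of the positive class."""
    return p / (1.0 - p)

# Coefficient -> odds ratio -> percent change in the odds per unit of x.
for beta in (0.5, -0.3):
    odds_ratio = math.exp(beta)
    print(f"beta = {beta:+.1f}: OR = {odds_ratio:.2f}, odds change {100 * (odds_ratio - 1):+.0f}% per unit of x")

# The multiplier is the same wherever you start on the curve (hypothetical intercept -1).
intercept, beta = -1.0, 0.5
for x in (0.0, 6.0):
    p0 = sigmoid(intercept + beta * x)
    p1 = sigmoid(intercept + beta * (x + 1))
    print(f"x = {x}: p {p0:.3f} -> {p1:.3f}, odds ratio {odds(p1) / odds(p0):.3f}")
```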

Limitations: Not a Full Training Tool

This calculator demonstrates how logistic regression makes predictions given coefficients you supply. It does not estimate coefficients from data—that requires maximum likelihood estimation on a labeled dataset, typically done in Python (scikit-learn), R, or specialized software.
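For intuition about what "maximum likelihood estimation" does, here is a teaching sketch that fits one feature by Newton-Raphson. It is not how any particular package is implemented: it has no regularization, no convergence test and no protection against separation. The data are hypothetical. On unpenalized data like this, the result should match what an unpenalized fit such as statsmodels' `Logit` or R's `glm(family = binomial)` reports, up to numerical precision.

```python
import math

def fit_logistic_1d(xs, ys, iterations=25):
    """Maximum likelihood fit of p = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient of the log-likelihood) and the 2x2 information matrix X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} * score, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical, non-separated data: the classes overlap between x = 2 and x = 3.5.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0,   0,   0,   1,   0,   1,   0,   1,   1,   1]
print(fit_logistic_1d(xs, ys))
```

At the maximum, the score equations Σ(y − p) = 0 and Σ(y − p)·x = 0 hold, which is a convenient way to check any fit.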

Real models also need validation: train/test splits, cross-validation, calibration checks, and diagnostic plots (ROC, confusion matrices). Coefficients have standard errors and confidence intervals that this tool doesn't compute.

Treat outputs here as educational explorations, not production predictions. For any consequential decision—loan approvals, medical diagnoses, hiring—use validated models built by qualified professionals with proper data governance.

Warning: Never use manually entered coefficients for real decisions. Real coefficients come from careful model training and must include uncertainty estimates.

Logit Predictor Questions

Why doesn't a coefficient of 0.5 add 0.5 to probability?

Coefficients operate on log-odds, not probability. The sigmoid function's curvature means the same log-odds shift produces different probability changes depending on where you start. Near p = 0.5, small log-odds changes have big probability effects; near p = 0 or 1, they have small effects.

What does a negative coefficient mean?

It means increasing that feature decreases the probability of the positive class. For example, a negative coefficient on "days since last purchase" in a repeat-buyer model indicates longer gaps associate with lower likelihood of buying again.

Can I compare coefficients across features?

Only if features are on comparable scales. A coefficient of 2.0 on a feature measured in thousands differs in meaning from 2.0 on a feature measured in single units. Standardize features (z-scores) first if you need to compare relative importance.
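A small sketch of why units matter, using a hypothetical feature recorded in thousands with coefficient 2.0. Re-expressing the same feature in single units divides its coefficient by 1000 without changing any prediction, while the per-standard-deviation effect (what standardizing to z-scores reports) stays the same. The helper names are illustrative.

```python
import statistics

def rescale_coefficient(beta, unit_factor):
    """Coefficient after re-expressing x as x * unit_factor.

    Predictions are unchanged because beta_new * (x * unit_factor) == beta * x.
    """
    return beta / unit_factor

def standardized_coefficient(beta, values):
    """Change in log-odds per one-standard-deviation increase in x."""
    return beta * statistics.stdev(values)

# Hypothetical feature recorded in thousands, with coefficient 2.0.
x_thousands = [1.2, 0.8, 2.5, 1.9, 3.1]
beta_thousands = 2.0

# Same feature recorded in single units: the coefficient shrinks 1000-fold...
x_units = [v * 1000 for v in x_thousands]
beta_units = rescale_coefficient(beta_thousands, 1000)
print(beta_thousands, beta_units)

# ...but the per-standard-deviation effect is identical.
print(standardized_coefficient(beta_thousands, x_thousands))
print(standardized_coefficient(beta_units, x_units))
```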

How do I know if my threshold is correct?

Threshold choice depends on costs. If false negatives are expensive, lower the threshold. If false positives are expensive, raise it. ROC curves and precision-recall analysis help find the sweet spot for your application.
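One standard way to turn costs into a cutoff, sketched under two assumptions: the predicted probabilities are well calibrated, and correct decisions cost nothing. Predicting 1 has expected cost (1 − p)·C_FP, predicting 0 has expected cost p·C_FN, so predicting 1 is cheaper whenever p > C_FP / (C_FP + C_FN). The function name is hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability cutoff that minimizes expected cost for a single decision.

    Assumes calibrated probabilities and zero cost for correct decisions.
    """
    return cost_fp / (cost_fp + cost_fn)

print(cost_threshold(1, 1))  # equal costs -> 0.5
print(cost_threshold(1, 9))  # a missed positive is 9x worse -> 0.1
print(cost_threshold(3, 1))  # a false alarm is 3x worse -> 0.75
```

If the probabilities are not calibrated, this cutoff is only a starting point; checking it against validation data, as ROC and precision-recall analysis do, is still needed.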

Why can probability never actually reach 0 or 1?

The sigmoid function approaches but never touches 0 or 1. Mathematically, you'd need z = −∞ or z = +∞. In practice, probabilities get close enough (e.g., 0.0001 or 0.9999) that the distinction rarely matters.
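One practical caveat, shown as a sketch: in 64-bit floating point, the computed sigmoid does round to exactly 1.0 once z exceeds roughly 37. That matters when you take logarithms, as log-loss does, because log(1 − p) then fails. Computing log(1 − p) directly from z as −log(1 + e^z) avoids the problem. The helper names are illustrative.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_one_minus_p(z):
    """log(1 - sigmoid(z)) = -log(1 + e^z), computed without forming 1 - p."""
    # log(1 + e^z) rearranged as max(z, 0) + log1p(e^(-|z|)) so exp never overflows.
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))

z = 40.0
p = sigmoid(z)
print(p == 1.0)            # True: the float rounds to exactly 1.0
print(log_one_minus_p(z))  # about -40.0, whereas math.log(1 - p) raises ValueError
```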

Limitations of this logistic predictor

Linearity in log-odds: the model assumes logit(p) is a linear function of the features. If the true relationship has a non-monotonic shape, no choice of coefficients will recover it. Consider GAMs, splines, or tree-based models.

Coefficients as input, not estimated: this page takes β values rather than fitting them. Real coefficients come from MLE on a labeled training set, with standard errors and CIs attached. Use R's glm(family=binomial) or sklearn.linear_model.LogisticRegression.

Separation: complete or quasi-complete separation (one feature perfectly predicts the outcome) sends MLE coefficients to ±∞. If you see suspiciously huge β values, separation is the likely cause. Firth's penalized likelihood handles it.

Threshold choice: 0.5 is rarely the right cutoff for imbalanced classes. Pick from the ROC curve based on the cost ratio of false positives to false negatives in your application.

Note: For real model training, sklearn.linear_model.LogisticRegression (whose default solver is 'lbfgs') is the workhorse in Python. Be aware that it applies an L2 penalty by default; in recent releases, pass penalty=None for plain maximum likelihood estimates. statsmodels exposes inference (standard errors, p-values) more cleanly. R's glm(family=binomial) is the standard equivalent. ISLR Chapter 4 is the canonical introductory reference.


Logistic regression: working questions

How do I interpret an odds ratio?

OR = e^β. A coefficient of β = 0.7 in logistic regression means the odds of the positive outcome multiply by e^0.7 ≈ 2.01 for every one-unit increase in that predictor, holding the others constant. OR = 1 means no effect, OR > 1 increases the odds, and OR < 1 decreases them. The asymmetry around 1 trips people up: OR = 2 (doubling) and OR = 0.5 (halving) are equal-magnitude effects on the log-odds scale (β = ±0.69). For interpretation in publications, also report the 95% CI on the OR.
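A sketch of the usual Wald-style interval: build the CI on the log-odds scale as β ± 1.96·SE, then exponentiate. The estimate β = 0.7 and standard error 0.25 are hypothetical. Profile-likelihood intervals, which some software reports instead, can differ in small samples.

```python
import math

def odds_ratio_ci(beta, se, z_crit=1.96):
    """Odds ratio with a Wald confidence interval built on the log-odds scale.

    The interval is symmetric around beta, so after exponentiating it is
    asymmetric around the odds ratio itself.
    """
    lower, upper = beta - z_crit * se, beta + z_crit * se
    return math.exp(beta), math.exp(lower), math.exp(upper)

# Hypothetical estimate: beta = 0.7 with standard error 0.25.
or_, lo, hi = odds_ratio_ci(0.7, 0.25)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

# Doubling and halving are mirror images on the log-odds scale.
print(math.log(2.0), math.log(0.5))
```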

Why does logistic regression use log-odds?

To map a probability bounded in [0, 1] to an unbounded scale where linear modeling makes sense. Probability is hard to model linearly because it can't go above 1 or below 0; log-odds (logit) ranges from −∞ to +∞. The model logit(p) = β₀ + β₁x₁ + ⋯ inverts to p = 1 / (1 + e^−(β₀ + β·x)), the sigmoid. The choice of logit isn't unique. Probit regression uses Φ⁻¹ instead and gives nearly identical predictions; logit dominates because the math is cleaner and the coefficients are interpretable as log-odds.
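A quick numeric check of two claims above: logit and sigmoid are inverses, and the logistic and probit curves are nearly interchangeable once rescaled. The scale factor 1.702 is a classic approximation; it is also why, as a common rule of thumb, logit coefficients come out roughly 1.6 to 1.8 times their probit counterparts. Φ is computed here from `math.erf`.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF Phi(z), the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# logit undoes the sigmoid.
print(logit(sigmoid(1.3)))  # back to 1.3, up to rounding

# With the scale factor 1.702, the logistic curve tracks Phi closely.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(sigmoid(1.702 * z) - normal_cdf(z)) for z in grid)
print(f"max |sigmoid(1.702 z) - Phi(z)| on [-6, 6]: {max_gap:.4f}")
```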

How do I pick a classification threshold?

0.5 is the default but rarely optimal. The right threshold depends on the cost ratio of false positives to false negatives. For medical screening where missing a disease is bad, lower the threshold to catch more cases. For spam filtering where false-positives anger users, raise it. The ROC curve plots TPR vs FPR across thresholds; pick the point that minimizes your application-specific cost. Youden's J (TPR − FPR) maximized gives a balanced default. For imbalanced classes, the F1 or precision-recall curve is more informative than ROC.
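A minimal sketch of the Youden's J search: try each distinct predicted probability as a cutoff, compute TPR and FPR, and keep the cutoff with the largest TPR − FPR. The scores and labels are made up; the helper names are hypothetical.

```python
def roc_points(probs, labels):
    """(threshold, TPR, FPR) for each distinct score used as a cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((t, tp / positives, fp / negatives))
    return points

def youden_threshold(probs, labels):
    """Cutoff that maximizes Youden's J = TPR - FPR."""
    return max(roc_points(probs, labels), key=lambda pt: pt[1] - pt[2])[0]

# Made-up scores and labels for illustration.
probs  = [0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.10, 0.05]
labels = [1,    1,    0,    1,    1,    0,    0,    1,    0,    0]
print(youden_threshold(probs, labels))  # -> 0.45 (TPR 0.8, FPR 0.2)
```

Youden's J implicitly weights false positives and false negatives equally; when the costs differ, minimize the expected cost at each ROC point instead.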

What is separation and how do I detect it?

Complete or quasi-complete separation: a single feature (or combination of features) perfectly predicts the outcome. The MLE coefficient runs to ±∞ and the standard error explodes. R will warn "fitted probabilities numerically 0 or 1 occurred." Detection: check coefficient magnitudes (|β̂| > 10 is suspicious, especially on standardized features), and look for predictors with a range of values that occurs with only one class. Fix: Firth's penalized likelihood (R's logistf package) handles separation cleanly. Bayesian logistic regression with weakly informative priors does the same job. Don't drop the predictor; that often loses real signal.
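A toy demonstration of why separation sends the coefficient to infinity. It uses plain gradient ascent on the log-likelihood, a swapped-in method chosen because the drift is easy to watch, rather than the IRLS algorithm glm uses. On perfectly separated data the likelihood keeps improving as the slope grows, so the slope never settles.

```python
import math

def fit_by_gradient_ascent(xs, ys, steps, lr=0.5):
    """Plain gradient ascent on the logistic log-likelihood (teaching sketch)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Perfectly separated toy data: every x < 0 is class 0, every x > 0 is class 1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
for steps in (100, 1000, 10000):
    print(steps, fit_by_gradient_ascent(xs, ys, steps))
```

The slope keeps climbing with more iterations instead of converging, which is the same behavior that produces huge β̂ values and exploding standard errors in real fits.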

How do I evaluate logistic regression accuracy?

Accuracy alone is misleading for imbalanced classes (90% accuracy is trivial when the base rate is 90%). Better metrics: AUROC (area under the ROC curve), which summarizes ranking quality across all thresholds rather than at one; log-loss, which directly penalizes overconfident wrong predictions; and the Brier score (mean squared error of the predicted probabilities), which reflects both calibration and discrimination. The confusion matrix at the chosen threshold gives precision, recall, and F1. For probabilistic predictions, calibration plots (predicted probability vs observed frequency in bins) tell you whether your probabilities are honest or over/under-confident. R's pROC covers the ROC side; in Python, sklearn.metrics handles the metrics and sklearn.calibration.calibration_curve produces the data for calibration plots.
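Hand-rolled versions of three of these metrics, to make their definitions concrete. The data are made up. In practice you would call sklearn.metrics functions such as `log_loss`, `brier_score_loss`, and `roc_auc_score`; the AUROC here uses the equivalent pairwise (Mann-Whitney) formulation.

```python
import math

def log_loss(probs, labels):
    """Mean negative log-likelihood; punishes confident wrong predictions."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(probs, labels)) / len(labels)

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def auroc(probs, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up predictions and outcomes.
probs  = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1,   0,   1,   0,   0]
print(log_loss(probs, labels), brier_score(probs, labels), auroc(probs, labels))
```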

Multinomial vs binary logistic, when do I need each?

Binary for two-class outcomes (yes/no, pass/fail). Multinomial for k > 2 unordered categories (which color, which brand). Ordinal logistic (proportional odds model) for ordered categories (mild/moderate/severe). For multinomial, you fit k − 1 logit equations against a reference class. Coefficients are then log-odds ratios for that category vs the reference. R's nnet::multinom() fits multinomial; MASS::polr() fits ordinal. sklearn's LogisticRegression fits the multinomial model automatically when there are more than two classes and the solver is not liblinear (the multi_class='multinomial' argument is deprecated in recent releases); note that it estimates one coefficient vector per class rather than k − 1 against a reference.
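A sketch of how k − 1 logit equations against a reference class turn into k probabilities (a softmax with the reference score fixed at 0). The three-class model, its class names, and its coefficients are hypothetical.

```python
import math

def multinomial_probs(x, coefs, reference="A"):
    """Class probabilities from k - 1 logit equations against a reference class.

    coefs maps each non-reference class to (intercept, slope) for one feature x;
    the reference class's linear predictor is fixed at 0.
    """
    scores = {reference: 0.0}
    for cls, (b0, b1) in coefs.items():
        scores[cls] = b0 + b1 * x  # log-odds of cls versus the reference
    top = max(scores.values())  # subtracting the max keeps exp() from overflowing
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}

# Hypothetical 3-class model; "A" is the reference, coefficients are made up.
coefs = {"B": (0.5, 0.8), "C": (-1.0, 1.5)}
probs = multinomial_probs(2.0, coefs)
print(probs)
print(math.log(probs["B"] / probs["A"]))  # recovers 0.5 + 0.8 * 2.0 = 2.1, up to rounding
```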

How does logistic regression handle interactions?

Same way as linear regression: include the product term explicitly. R: glm(y ~ x1 * x2, family = binomial) expands to x1, x2, and x1·x2. Interpretation gets harder because the effect of x1 now depends on x2 (on the log-odds scale, a one-unit increase in x1 adds β₁ + β₁₂·x2, where β₁₂ is the interaction coefficient). To report cleanly, predict probabilities at meaningful (x1, x2) combinations rather than reading the interaction coefficient directly. The marginaleffects R package, and the get_margeff() method on statsmodels' discrete-model results in Python, compute average marginal effects for you.
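A sketch of "predict at meaningful combinations": with hypothetical coefficients including an interaction term, the effect of raising x1 from 0 to 1 is reported at two values of x2, both on the log-odds scale and on the probability scale.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob(x1, x2, b0, b1, b2, b12):
    """P(y = 1) for a model with main effects and an x1*x2 interaction term."""
    return sigmoid(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)

# Hypothetical coefficients, chosen only to make the interaction visible.
coefs = dict(b0=-1.0, b1=0.4, b2=0.3, b12=0.5)
for x2 in (0, 1):
    before, after = prob(0, x2, **coefs), prob(1, x2, **coefs)
    logodds_change = coefs["b1"] + coefs["b12"] * x2
    print(f"x2 = {x2}: raising x1 from 0 to 1 adds {logodds_change:.1f} to the log-odds, "
          f"P(y=1) {before:.3f} -> {after:.3f}")
```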

What's a good sample size for logistic regression?

The classic rule: at least 10 events per predictor (EPV ≥ 10), where "events" means observations in the minority class. Logistic regression on 50 yes/50 no cases supports about 5 predictors max under that rule. Vittinghoff and McCulloch (2007) argue EPV = 5-10 is fine in many cases. For more aggressive inference (small p-values, narrow CIs), aim higher. Penalized methods (LASSO, ridge, Firth) work in lower-EPV regimes by trading some bias for stability. Below EPV = 5, treat coefficient point estimates with serious skepticism.
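The events-per-variable arithmetic as a tiny helper. The function name is illustrative. As the rule is usually applied, "predictors" means estimated parameters, so each dummy column of a categorical variable counts separately.

```python
def max_predictors(n_yes, n_no, epv=10):
    """Rough predictor budget under an events-per-variable rule of thumb.

    Events are counted in the minority class; epv=10 is the classic rule,
    and some authors accept values as low as 5.
    """
    events = min(n_yes, n_no)
    return events // epv

print(max_predictors(50, 50))          # -> 5
print(max_predictors(30, 970))         # -> 3
print(max_predictors(30, 970, epv=5))  # -> 6
```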