
Transform Random Variables and Track Mean/SD

Convert between raw values, z-scores, and scaled variables. Explore how linear transformations (Y = aX + b) and min-max scaling change the mean, standard deviation, and distribution of a random variable. This is an educational tool, not for official score reporting or clinical use.

Last Updated: February 13, 2026

Linear transformations let you shift and scale data without distorting its shape—essential when standardizing test scores, converting temperature units, or preparing features for machine learning. A data analyst had exam scores with mean 72 and standard deviation 8. She wanted z-scores to compare students across different tests. Applying z = (x − 72) / 8, a score of 88 became z = 2.0, meaning two standard deviations above average. A common mistake is assuming the mean and variance transform the same way: if Y = aX + b, then E[Y] = aE[X] + b, but Var[Y] = a²Var[X]—the additive constant b drops out of variance, and the scale factor gets squared. When interpreting results, remember that z-scores preserve relative position: a z = +1.5 always means 1.5 standard deviations above the mean, regardless of the original scale.

Linear Transformations: aX + b Made Simple

A linear transformation Y = aX + b has two parts: a scales (stretches or compresses) the distribution, while b shifts it left or right. Temperature conversion is the classic example: Fahrenheit = 1.8 × Celsius + 32. The coefficient 1.8 expands the scale; the constant 32 shifts the zero point.

Linear transforms preserve shape. If X is normally distributed, so is Y. If X is skewed right, Y is skewed right too. Relative positions stay intact—if Alice scored higher than Bob before the transformation, she still scores higher afterward (assuming a > 0).

Correlation survives linear transformation unchanged (or flips sign if a < 0). This matters in regression and machine learning: scaling features doesn't destroy their predictive relationships.

Key formulas:

• E[aX + b] = a·E[X] + b

• Var[aX + b] = a²·Var[X]

• SD[aX + b] = |a|·SD[X]
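These identities are easy to check with a quick simulation. The sketch below uses the temperature-conversion example (F = 1.8C + 32) on simulated Celsius readings; the distribution parameters are illustrative, not from the text:

```python
import random
import statistics

random.seed(42)

# Celsius readings ~ Normal(mean 20, SD 5), converted via F = 1.8*C + 32
a, b = 1.8, 32.0
celsius = [random.gauss(20, 5) for _ in range(100_000)]
fahrenheit = [a * c + b for c in celsius]

mean_c, sd_c = statistics.fmean(celsius), statistics.pstdev(celsius)
mean_f, sd_f = statistics.fmean(fahrenheit), statistics.pstdev(fahrenheit)

# E[aX + b] = a*E[X] + b  and  SD[aX + b] = |a|*SD[X]
print(abs(mean_f - (a * mean_c + b)) < 1e-6)  # True
print(abs(sd_f - abs(a) * sd_c) < 1e-6)       # True
```

The sample mean lands near 1.8 × 20 + 32 = 68 and the sample SD near 1.8 × 5 = 9, matching the formulas.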

Raw Score to Z-Score and Back

The z-score formula z = (x − μ) / σ centers data at zero and scales it to unit variance. Every z-score tells you how many standard deviations the original value sits from the mean. A z = −0.5 means half a standard deviation below average.

To reverse the transformation, use x = μ + zσ. If a standardized test reports z = 1.2 and you know the population mean is 500 with SD 100, the raw score is 500 + 1.2 × 100 = 620.
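Both directions translate into a pair of one-line helpers (function names are illustrative, not part of any tool):

```python
def raw_to_z(x, mu, sigma):
    """Standardize: how many SDs x sits from the mean."""
    return (x - mu) / sigma

def z_to_raw(z, mu, sigma):
    """Invert standardization: recover the raw score."""
    return mu + z * sigma

# The test example from the text: mean 500, SD 100
print(z_to_raw(1.2, 500, 100))   # 620.0
print(raw_to_z(620, 500, 100))   # 1.2
```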

Z-scores make comparison easy. A z = 2.0 in math and a z = 1.5 in English mean the student performed relatively better in math, even if the raw scores were on completely different scales.

Normal distribution benchmarks: About 68% of values fall within z = ±1, 95% within z = ±2, and 99.7% within z = ±3.

How Mean and SD Change Under Scaling

Adding a constant shifts the mean but leaves the standard deviation untouched. If every student gets 5 bonus points, the class average rises by 5, but the spread stays the same.

Multiplying by a constant scales both mean and standard deviation. Doubling all values doubles the mean and doubles the SD. The coefficient a acts as a stretching factor.

Variance scales by a², not a. If you multiply X by 3, variance becomes 9 times larger, and SD becomes 3 times larger. This distinction trips up students who forget to square.

Example: X has μ = 50, σ = 10

Y = 2X + 5 → μ_Y = 2(50) + 5 = 105

σ_Y = |2|(10) = 20 (not 25!)
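The arithmetic above can be checked directly from the formulas (a minimal sketch):

```python
a, b = 2, 5
mu_x, sigma_x = 50, 10

mu_y = a * mu_x + b           # 2*50 + 5
sigma_y = abs(a) * sigma_x    # |2|*10 -- NOT a**2 * sigma_x
var_y = a**2 * sigma_x**2     # the variance is what picks up the square

print(mu_y, sigma_y, var_y)   # 105 20 400
```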

Unit Conversions That Preserve Statistics

Converting units is just a linear transformation. Meters to centimeters: multiply by 100 (a = 100, b = 0). Celsius to Fahrenheit: F = 1.8C + 32. These preserve shape and correlation while changing numeric scale.

Z-scores are unit-free. A z = 1.5 in meters or centimeters is still z = 1.5—the transformation to standard units wipes out the original scale. This is why z-scores are useful for cross-variable comparison.

Min-max scaling maps data to a target range like [0, 1]. The formula y = (x − x_min) / (x_max − x_min) puts the minimum at 0 and maximum at 1. Unlike z-scores, this bounds the output, which suits neural networks that expect inputs in [0, 1].

Caution: Min-max scaling is sensitive to outliers. One extreme value stretches the range and compresses everything else.
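A min-max scaler is only a few lines, and the second call below shows the outlier problem in action: one extreme value squashes everything else toward zero (data values are illustrative):

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [10, 12, 14, 16, 18, 20]
print([round(v, 2) for v in min_max_scale(data)])
# [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# One outlier stretches the range and compresses the rest
with_outlier = data + [200]
print([round(v, 2) for v in min_max_scale(with_outlier)])
```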

Limits: Nonlinear Transforms Need More

Linear rules only apply to Y = aX + b. If you take the log, square root, or any curved function, the simple formulas break down. E[log X] ≠ log E[X] in general—Jensen's inequality governs these cases.

Nonlinear transforms change distribution shape. A right-skewed income distribution becomes more symmetric after a log transformation. That's the point—you're reshaping, not just rescaling.

For nonlinear transforms, use simulation or the delta method (Taylor expansion) to approximate the new mean and variance. These techniques go beyond this tool's scope but are standard in advanced statistics.

When linear formulas fail: Y = X², Y = log(X), Y = 1/X, Y = e^X. Each requires distribution-specific or numerical methods.
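A quick Monte Carlo run makes Jensen's inequality concrete for Y = log X. The setup below (X uniform on [1, 9], so E[X] = 5) is an assumed example, not from the text:

```python
import math
import random
import statistics

random.seed(0)

# X ~ Uniform(1, 9): E[X] = 5, but E[log X] is NOT log 5
xs = [random.uniform(1, 9) for _ in range(200_000)]
logs = [math.log(x) for x in xs]

log_of_mean = math.log(statistics.fmean(xs))   # close to log 5 ~ 1.609
mean_of_log = statistics.fmean(logs)           # smaller, per Jensen's inequality

print(mean_of_log < log_of_mean)  # True
```

Because log is concave, E[log X] ≤ log E[X]; for this distribution the exact value of E[log X] is (9 ln 9 − 8)/8 ≈ 1.472, well below log 5 ≈ 1.609.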

Transformation Questions, Answered

Why does adding a constant not change variance?

Variance measures spread around the mean. Adding a constant shifts every value—including the mean—by the same amount. The distances from each point to the mean stay unchanged, so variance stays unchanged.

Can z-scores be negative?

Yes. A negative z-score means the value is below the mean. Half of all values in a symmetric distribution have negative z-scores. There's nothing unusual or wrong about z = −1.3.

What if I multiply by a negative constant?

The distribution flips direction. If Y = −X, high values become low and vice versa. Standard deviation uses |a|, so SD stays positive. Correlation with other variables changes sign.

How do I rescale to a new mean and SD?

First standardize: z = (x − μ_old) / σ_old. Then scale to the new parameters: y = μ_new + z × σ_new. This two-step process works for any target mean and standard deviation.
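The two-step recipe translates directly into code (illustrative helper; the example moves a score from a mean-72/SD-8 scale onto a mean-500/SD-100 scale):

```python
def rescale(x, mu_old, sigma_old, mu_new, sigma_new):
    """Map x onto a distribution with a new mean and SD."""
    z = (x - mu_old) / sigma_old       # step 1: standardize
    return mu_new + z * sigma_new      # step 2: de-standardize

print(rescale(88, 72, 8, 500, 100))   # 700.0  (z = 2.0 on both scales)
```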

Does standardization make data normal?

No. Z-scores shift and scale but don't change shape. If the original data is skewed, the z-scores are skewed too. To induce normality, you need a nonlinear transform like Box-Cox or rank-based normalization.
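A quick numerical check: standardizing a right-skewed sample leaves its skewness unchanged (sketch; skewness is computed by hand from standardized cubed deviations):

```python
import random
import statistics

random.seed(1)

def skewness(values):
    """Population skewness: mean cubed standardized deviation."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

# Right-skewed data (exponential), then its z-scores
xs = [random.expovariate(1.0) for _ in range(50_000)]
m, s = statistics.fmean(xs), statistics.pstdev(xs)
zs = [(x - m) / s for x in xs]

# z-scoring is linear, so the skewness is identical
print(abs(skewness(xs) - skewness(zs)) < 1e-6)  # True
```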

Limitations & Assumptions

• Linear Only: Formulas apply to Y = aX + b. Log, square, exponential, and other nonlinear transforms require different methods (simulation, delta method, or exact distributional theory).

• Known Parameters: Accurate results depend on correct μ and σ inputs. Sample estimates carry uncertainty not reflected in single-point transformations.

• Educational Purpose: This tool demonstrates formulas, not production data pipelines. For ML feature scaling, use libraries like scikit-learn that handle train/test separation properly.

• Min-Max Sensitivity: Min-max scaling is highly sensitive to outliers. One extreme value can distort the entire scaled range.

Disclaimer: This calculator demonstrates linear transformation concepts for learning purposes. For official psychometric scoring, clinical assessments, or production ML pipelines, use validated tools and consult domain experts.


Frequently Asked Questions

Common questions about random variable transformations, z-score standardization, linear transformations, min-max scaling, and how to use this calculator for homework and data transformation practice.

What is a z-score and why do we use it?

A z-score (or standard score) tells you how many standard deviations a value is from the mean. The formula is z = (x − μ) / σ. Z-scores are useful because they put different variables on a common scale, making it possible to compare values from different distributions. For example, you can compare a test score to a height measurement because both are expressed in standard deviation units.

What does a linear transformation Y = aX + b do to the distribution?

A linear transformation changes the location and scale of a distribution but preserves its shape. Specifically: the new mean becomes E[Y] = a × E[X] + b, the new standard deviation becomes SD[Y] = |a| × SD[X], and the variance becomes Var[Y] = a² × Var[X]. The shape (e.g., normal, skewed) remains unchanged, and the relative positions of values (z-scores) are preserved.

When should I use min–max scaling vs z-scores?

Use z-score standardization when you want to express values in terms of how far they are from the mean (in standard deviation units), especially for statistical analyses or when comparing across different scales. Use min–max scaling when you need values bounded to a specific range (like 0–1 for neural network inputs or 0–100 for percentage-like displays). Note that min–max scaling is sensitive to outliers since extreme values define the range.

Does scaling change correlations or regression results?

Linear transformations preserve Pearson correlation coefficients. If X and Y have correlation r, and you transform X to X' = aX + b (with a ≠ 0), the correlation between X' and Y is still r (or −r if a < 0). Similarly, regression slopes change predictably: if you rescale both X and Y linearly, you can convert the regression equation accordingly. However, non-linear transformations (like log or square root) do change correlations.

Is this calculator suitable for official exam score conversions?

No. This tool is for educational purposes only—to help you understand how linear transformations work and to practice with numeric examples. Official score conversions (like standardized test scores, curved grades, or psychometric scaling) require validated procedures, proper norm samples, and often more sophisticated methods than simple linear transformations. Always consult official guidelines and qualified professionals for real-world applications.

What happens to the z-scores when I do a linear rescale?

Z-scores are preserved under linear transformations. If a value x has z-score z_X in the original distribution, and you transform it to y = aX + b with new mean μ_Y and new standard deviation σ_Y, then the z-score of y in the new distribution is still z_X. This is because linear transformations shift and stretch the entire distribution uniformly.
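This can be demonstrated on a small dataset: compute z-scores before and after a linear rescale and compare (data values are illustrative):

```python
import statistics

data = [55, 60, 70, 85, 90]
mu, sigma = statistics.fmean(data), statistics.pstdev(data)

# Linear rescale y = 1.8x + 32 (the Celsius-to-Fahrenheit map)
scaled = [1.8 * x + 32 for x in data]
mu_s, sigma_s = statistics.fmean(scaled), statistics.pstdev(scaled)

z_before = [(x - mu) / sigma for x in data]
z_after = [(y - mu_s) / sigma_s for y in scaled]

# The z-scores agree to floating-point precision
print(max(abs(a - b) for a, b in zip(z_before, z_after)) < 1e-9)  # True
```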

Why do p = 0.5 and mid-range values need the most data in some contexts?

For proportions, the variance p(1−p) is maximized at p = 0.5. This means uncertainty is greatest when the population is evenly split. For sample size calculations with proportions, assuming p = 0.5 gives the most conservative (largest) sample size. Similarly, in normal distributions, values near the mean are most common but distinguishing small differences requires more data than distinguishing extreme values.
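The claim about p(1 − p) is easy to verify by scanning a grid of p values (minimal sketch):

```python
# p(1 - p) peaks at p = 0.5 with maximum value 0.25
ps = [i / 10 for i in range(11)]
variances = [p * (1 - p) for p in ps]

print(max(variances))                        # 0.25
print(ps[variances.index(max(variances))])   # 0.5
```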

Can I use this for feature scaling in machine learning?

This tool demonstrates the formulas correctly, but for real machine learning pipelines, you should use dedicated libraries (like scikit-learn's StandardScaler or MinMaxScaler) that properly fit on training data and transform test data consistently. The principles are the same, but ML workflows require careful separation of training and test sets to avoid data leakage.
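The train/test discipline can be illustrated without any library: fit the scaling parameters on training data only, then reuse them for test data. The class below is a minimal stand-in for scikit-learn's StandardScaler, not its actual implementation; the name and data are illustrative:

```python
import statistics

class SimpleStandardScaler:
    """Fit mu/sigma on training data; reuse them on any later data."""

    def fit(self, values):
        self.mu = statistics.fmean(values)
        self.sigma = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sigma for v in values]

train = [50, 60, 70, 80, 90]
test = [65, 95]

scaler = SimpleStandardScaler().fit(train)   # parameters come from train only
print(scaler.transform(test))                # test scaled with train's mu/sigma
```

Fitting on the full dataset (train plus test) would leak information about the test set into the scaling parameters, which is exactly the mistake the FAQ answer warns against.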
