Part 4: Probability and Statistics

Part of the Mathematics for Programming 101 Series

The A/B Test That Almost Cost Us

We ran an A/B test on a new checkout flow. The test group showed about 8.6% higher conversion in relative terms.

Control: 152/1000 = 15.2% conversion
Test: 165/1000 = 16.5% conversion
Relative improvement: 8.6% (1.3 percentage points absolute)

Product wanted to ship immediately. "The numbers don't lie," they said.

I ran a statistical significance test. P-value: 0.23

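Translation: if the new flow made no real difference, we would still see a gap at least this large about 23% of the time. That is far above the usual 0.05 threshold, so the result is not statistically significant.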

We didn't ship. Ran the test for two more weeks. Final result: 0.3% difference, not statistically significant.

That's when I learned: Statistics isn't about collecting numbers; it's about making correct decisions under uncertainty.

Probability: Quantifying Uncertainty

The Basics

Probability measures how likely something is to happen, on a scale from 0 (impossible) to 1 (certain).

import numpy as np
from collections import Counter

# Simulate rolling a fair die
def roll_die(n_rolls=1000):
    """Simulate rolling a six-sided die"""
    return np.random.randint(1, 7, size=n_rolls)

rolls = roll_die(10000)
counts = Counter(rolls)  # maps each outcome to the number of times it appeared

print("Probability of each outcome:")
for outcome in sorted(counts):
    # Empirical probability = observed count / total number of rolls
    prob = counts[outcome] / len(rolls)
    print(f"{outcome}: {prob:.3f} (expected: 0.167)")

# Law of large numbers: more rolls → closer to theoretical probability
for n in [100, 1000, 10000, 100000]:
    rolls = roll_die(n)
    prob_six = np.sum(rolls == 6) / n
    print(f"{n:6d} rolls: P(6) = {prob_six:.4f}")

Probability Distributions

Distributions describe how probabilities are spread across possible values.

Bernoulli Distribution (yes/no events)
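
A Bernoulli trial has exactly two outcomes: success with probability p, failure with probability 1 - p. A single visitor either converting or not is one such trial. Here is a minimal sketch using only the standard library; the helper name `bernoulli_trial` is my own, and the value of `p` is borrowed from the control group above purely for illustration.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.152  # assumed success probability, taken from the control group's rate
trials = [bernoulli_trial(p, rng) for _ in range(100_000)]

observed_rate = sum(trials) / len(trials)
print(f"Observed success rate: {observed_rate:.4f} (expected: {p})")
# For a Bernoulli variable the mean is p and the variance is p * (1 - p)
print(f"Theoretical variance: {p * (1 - p):.4f}")
```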

Binomial Distribution (multiple yes/no trials)
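
The binomial distribution counts successes across n independent Bernoulli trials that share the same success probability p. The sketch below computes the probability mass function directly from its formula; the function name `binomial_pmf` and the numbers (20 visitors, 15% conversion chance) are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each succeeding with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.15  # hypothetical: 20 visitors, 15% conversion chance each
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(exactly 3 conversions) = {pmf[3]:.4f}")
print(f"P(at most 1 conversion)  = {sum(pmf[:2]):.4f}")
# The mean of a binomial is n * p and the variance is n * p * (1 - p)
print(f"Mean = {n * p:.2f}, variance = {n * p * (1 - p):.2f}")
```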

Normal Distribution (bell curve)
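
Many measurements cluster symmetrically around a mean, with spread set by the standard deviation. As a rough illustration, the sketch below uses `statistics.NormalDist` from the standard library; the response-time figures (mean 200 ms, standard deviation 25 ms) are made up for the example.

```python
from statistics import NormalDist

# Hypothetical example: response times with mean 200 ms and std dev 25 ms
response_times = NormalDist(mu=200, sigma=25)

# Share of values within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    share = response_times.cdf(200 + k * 25) - response_times.cdf(200 - k * 25)
    print(f"Within {k} std dev: {share:.4f}")

# Probability a request takes longer than 250 ms (2 std devs above the mean)
p_slow = 1 - response_times.cdf(250)
print(f"P(time > 250 ms) = {p_slow:.4f}")
```

The loop reproduces the familiar 68-95-99.7 rule of thumb.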

Real Application: Anomaly Detection
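
One common way to apply the normal distribution is z-score anomaly detection: flag any point that sits more than a few standard deviations from the mean. The sketch below assumes a threshold of 3 and roughly bell-shaped data; `find_anomalies` and the latency readings are hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Return (index, value, z_score) for points more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    anomalies = []
    for i, x in enumerate(values):
        z = (x - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, x, z))
    return anomalies

# Hypothetical latency readings in ms, with one obvious spike
latencies = [102, 98, 101, 99, 100, 103, 97, 100, 101, 99,
             100, 102, 98, 101, 99, 100, 450, 100, 101, 99]

for index, value, z in find_anomalies(latencies):
    print(f"Index {index}: {value} ms (z = {z:.1f})")
```

Note that the spike itself inflates the mean and standard deviation, so on small samples a robust variant (for example, one based on the median) may work better.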

Bayesian Thinking

Update beliefs based on evidence.

Bayes' Theorem
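
In its standard form, for events A and B with P(B) > 0:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here P(A) is the prior, P(B | A) is the likelihood, and P(A | B) is the posterior.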

Translation: Probability of A given B = (How well A explains B × Prior belief in A) / Probability of seeing B

Real Example: Spam Classification
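
To make this concrete, here is a minimal sketch that applies Bayes' theorem to a single word as evidence. It is not a full spam filter; the function name `posterior_spam` and every probability in it are invented for illustration.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """P(spam | word) via Bayes' theorem, using a single word as evidence."""
    p_ham = 1 - p_spam
    # Law of total probability: P(word) summed over both classes
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical numbers, chosen only for illustration:
# 20% of mail is spam; "free" appears in 60% of spam and 5% of legitimate mail.
p = posterior_spam(p_word_given_spam=0.60, p_word_given_ham=0.05, p_spam=0.20)
print(f"P(spam | contains 'free') = {p:.2f}")  # prints 0.75
```

Seeing the word moves the belief from a 20% prior to a 75% posterior. A real classifier would combine many words, typically under a naive independence assumption.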

Hypothesis Testing and A/B Tests

The Right Way to Do A/B Testing
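
Before shipping, check whether the gap between groups is larger than chance alone would produce. One standard option is a two-sided two-proportion z-test, sketched below with the counts from the checkout-flow test. The function name is my own, and the result depends on which test you pick and whether it is one- or two-sided, so it need not reproduce the p-value quoted in the opening story.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Counts from the checkout-flow test at the top of the article
z, p_value = two_proportion_z_test(152, 1000, 165, 1000)
alpha = 0.05  # conventional significance threshold
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
print("Significant" if p_value < alpha else "Not significant: keep collecting data")
```

Decide on `alpha` and the sample size before the test starts; checking repeatedly and stopping at the first "significant" reading inflates the false-positive rate.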

Sample Size Calculation
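
How many visitors does a test need? A common approximation for comparing two proportions is sketched below, assuming a two-sided test with 5% significance and 80% power. Those defaults and the function name `sample_size_per_group` are my assumptions, not figures from the original test plan.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate visitors needed per group to detect a change from p_baseline to p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Rates from the checkout-flow test: 15.2% baseline, 16.5% hoped-for rate
n = sample_size_per_group(0.152, 0.165)
print(f"Visitors needed per group: {n}")
```

Under these assumptions the answer comes out at roughly twelve thousand visitors per group, far more than the 1,000 per group in the opening story. Smaller expected effects push the requirement up quickly, because the effect size is squared in the denominator.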

Correlation vs Causation
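
Two metrics can move together because a third variable drives both. The simulation below is a toy illustration with invented data: temperature drives both ice cream sales and sunburn counts, and neither causes the other. For simplicity it removes the temperature effect using the true coefficients from the simulation; with real data you would have to estimate them, for example by regression.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical confounder: daily temperature drives both series below
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [3.0 * t + rng.gauss(0, 10) for t in temperature]
sunburns = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson(ice_cream, sunburns)

# Subtract the part of each series explained by temperature, then correlate what is left
ice_resid = [s - 3.0 * t for s, t in zip(ice_cream, temperature)]
burn_resid = [b - 0.5 * t for b, t in zip(sunburns, temperature)]
r_controlled = pearson(ice_resid, burn_resid)

print(f"Raw correlation:            {r_raw:.2f}")
print(f"After controlling for temp: {r_controlled:.2f}")
```

The raw correlation is strong, yet it all but disappears once temperature is accounted for.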

Confidence Intervals
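
A point estimate such as "15.2% conversion" hides how uncertain it is. A confidence interval puts a range around it. The sketch below uses the normal-approximation (Wald) interval for a proportion, which is one of several available methods and is an assumption on my part; the function name is hypothetical, and the counts are from the checkout-flow test.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

control = proportion_confidence_interval(152, 1000)
test = proportion_confidence_interval(165, 1000)
print(f"Control: {control[0]:.3f} to {control[1]:.3f}")
print(f"Test:    {test[0]:.3f} to {test[1]:.3f}")
```

The two intervals overlap heavily, which is consistent with the test not being significant at 1,000 visitors per group. For very small samples or rates near 0 or 1, the Wilson interval is usually a safer choice than this approximation.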

Key Takeaways

  • Probability quantifies uncertainty in systems

  • Distributions model different types of random events

  • Bayesian thinking updates beliefs based on evidence

  • Hypothesis testing helps you make statistically sound decisions

  • A/B testing requires proper sample sizes and significance tests

  • Correlation ≠ causation: always check for confounding variables

  • Confidence intervals quantify estimation uncertainty

What's Next

In the next article, we'll explore discrete mathematics, the foundation of algorithms, data structures, and computational thinking.

You'll learn:

  • Set theory and data structures

  • Logic and Boolean algebra

  • Combinatorics and counting

  • Recurrence relations and recursion

Continue to Part 5: Discrete Mathematics →

