Probability and Statistics

Solving Old Problems and New

Today

  • See a few examples of how probability and statistics have been used to solve some interesting and important problems!

  • Give you some historical context for how these ideas developed

  • Get to try some of them yourself!

  • Please ask questions!

Why Statistics/Data Science/Actuarial Science

  • Data is everywhere, and statistics is how we make sense of it

  • Work on interesting problems in sports, medicine, animal behavior, astronomy, finance, and lots more

  • Statistics, data science, and actuarial science give you practical skills that you will use over and over again

How do you estimate a population from a sample?

Getting estimates from a small sample

  • How would you estimate how many Canadians have had Covid in the last 6 months?

  • We can’t ask all 38 million people!

  • Can you get a good estimate from a much smaller group?

A simple estimate

  • You ask 100 “random” people. 35% had Covid in the last 6 months.

  • Is \(38.25\ \text{million} \times \frac{35}{100} \approx 13.4\ \text{million}\) a good estimate?

  • How would you be able to check?
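The scale-up arithmetic on the previous slide can be written out as a short Python sketch. As a rough sense of how much a sample of 100 can wobble, it also computes the usual normal-approximation 95% margin of error for a proportion. Treating the 100 people as a simple random sample is an assumption (the quotes around “random” suggest they may not be), and the population figure is the 38.25 million used on the slide.

```python
import math

population = 38.25e6   # population figure used on the slide
n = 100                # people asked
yes = 35               # people who said they had Covid in the last 6 months

# Scale the sample proportion up to the whole population.
p_hat = yes / n
estimate = population * p_hat

# Normal-approximation 95% margin of error for a proportion.
# Assumes a simple random sample, which real surveys rarely achieve.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"point estimate: {estimate / 1e6:.1f} million")
print(f"rough 95% range: {population * low / 1e6:.1f} to {population * high / 1e6:.1f} million")
```

Even under ideal random sampling, 100 people leave a range of several million either way; a sample that is not really random can be off by more.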

Let’s do a simple example

  • Suppose we wanted to estimate how many people would vote a certain way in an election

  • We’ll do a simple version of this: estimate how many triangles are on a page by counting the triangles in one randomly chosen area

  • Think of the page as a town and each triangle as a person. Asking everyone is too slow, so we sample 10% of the town (one square). Ten times the number of triangles in that square is an estimate of the total number of triangles in the town

  • Do with a partner, record your results
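The page exercise can be mimicked on a computer. In this sketch a hypothetical page has 200 triangles scattered uniformly at random over 10 equal squares (the total and the seed are made up for illustration). Each square’s count times 10 is one possible estimate, and comparing all 10 estimates shows how far a single square can miss.

```python
import random

rng = random.Random(1)   # fixed seed so the run is reproducible
N_SQUARES = 10           # each square is 10% of the page
TRUE_TOTAL = 200         # hypothetical number of triangles on the page

# Drop each triangle into a square chosen uniformly at random.
counts = [0] * N_SQUARES
for _ in range(TRUE_TOTAL):
    counts[rng.randrange(N_SQUARES)] += 1

# Scale each square's count up by 10 to estimate the total.
estimates = [N_SQUARES * c for c in counts]
print("estimates from each square:", estimates)
print("average of the estimates:", sum(estimates) / N_SQUARES)
print("true total:", TRUE_TOTAL)
```

The average over all squares always equals the true total here, but any one square’s estimate can be noticeably high or low.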

Let’s see the results

Let’s see the true value

Making this harder

  • Now we’re going to do this again on the next page

  • Pick one square, repeat the count, and tell me your new estimate

The results

What this means

  • It can be hard to get a sample that actually represents the population well!

  • This is even harder with people, who are all very different

  • Different types of people live in different areas

  • If you just pick a random area, you may not get a representative sample

  • This is one reason why it is so hard to predict elections
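A page where the triangles are bunched together can be mimicked the same way. This sketch assumes, purely hypothetically, that 90% of 200 triangles sit in just 2 of the 10 squares. Most single squares then give estimates far from the truth, much like sampling one neighbourhood of a town where different types of people live in different areas.

```python
import random

rng = random.Random(2)
N_SQUARES = 10
TRUE_TOTAL = 200
CROWDED = [0, 1]          # hypothetical: the two squares where triangles cluster

counts = [0] * N_SQUARES
for _ in range(TRUE_TOTAL):
    if rng.random() < 0.9:
        # 90% of triangles land in one of the crowded squares
        square = rng.choice(CROWDED)
    else:
        # the rest are spread over all squares
        square = rng.randrange(N_SQUARES)
    counts[square] += 1

# Scale each square's count up by 10 to estimate the total.
estimates = [N_SQUARES * c for c in counts]
print("estimates from each square:", estimates)

# How many squares give an estimate within 25% of the truth?
close = sum(abs(e - TRUE_TOTAL) <= 0.25 * TRUE_TOTAL for e in estimates)
print(f"{close} of {N_SQUARES} squares give an estimate within 25% of {TRUE_TOTAL}")
```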

Doing this in the real world

Who knows what this is?

  • Pollsters used to dial random numbers from the phonebook to get a random sample

  • There are newer ways of doing this, like using Xbox users to estimate who will win the US presidential election

Using Random Numbers to estimate \(\pi\)

Some history

  • \(\pi\) is a pretty weird number with some interesting properties

  • How to actually compute \(\pi\) has been studied for centuries

  • The original example is related to French aristocrats gambling (Buffon’s Needle)

  • We’ll first see a very simple example using random numbers

Throwing Darts at a Board

  • Suppose you have a square board with a circle inscribed inside it

  • If you throw a dart at this square, what is the probability it will land inside the circle?

  • If the length of the sides of the square is 1, it has area…

  • Area of Square = 1

  • If the circle fits exactly inside it, it has radius and area…

  • Radius = 0.5 and area \(\pi \times 0.5^2 = \frac{\pi}{4}\)

  • What’s the ratio of the area of the circle to the square?

Getting \(\pi\)

  • So if the dart is thrown randomly at the board, the probability it ends up inside the circle is

\[ \text{Probability Inside Circle}\ = \frac{\pi}{4} \]

Random Numbers

  • To solve this we need to generate random darts hitting the board

  • Computers can’t easily produce truly random numbers

  • Instead they compute pseudo-random numbers: sequences generated by a fixed rule that appear random to us

  • Can you do better than a computer?

  • Try to get an estimate of \(\pi\) by creating your own random darts

Experiment

  • Try to drop a pen at random on the page

  • Only count drops that land inside the square

  • Count how many land inside the circle and how many land in the square in total

\[ \frac{\text{Number inside Circle}}{\text{Total in Square}}\approx \frac{\pi}{4} \]
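The pen-dropping experiment is what a computer does with pseudo-random numbers. A minimal Python sketch, using the standard `random` module to throw the darts (the function name `estimate_pi` and the seed are our own choices):

```python
import random

def estimate_pi(n_darts: int, seed: int = 0) -> float:
    """Estimate pi by throwing darts at a unit square with an inscribed circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_darts):
        # A dart lands at a uniformly random point in the 1 x 1 square.
        x, y = rng.random(), rng.random()
        # The inscribed circle has centre (0.5, 0.5) and radius 0.5.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    # inside / n_darts is approximately pi / 4
    return 4 * inside / n_darts

for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

More darts usually give a better estimate, but slowly: the typical error shrinks roughly like \(1/\sqrt{n}\).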

An Example


The Original Problem

The original problem is from Georges-Louis Leclerc, Comte de Buffon.

Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?

Buffon’s Needle

What the problem looks like

The probability

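For a needle of length \(l\) dropped on strips of width \(t\) (with \(l \le t\)), the probability that it crosses one of the gray strips is

\[ P = \int_{\theta=0}^{\pi/2}\int_{x=0}^{(l/2)\cos\theta}\frac{4}{t\pi}\,dx\,d\theta = \frac{2l}{t\pi} \]

Here \(x\), the distance from the needle’s centre to the nearest line, is uniform on \([0, t/2]\), and \(\theta\), the angle between the needle and the perpendicular to the lines, is uniform on \([0, \pi/2]\), so their joint density is \(\frac{2}{t}\cdot\frac{2}{\pi} = \frac{4}{t\pi}\).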

  • So if we drop the needle \(n\) times and \(h\) needles cross a line, then \(P\approx\frac{h}{n}\) and

\[ \pi \approx \frac{2ln}{th}. \]
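A sketch of simulating the needle drops in Python, following the integral’s setup: \(x\) uniform on \([0, t/2]\) and the angle uniform on \([0, \pi/2]\). The defaults \(l = 1\), \(t = 2\) and the name `buffon_pi` are arbitrary choices. One subtlety: drawing the angle with `math.pi` would use the very number we are estimating, so the angle is instead taken from a random point in a quarter disc, whose direction is uniformly distributed.

```python
import math
import random

def buffon_pi(n_drops: int, l: float = 1.0, t: float = 2.0, seed: int = 0) -> float:
    """Estimate pi with Buffon's needle (needle length l <= strip width t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_drops):
        # Distance from the needle's centre to the nearest line: uniform on [0, t/2].
        x = rng.uniform(0, t / 2)
        # Random direction without using pi: accept a uniform point (u, v) in the
        # quarter disc; its angle is then uniform on [0, pi/2].
        while True:
            u, v = rng.random(), rng.random()
            r2 = u * u + v * v
            if 0 < r2 <= 1:
                break
        cos_theta = u / math.sqrt(r2)
        # The needle crosses a line if x <= (l/2) cos(theta).
        if x <= (l / 2) * cos_theta:
            hits += 1
    if hits == 0:
        raise ValueError("no crossings observed; use more drops")
    # pi is approximately 2 l n / (t h)
    return 2 * l * n_drops / (t * hits)

print(buffon_pi(100_000))
```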

Using this

  • In 1901 an Italian math teacher claimed to have dropped a needle 3,408 times and estimated \(\pi\) correctly to 6 decimal places

  • You can actually use statistics to show he very likely cheated!
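One way statistics casts doubt on the claim is sketched below. It assumes \(l/t = 5/6\) and \(n = 3408\) drops (the figures usually reported for the 1901 experiment; treat them as assumptions here). It approximates how far a typical estimate lands from \(\pi\), finds which crossing counts \(h\) would match \(\pi\) to 6 decimal places, and computes the binomial probability of hitting one of them.

```python
import math

# Assumed setup: needle-to-strip ratio l/t = 5/6 and n = 3408 drops.
L_OVER_T = 5 / 6
N = 3408

# Probability a single needle crosses a line: P = 2l / (t * pi)
p_cross = 2 * L_OVER_T / math.pi

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

# Delta-method approximation to the standard deviation of pi_hat = 2 l n / (t h)
sd_prop = math.sqrt(p_cross * (1 - p_cross) / N)
sd_pi_hat = (math.pi / p_cross) * sd_prop

# Which crossing counts h give an estimate within 5e-7 of pi (6 decimal places)?
good_h = [h for h in range(1, N + 1) if abs(2 * L_OVER_T * N / h - math.pi) < 5e-7]
prob_that_lucky = sum(math.exp(log_binom_pmf(h, N, p_cross)) for h in good_h)

print(f"typical error of the estimate: about {sd_pi_hat:.3f}")
print(f"crossing counts that give 6 correct decimals: {good_h}")
print(f"probability of landing on one of them: {prob_that_lucky:.4f}")
```

Under these assumptions an honest experiment typically misses \(\pi\) by about 0.05, and only \(h = 1808\) (giving \(355/113\)) matches to 6 decimal places, with a chance of a little over 1%.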

Random Numbers now

  • Random numbers are used to solve lots of difficult problems in machine learning and statistics

  • The way pseudo-random numbers are generated uses math from the 1600s

  • Still very hard to get truly random numbers!
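To make “pseudo-random” concrete, here is one classic kind of generator, a linear congruential generator, which updates an integer state with modular arithmetic. This is only an illustration: the constants are the widely quoted Numerical Recipes choice, the class name `LCG` is ours, and Python’s own `random` module uses a different algorithm (the Mersenne Twister).

```python
class LCG:
    """A linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_float(self) -> float:
        """Advance the state and return a number in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

gen1 = LCG(seed=42)
gen2 = LCG(seed=42)
seq1 = [gen1.next_float() for _ in range(5)]
seq2 = [gen2.next_float() for _ in range(5)]
print(seq1)
print(seq1 == seq2)  # same seed, same "random" sequence
```

The numbers look random, but anyone who knows the rule and the seed can reproduce them exactly, which is why they are only pseudo-random.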

Keeping Data Private

How to disclose sensitive information safely

  • Suppose we wanted to estimate what proportion of people here, \(p\), have evaded the skytrain/bus fare?

  • You might not be comfortable answering that question!

  • Is there a way to estimate the proportion of people who hop the fare gate, while keeping your data private?

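A simple Rule

  • Flip a coin in secret

  • Heads: answer the question truthfully

  • Tails: answer Yes, whatever the truth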

How to estimate \(p\)

Can we estimate \(p\)?

  • You might answer Yes even if that’s not your true answer

  • If you answer Yes, no one can know for certain whether it’s the truth

  • We can still get an estimate of the true answer

Some quick math

\[ P(\text{Answer Yes}) = P(\text{Flipped Heads and Yes}) + P(\text{Flipped Tails}) \]

\[ P(\text{Flipped Heads and Yes}) = P(\text{Flipped Heads})\times P(\text{Yes}) \]

\[ P(\text{Answer Yes}) = \frac{1}{2}\times p + \frac{1}{2} \]

  • Solving for \(p\) gives \(p = 2P(\text{Answer Yes}) - 1\), so we can estimate \(p\) as twice the fraction of people who answer Yes, minus 1
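A small simulation of the coin-flip survey, assuming the rule implied by the equations on the previous slide: heads, answer truthfully; tails, answer Yes. The population of 10,000 people with 20% fare evaders is hypothetical, and the function names are our own.

```python
import random

def randomized_response_survey(true_answers, seed=0):
    """Simulate the coin-flip rule: heads -> answer truthfully, tails -> answer Yes."""
    rng = random.Random(seed)
    reported = []
    for truth in true_answers:
        if rng.random() < 0.5:      # heads: tell the truth
            reported.append(truth)
        else:                       # tails: always say Yes
            reported.append(True)
    return reported

def estimate_p(reported):
    """Invert P(Answer Yes) = p/2 + 1/2, i.e. p = 2 P(Answer Yes) - 1."""
    frac_yes = sum(reported) / len(reported)
    return 2 * frac_yes - 1

# Hypothetical population: 10,000 people, about 20% of whom have evaded a fare.
rng = random.Random(42)
truth = [rng.random() < 0.2 for _ in range(10_000)]
reported = randomized_response_survey(truth, seed=1)
print("true proportion:", sum(truth) / len(truth))
print("estimated p:", estimate_p(reported))
```

No single Yes reveals anything for certain, yet the group-level estimate lands close to the true proportion. The price is extra noise: the estimate varies more than it would if everyone answered honestly.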

Modern Uses

  • This might seem like a very simple idea (it is called randomized response), but it is an early example of the ideas behind what is now known as differential privacy

  • There are many famous examples of people being re-identified from supposedly anonymous data

  • Big tech companies such as Apple, Google, and Facebook now use differential privacy when collecting or releasing data about their users

Wrap Up

  • Statistics provides ways to solve real-world problems involving data

  • The skills in a statistics/data science/actuarial degree will always be useful

  • Questions?