Solving Old Problems and New
See a few examples of how probability and statistics have been used to solve some interesting and important problems!
Give you some historical context for how these ideas developed
Get to try some of them yourself!
Please ask questions!
Data is everywhere, and statistics is the way to understand that data
Get to work on interesting problems in sports, medicine, animal behavior, astronomy, finance, and lots more
A Statistics/Data Science/Actuarial degree gives you practical skills that you will use over and over again
How would you estimate how many Canadians have had Covid in the last 6 months?
Can’t ask more than 35 million people!
Can you get a good estimate from a much smaller group?
You ask 100 “random” people. 35% had Covid in the last 6 months.
Is \(38.25 \times \frac{35}{100} = 13.4\ \text{Million}\) a good estimate?
How would you be able to check?
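One common way to check is to put a margin of error around the estimate. The sketch below assumes the 100 people really are a simple random sample and uses the usual normal approximation for a proportion. The function name `estimate_total` and the multiplier 1.96 (roughly 95% coverage) are choices made for this illustration, not part of the original example.

```python
import math

def estimate_total(population, sample_size, sample_yes, z=1.96):
    """Scale a sample proportion up to a population total, with a rough interval."""
    p_hat = sample_yes / sample_size
    # Standard error of a sample proportion under simple random sampling
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    low = max(0.0, p_hat - z * se)
    high = min(1.0, p_hat + z * se)
    return population * p_hat, population * low, population * high

# Numbers from the example: 38.25 million people, 35 of 100 said yes
point, low, high = estimate_total(38.25e6, 100, 35)
print(f"estimate {point / 1e6:.1f} million, "
      f"plausible range {low / 1e6:.1f} to {high / 1e6:.1f} million")
```

With only 100 people the plausible range runs from roughly 10 to 17 million, so the estimate is in the right region but far from precise. Asking more people narrows the range.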
Suppose we wanted to estimate how many people would vote a certain way in an election
We’ll do a simple version of this, where we want to count how many triangles are in a random area on a page
Think of the page as the population of a town. It is too slow to ask everyone, so we just take a sample of 10% (one square). 10 times the number of triangles in that square is an estimate for the total number of triangles in the town
Do with a partner, record your results
Now we’re going to do this again on the next page
Pick one square and repeat, tell me your answer again
It can be hard to get a sample that actually represents the population well!
This is even harder with people, who are all very different
Different types of people live in different areas
If you just pick a random area, you may not get a representative sample
This is why it is so hard to predict elections
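A small simulation can show how much a single square can mislead. The sketch below invents a page of 10 squares with made-up triangle counts (deliberately uneven, like neighbourhoods in a town) and applies the pick-one-square, multiply-by-10 rule many times. All counts and names here are hypothetical.

```python
import random

# Hypothetical page: triangle counts in each of the 10 squares (made up, uneven on purpose)
squares = [2, 3, 1, 4, 2, 15, 18, 3, 1, 1]
true_total = sum(squares)

def one_square_estimate(rng):
    """Pick one square at random and scale its count up by the number of squares."""
    return len(squares) * rng.choice(squares)

rng = random.Random(1)  # fixed seed so the run is repeatable
estimates = [one_square_estimate(rng) for _ in range(1000)]

print("true total:", true_total)
print("smallest estimate:", min(estimates))
print("largest estimate:", max(estimates))
print("average estimate:", sum(estimates) / len(estimates))
```

Any single estimate can be far from the true total, even though the estimates average out close to it. That gap between one unlucky square and the whole page is the representativeness problem.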
Who knows what this is?
Previously, pollsters would ring random numbers from the phonebook to get a random sample
There are some newer ways of doing this, like using Xbox users to estimate who will win the US presidential election
\(\pi\) is a pretty weird number with some interesting properties
How to actually compute \(\pi\) has been studied for centuries
Original example related to French aristocrats gambling (Buffon’s Needle)
Will first see a very simple example, using random numbers
Suppose you have a square board with a circle inscribed inside it
If you throw a dart at this square, what is the probability it will land inside the circle?
If the length of the sides of the square is 1, it has area…
Area of Square = 1
If the circle fits exactly inside it, it has radius and area…
Radius = 0.5 and area \(\frac{\pi}{4}\)
What’s the ratio of the area of the circle to the square?
\[ \text{Probability Inside Circle}\ = \frac{\pi}{4} \]
To solve this we need to generate random darts hitting the board
Computers actually can’t compute random things easily
Instead compute numbers that are pseudo-random, that appear random to us
Can you do better than a computer?
Try to get an estimate of \(\pi\) by creating random darts
Try to drop a pen at random on the page
Only count those inside the square
Count how many inside the circle, how many total
\[ \frac{\text{Number inside Circle}}{\text{Total in Square}}\approx \frac{\pi}{4} \]
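The same experiment can be sketched on a computer, with pseudo-random points standing in for darts. This is a minimal illustration; the function name `estimate_pi` and the fixed seed are assumptions made here so the run is repeatable.

```python
import random

def estimate_pi(num_darts, seed=0):
    """Estimate pi from the fraction of random darts landing inside the inscribed circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_darts):
        # A dart is a uniformly random point in the unit square
        x, y = rng.random(), rng.random()
        # The circle has centre (0.5, 0.5) and radius 0.5
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    # inside / num_darts approximates pi / 4
    return 4 * inside / num_darts

for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

More darts generally give a closer estimate, but the improvement is slow: roughly 100 times as many darts are needed for one extra correct decimal place.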
The original problem is from Georges-Louis Leclerc, Comte de Buffon
Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?
For a needle of length \(l\) and strips of width \(t\) (with \(l \le t\)), the probability the needle crosses a line between two strips is
\[ P = \int_{\theta=0}^{\pi/2}\int_{x=0}^{(l/2)\cos\theta}\frac{4}{t\pi}dxd\theta = \frac{2l}{t\pi} \]
If we drop \(n\) needles and \(h\) of them cross a line, then \(h/n \approx P\), so
\[ \pi \approx \frac{2ln}{th}. \]
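The needle experiment can also be simulated. The sketch below follows the set-up of the integral above: \(x\) is the distance from the needle's centre to the nearest line, and \(\theta\) is the needle's angle, measured so that \(\theta = 0\) means the needle points straight across the lines. The function name and default values are assumptions for illustration. Note that the code uses the computer's value of \(\pi\) to draw a random angle, so it demonstrates the formula rather than being an independent way of finding \(\pi\).

```python
import math
import random

def buffon_pi(num_drops, needle_len=1.0, strip_width=1.0, seed=0):
    """Estimate pi by simulating needle drops (needs needle_len <= strip_width)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(num_drops):
        # Distance from the needle's centre to the nearest line
        x = rng.uniform(0, strip_width / 2)
        # Needle angle; theta = 0 means pointing straight across the lines
        theta = rng.uniform(0, math.pi / 2)
        if x <= (needle_len / 2) * math.cos(theta):
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; drop more needles")
    # pi is approximately 2 * l * n / (t * h)
    return 2 * needle_len * num_drops / (strip_width * crossings)

print(buffon_pi(200_000))
```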
In 1901 an Italian math teacher claimed to have done this 3400 times and estimated \(\pi\) to 6 decimal places
You can actually use statistics to show he very likely cheated!
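One way to see why the claim is suspicious is to work out how accurate an honest experiment of that size would typically be. The sketch below uses the 3400 drops quoted above and assumes, purely for illustration, that the needle is as long as the strips are wide (\(l = t\)); the real experiment may have used different dimensions. It uses the standard formula for the spread of an observed proportion.

```python
import math

n = 3400            # number of drops quoted in the text
l, t = 1.0, 1.0     # assumed for illustration: needle length equals strip width

p_cross = 2 * l / (t * math.pi)                 # chance a single drop crosses a line
se_p = math.sqrt(p_cross * (1 - p_cross) / n)   # typical error in the observed crossing fraction
# The estimate of pi is 2l / (t * fraction), so a small relative error in the
# fraction gives about the same relative error in the estimate of pi
se_pi = math.pi * se_p / p_cross

print(f"typical error in the estimate of pi: about {se_pi:.3f}")
print(f"error needed for 6 correct decimals: below {0.5e-6}")
```

Under these assumptions the typical error is about 0.04, so an honest run would usually go wrong in the first or second decimal place. Landing within six decimals by luck would be extremely unlikely.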
Random simulations like these (known as Monte Carlo methods) are used to solve lots of difficult problems in machine learning and statistics
The way pseudo-random numbers are generated uses math from the 1600s
Still very hard to get truly random numbers!
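One classic recipe for pseudo-random numbers is the linear congruential generator, which repeats a multiply, add, and take-the-remainder step. The sketch below is only an illustration of the idea: the constants are one well-known published choice, and this is not necessarily the method the slide has in mind or the one your computer actually uses.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of pseudo-random numbers in [0, 1)."""
    state = seed
    while True:
        # Multiply, add, and keep only the remainder after dividing by m
        state = (a * state + c) % m
        yield state / m

gen = lcg(seed=42)
first_run = [next(gen) for _ in range(5)]

gen_again = lcg(seed=42)
second_run = [next(gen_again) for _ in range(5)]

print(first_run)
print(first_run == second_run)  # same seed, same "random" numbers
```

The numbers look random, but starting from the same seed always reproduces exactly the same sequence, which is why they are called pseudo-random.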
Suppose we wanted to estimate the proportion of people here, \(p\), who have evaded the SkyTrain/bus fare
You might not be comfortable answering that question!
Is there a way to estimate the proportion of people who hop the fare gate, while keeping your data private?
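How to estimate \(p\): flip a coin in secret. If it lands heads, answer truthfully. If it lands tails, answer Yes no matter what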
You might answer Yes even if that’s not your true answer
If you answer yes, no one will know for certain if it’s the truth
We can still get an estimate of the true answer
\[ P(\text{Answer Yes}) = P(\text{Flipped Heads and Yes}) + P(\text{Flipped Tails}) \]
\[ P(\text{Flipped Heads and Yes}) = P(\text{Flipped Heads})\times P(\text{Yes}) \]
\[ P(\text{Answer Yes}) = \frac{1}{2}\times p + \frac{1}{2} \]
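Rearranging the last equation gives \(p = 2 \times P(\text{Answer Yes}) - 1\), so the fraction of Yes answers is enough to estimate \(p\). The sketch below simulates this, reading the equations above as: heads means answer truthfully, tails means answer Yes. The true proportion of 0.3 and the group size are made-up values for illustration.

```python
import random

def randomized_response_estimate(true_p, num_people, seed=0):
    """Simulate the coin-flip survey and return the estimate of p."""
    rng = random.Random(seed)
    yes_answers = 0
    for _ in range(num_people):
        truly_yes = rng.random() < true_p   # this person's private true answer
        heads = rng.random() < 0.5          # their secret coin flip
        # Heads: answer truthfully. Tails: answer Yes no matter what.
        if (heads and truly_yes) or not heads:
            yes_answers += 1
    fraction_yes = yes_answers / num_people
    # P(Answer Yes) = p/2 + 1/2, so p = 2 * P(Answer Yes) - 1
    return 2 * fraction_yes - 1

print(randomized_response_estimate(true_p=0.3, num_people=10_000))
```

The estimate is built only from the Yes/No answers, never from anyone's coin or true answer. In a small group it can be noticeably off (even slightly negative), which is the price paid for privacy.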
This might seem like a very simple idea (it is known as randomized response), but it is essentially the idea behind what is known as differential privacy
Lots of famous examples where people were identified by their anonymous data
Big tech companies such as Apple, Facebook, and Google now use differential privacy when collecting and sharing data
Statistics comes up with ways to solve real world problems involving data
The skills in a statistics/data science/actuarial degree will always be useful
Questions?
Owen Ward