Lecture
- Course overview
- Warming up
- Probabilities
- Probability distributions
- Cooling down
Tutorial
- R Markdown mini-tutorial
- Assignment 1
October 2, 2017
Lecture
Tutorial
Date | Time | Subject | Lecturer |
---|---|---|---|
2/10 | 9-13 | Probability | Abe & Alexander |
4/10 | 9-13 | Inference | Abe & Alexander |
6/10 | 9-13 | Validity | Abe & Alexander |
9/10 | 9-13 | Regression | Abe & Alexander |
10-13/10 | | Self-study | |
11/10 | 9-11 | Question time | Abe & Alexander |
16/10 | 9-13 | Exam | |
Refreshment
Foundation
Intuition
Assignments
Exam
Grade must be ≥ 4
The Foundations of Statistics: A Simulation-based Approach (Shravan Vasishth & Michael Broe)
Sophie Berkhout
http://www.abehofman.com/onderwijs/ http://www.alexandersavi.nl/teaching/
Or anonymous:
Descriptive statistics: A description of some collected data (sample); e.g. the average age of ..
Inferential statistics: What are the properties of a population? The observed data is assumed to be sampled from a larger population. Thus we need population estimates and hypothesis testing (we have uncertainty).
Inferential statistics needs probabilities. As Vasishth puts it:
"If we know what a ‘random’ distribution looks like, we can tell random variation from non-random variation. Specific individual cases are unpredictable, but they follow predictable laws in the aggregate. Once we learn to identify this ‘pattern of chance,’ we can confidently distinguish it from patterns that are not due to random phenomena."
It might rain this afternoon.
I'm pretty confident I'll pass this course.
The dikes are expected to prevent Amsterdam from flooding.
There is a slight chance of recovery.
Smoking can cause infertility.
There's uncertainty! Probabilities help us quantify it: 'patterns of chance'.
The game of Pig.
Every round:
You keep the score from the previous round.
Who collects the most coins in 3 rounds?
Example: The probability of throwing a 1 with a die: \(P(X=1)\)
Example: Select a third year PM student from all third year psychology students:
\(P(selected \ student = PM):\)
\[ \begin{aligned} P(X=PM) &= N_{PM} / N_{TOT} \\ &= 20 / 400 \\ &= 0.05 \end{aligned} \]
\(P(selected \ student = Male):\)
\[ \begin{aligned} P(X=Male) &= N_{Male} / N_{TOT} \\ &= 80 / 400 \\ &= 0.20 \end{aligned} \]
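Both probabilities are just relative frequencies, which takes one line of R each (a sketch using the counts from the example):

```r
n_tot <- 400          # all third year psychology students
p_pm <- 20 / n_tot    # P(X = PM)   = 0.05
p_male <- 80 / n_tot  # P(X = Male) = 0.20
c(p_pm, p_male)
```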
\(P(A \ or \ B) = P(A) + P(B)\)
Example with dice:
\[ \begin{aligned} P(X=1 \ or \ X=2) &= P(X=1) + P(X=2) \\ &= 1/6 + 1/6\\ &= 1/3 \end{aligned} \]
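The sum rule for a fair die can be checked with a quick simulation (a sketch; the seed only fixes the random draw):

```r
set.seed(42)
rolls <- sample(1:6, 100000, replace = TRUE)  # simulate 100,000 die throws
# relative frequency of throwing a 1 or a 2; should be close to 1/3
mean(rolls == 1 | rolls == 2)
```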
Experiment: Select one student at random from all third year students: What is \(P(X = PM \ or \ X = SP)\)?
\(P(PM)\) Green; \(P(SP)\) Red
\[ \begin{aligned} P(PM \; or \; SP) &= (N_{PM} + N_{SP}) / N_{TOT} \\ &= (20 + 40) / 400 \\ &= 0.15 \end{aligned} \] \[ P(PM \; or \; SP) = P(PM) + P(SP) \]
This only holds if the events are mutually exclusive (the two events cannot happen together).
If not, use a more general Sum Rule:
\[P(A \; or \; B) = P(A) + P(B) - P(A \; and \; B)\]
\[P(PM \; or \; SP) = P(PM) + P(SP) - P(PM \; and \; SP)\]
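A sketch of the general sum rule in R, using the lecture-hall counts (20 PM students, 80 male students, 15 students who are both, out of 400 — the overlap count comes from the conditional-probability example later in the lecture):

```r
n_tot <- 400
n_pm <- 20       # PM students
n_male <- 80     # male students
n_pm_male <- 15  # students who are both PM and male
# P(PM or Male) = P(PM) + P(Male) - P(PM and Male)
p <- n_pm / n_tot + n_male / n_tot - n_pm_male / n_tot
p  # 0.2125
```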
So, in some cases we need to know \(P(A \ and \ B)\), which brings us to the Product Rule:
So, to calculate \(P(X = PM \ and \ X = Male)\) we need to understand (in)dependence and conditional probabilities
Example: Back to the lecture hall. \(P(X = PM)\) Green and \(P(X = Male)\) Blue:
What is \(P(X = PM \ and \ X = Male)\)? Are the events independent?
If they are independent, then \(P(X = PM \ and \ X = Male) = P(X = PM) \cdot P(X = Male)\).
"In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred." (Wikipedia)
\(P(X = Male) = 80/400\)
\(P(X = Male \ | \ X = PM) = 15/20\)
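These two numbers can be compared directly in R. Since \(P(Male \mid PM) \neq P(Male)\), program and gender are not independent here, and the general product rule applies (a sketch using the counts above):

```r
p_male <- 80 / 400          # unconditional: P(Male) = 0.20
p_male_given_pm <- 15 / 20  # conditional:   P(Male | PM) = 0.75
# general product rule: P(PM and Male) = P(Male | PM) * P(PM)
p_pm_and_male <- p_male_given_pm * (20 / 400)
p_pm_and_male  # 0.0375, i.e. 15/400
```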
Research shows that the probability of selection depends on gender!
(Disclaimer: http://blog.casperalbers.nl/science/nwo-discriminatie-en-de-simpsonparadox)
http://students.brown.edu/seeing-theory/basic-probability/index.html#first
http://students.brown.edu/seeing-theory/compound-probability/index.html#third
"Can be thought of as providing the probability of occurrence of different possible outcomes in an experiment."
What would the probability distribution of a coin flip look like?
toss <- sample(c("heads", "tails"), 10000, replace = TRUE)
prob_dist <- table(toss) / length(toss)
barplot(prob_dist)
A single coin flip has a binomial distribution with one trial ("bi-nomial": two terms, here heads or tails).
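In R, a single flip is the binomial distribution with `size = 1` (a minimal sketch):

```r
# probability of each of the two outcomes (0 = tails, 1 = heads)
dbinom(0:1, size = 1, prob = .5)  # both are 0.5
# ten coin flips as zeros and ones
flips <- rbinom(10, size = 1, prob = .5)
```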
What might this coin flip stand for?
In psychology, we are often not interested in the individual ones and zeros (correct/incorrect answers) in a series, but in a sum score. Here's a distribution of the sum scores of a Xhosa exam with 40 items.
How does such a distribution arise? The quincunx (or bean machine) simulates this.
Left or right: coin flip. Exit: sum score.
A demonstration in R (use R console, not RStudio):
install.packages("animation")
library(animation)
balls <- 500
layers <- 15 + 2
ani.options(interval = 0.05, nmax = balls + layers - 2)
quincunx(balls = balls, layers = layers)
Quincunx shows the frequency distribution.
What differences do you expect if:
balls <- 20
balls <- 50
balls <- 1000
Will the probability distribution change?
Let's give it a closer look.
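A sketch of that closer look: the simulated frequency distribution gets smoother with more balls, while the underlying probability distribution (`dbinom`) never changes. (Here 15 peg layers are assumed, so sum scores run from 0 to 15.)

```r
set.seed(1)
layers <- 15
par(mfrow = c(2, 2))
# frequency distributions for different numbers of balls
for (n in c(20, 50, 1000)) {
  barplot(table(rbinom(n, layers, .5)), main = paste(n, "balls"))
}
# the probability distribution itself does not depend on the number of balls
barplot(dbinom(0:layers, layers, .5), names.arg = 0:layers,
        main = "probability distribution")
```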
You take a Xhosa exam with 15 questions. Answer the following questions, and if possible use the quincunx.
You can do this in R
choose(15, 8)      # (1) number of ways to score an 8
dbinom(8, 15, .5)  # (2) probability of sum score 8
pbinom(8, 15, .5)  # (3) probability of sum score 8 or less
qbinom(.1, 15, .5) # (4) lowest sum score with probability 10% or less
Or simulate the quincunx by sampling 500 sum scores from an infinite number of Xhosa exams
rbinom(n = 500, size = 15, prob = .5)
And you can do the calculations yourself. You'll learn it in Vasishth and do it in assignment 1.
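A sketch of the calculation behind `dbinom`, using the binomial theorem you'll meet in Vasishth: the number of orderings with exactly 8 correct answers, times the probability of any one such ordering.

```r
n <- 15; k <- 8; p <- .5
p_manual <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(p_manual, dbinom(k, n, p))  # TRUE
```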
"Can be thought of as providing the probability of occurrence of different possible outcomes in an experiment."
Discrete probability distributions
Continuous probability distributions
In The Joy of Stats, the late Hans Rosling talks about probability distributions and shows us how exciting even descriptive statistics can be.
Explore many probability distributions on Wikipedia.
Play with interactive visualizations of both discrete and continuous probability distributions on Seeing Theory.
Watch a clear tutorial on probability density functions by Joel Schneider.
Assuming you know nothing more about Alice, which of 1-5 is most likely? Or does 6 hold?
Think it through (1 minute) and write down your answer.
Today we learned …
Next we'll learn …
descriptive statistics, inferential statistics, probability, sum rule, mutually exclusive events, independence, product rule, conditional probability, Venn diagram, discrete probability distribution, continuous probability distribution, binomial distribution, quincunx, binomial theorem, normal distribution, gamma distribution, Cauchy distribution, central limit theorem, sample mean, standard deviation, sampling distribution, standard error, law of large numbers, variance, confidence interval, 1.96 standard error, null hypothesis, p-value, p-value distribution, test statistic, t-value, z-value, Student's t-test, one-sample t-test, two-sample t-test, degrees of freedom, one- and two-tailed tests, statistical significance, type 1 and type 2 errors, significance level, family-wise error rate, multiple comparisons problem, false discovery rate, statistical power, observed / predicted power, Lindley's paradox, effect size, prediction / association, least squares, linear regression, linear equation, regression coefficients, polynomial regression, logistic regression, explained variation, errors and residuals, model selection, Occam's razor, saturated model, mean squared prediction error, bias-variance trade-off, overfitting / underfitting, adjusted R-squared, cross-validation, information criterion, statistical inference, frequentist inference, Bayesian inference, parametric statistics, nonparametric statistics, multicollinearity, heteroscedasticity
What
Why
See syllabus and assignment.
Make sure to explain …
Deadline tomorrow 17:00.
Learn from your peers
But beware of plagiarism and the exam.
R language + Markdown language
Why?
You will use it for
install.packages("rmarkdown")
library(rmarkdown)
- `# Assignment 1` becomes a header at level 1
- `## Question 3` becomes a header at level 2
- `### 3a` becomes a header at level 3
- `_this_` becomes *this*
- `__this__` becomes **this**

Let's try!
But you can do without!
And much much more…