October 2, 2017

Today

Lecture

  • Course overview
  • Warming up
  • Probabilities
  • Probability distributions
  • Cooling down

Tutorial

  • R Markdown mini-tutorial
  • Assignment 1

Schedule

Date       Time   Subject         Lecturer
2/10       9-13   Probability     Abe & Alexander
4/10       9-13   Inference       Abe & Alexander
6/10       9-13   Validity        Abe & Alexander
9/10       9-13   Regression      Abe & Alexander
10-13/10          Self-study
11/10      9-11   Question time   Abe & Alexander
16/10      9-13   Exam

Goals

Refreshment

  • first- and second-year statistics
  • but different!

Foundation

  • play with fundamental concepts
  • create a basis for independent learning
  • recognize statistical terminology

Intuition

  • think like a statistician
  • avoid common pitfalls and fallacies

Assessment

Assignments

Exam

  • 1 hour
  • 2 parts: pen & paper and R
  • counts for 2/3 of the grade for Basic Skills in Statistics

Grade must be ≥ 4

Literature

Teaching assistant

Sophie Berkhout

Syllabus

Dunglish

  • Course is taught in English
  • Not mother tongue for most of us
  • Don't hesitate to speak out though!
  • And help one another find the right words.
  • Complex material, interaction is key

Feedback

Warming Up

What is statistics?

Descriptive statistics: A description of some collected data (sample); e.g. the average age of ..

Inferential statistics: What are the properties of a population? The observed data is assumed to be sampled from a larger population. Thus we need population estimates and hypothesis testing (we have uncertainty).

Inferential statistics needs probabilities (Vasishth):

"If we know what a ‘random’ distribution looks like, we can tell random variation from non-random variation. Specific individual cases are unpredictable, but they follow predictable laws in the aggregate. Once we learn to identify this ‘pattern of chance,’ we can confidently distinguish it from patterns that are not due to random phenomena."

What is probability?

It might rain this afternoon.

I'm pretty confident I'll pass this course.

The dikes are expected to prevent Amsterdam from flooding.

There is a slight chance of recovery.

Smoking can cause infertility.

There's uncertainty! Probabilities help us quantify it: 'patterns of chance'.

What is probability?

  • Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?
    • Linda is a bank teller.
    • Linda is a bank teller and is active in the feminist movement.

  • The probability of heads is 50%, and so is the probability of tails. If I toss a coin 50 times, how often will it land heads?
  • If you flip a coin twice, what is the probability of getting two heads?
  • And the probability of getting one head and one tail?
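The two-flip questions can be checked by listing every equally likely outcome and counting. The sketch below assumes a fair coin; the object names (`outcomes`, `n_heads`, and so on) are illustrative and not part of the course material.

```r
# Enumerate all outcomes of two fair coin flips (each row is equally likely)
outcomes <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                        stringsAsFactors = FALSE)
n_heads <- rowSums(outcomes == "H")  # number of heads in each outcome

p_two_heads <- mean(n_heads == 2)    # only HH
p_one_each  <- mean(n_heads == 1)    # HT or TH

# Expected number of heads in 50 tosses of a fair coin
expected_heads <- 50 * 0.5
```

The expected count is a long-run average; a single run of 50 tosses will usually deviate from it somewhat.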

What is probability?

The game of Pig.

Every round:

  • All players start standing
  • I roll a die
  • The number rolled is added to your score (write it down)
  • If I roll a 1, the round is over and your score for this round is lost
  • If you sit down before I roll, you stop but keep your score

You keep the score from the previous round.

Who collects the most coins in 3 rounds?
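One way to reason about when to sit down is to compare expected round scores. The sketch below is a simplification, not the course's analysis: it assumes a fair six-sided die and a player who decides in advance to sit down after surviving `k` rolls. The function name `expected_score` is hypothetical.

```r
# Expected round score when sitting down after surviving k rolls:
# all k rolls must avoid a 1 (probability (5/6)^k), and each surviving
# roll is worth 4 points on average (the mean of 2, 3, 4, 5, 6).
expected_score <- function(k) (5 / 6)^k * k * mean(2:6)

k <- 1:12
scores <- expected_score(k)
print(round(scores, 2))

# Number of rolls with the highest expected score
# (5 and 6 rolls tie exactly in theory)
best_k <- k[which.max(scores)]
```

Under these assumptions the expected score rises until about five or six rolls and falls afterwards, because the growing risk of losing everything outweighs the extra points.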

Probabilities

What is probability? (more formally)

  • The probability of A: \(P(A)\), …
  • is the proportion of elements from some set that satisfy …
  • the condition A

Example: The probability of throwing a 1 with a die: \(P(X=1)\)
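The "proportion of elements" idea can be illustrated by simulation. This is a minimal sketch assuming a fair six-sided die; the seed and the number of rolls are arbitrary choices made only for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulate many rolls of a fair die
rolls <- sample(1:6, 60000, replace = TRUE)

p_one_theory    <- 1 / 6            # one favourable outcome out of six
p_one_simulated <- mean(rolls == 1) # proportion of simulated rolls equal to 1
```

With this many rolls, the simulated proportion should lie close to the theoretical value.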

What is probability (more formally)?

Example: Select a third year PM student from all third year psychology students:

What is probability (more formally)?

\(P(selected \ student = PM):\)

\[ \begin{aligned} P(X=PM) &= N_{PM} / N_{TOT} \\ &= 20 / 400 \\ &= 0.05 \end{aligned} \]

What is probability (more formally)?

\(P(selected \ student = Male):\)

\[ \begin{aligned} P(X=Male) &= N_{Male} / N_{TOT} \\ &= 80 / 400 \\ &= 0.20 \end{aligned} \]

Calculating with probabilities:

  • Before we can infer something, based on our collected data
  • we need to know something about probability theory:
  • How do we calculate with probabilities?
  • Two basic rules: Sum Rule and Product Rule
  • Two basic concepts: Dependent vs Independent Probabilities and Conditional Probabilities

Sum Rule (OR)

  • \(P(A \ or \ B) = P(A) + P(B)\)

  • The probability that at least one of several events occurs is the sum of the probabilities of the individual events…
  • Only if these events are mutually exclusive (cannot happen at the same time)
  • Example with a die:

\[ \begin{aligned} P(X=1 \ or \ X=2) &= P(X=1) + P(X=2) \\ &= 1/6 + 1/6\\ &= 1/3 \end{aligned} \]

Sum Rule

Experiment: Select one student at random from all third year students: What is \(P(X = PM \ or \ X = SP)\)?

\(P(PM)\) Green; \(P(SP)\) Red

Sum Rule

\[ \begin{aligned} P(PM \; or \; SP) &= (N_{PM} + N_{SP}) / N_{TOT} \\ &= (20 + 40) / 400 \\ &= 0.15 \end{aligned} \]

\[ P(PM \; or \; SP) = P(PM) + P(SP) \]

Sum Rule

This only holds if the events are mutually exclusive (the two events cannot happen together).

If not, use a more general Sum Rule:

\[P(A \; or \; B) = P(A) + P(B) - P(A \; and \; B)\]

\[P(PM \; or \; SP) = P(PM) + P(SP) - P(PM \; and \; SP)\]
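Both versions of the Sum Rule can be checked on a single die. In the sketch below, the overlapping events A ("even number") and B ("greater than 3") are an illustrative example that does not come from the slides.

```r
die <- 1:6  # sample space of one fair die; every outcome has probability 1/6

# Mutually exclusive events: X = 1 or X = 2
p_1_or_2 <- mean(die == 1) + mean(die == 2)

# Overlapping events (illustrative): 4 and 6 satisfy both A and B
A <- die %% 2 == 0  # even number
B <- die > 3        # greater than 3

p_general <- mean(A) + mean(B) - mean(A & B)  # general Sum Rule
p_direct  <- mean(A | B)                      # direct count, for comparison
```

Without subtracting \(P(A \; and \; B)\), the outcomes 4 and 6 would be counted twice.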

Product Rule (AND)

So, in some cases we need to know \(P(A \ and \ B)\), which brings us to the Product Rule:

  • \(P(A \ and \ B) = P(A) * P(B)\) (if A and B are independent)
  • \(P(A \ and \ B) = P(A) * P(B \ | \ A)\) (if A and B are dependent)
  • \(P(A \ and \ B) = P(B) * P(A \ | \ B)\) (if A and B are dependent)

So, to calculate \(P(X = PM \ and \ X = Male)\) we need to understand (in)dependence and conditional probabilities

(In)dependent events:

  • Independent events: "one event cannot influence the other’s outcome" (Vasishth)
  • "Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring. Some other examples of independent events are: Landing on heads after tossing a coin AND rolling a 5 on a single 6-sided die" (Definition of mathgoodies.com)
  • Any other examples?

(In)dependent events:

Example: Back to the lecture hall. \(P(X = PM)\) Green and \(P(X = Male)\) Blue:

(In)dependent events & Cond. Probabilities:

What is \(P(X = PM \ and \ X = Male)\)? (independent?)

  • Is the probability of selecting a male the same as the probability of selecting a male given that he is a PM student?
  • Is \(P(X = Male) = P(X = Male \ | \ X = PM)\)?
  • Or is \(P(X = PM) = P(X = PM \ | \ X = Male)\)?
  • If so, the events \(X = PM\) and \(X = Male\) are independent.

  • "In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred." (Wikipedia)

Conditional Probabilities

\(P(X = Male) = 80/400\)

\(P(X = Male \ | \ X = PM) = 15/20\)
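The numbers above are enough to apply the Product Rule and to check independence. This is a sketch using the counts from the slides (400 students, 20 PM, 80 male, 15 male PM students); the variable names are illustrative.

```r
n_total   <- 400  # all third-year students
n_pm      <- 20   # PM students
n_male    <- 80   # male students
n_male_pm <- 15   # male PM students

p_pm            <- n_pm / n_total    # P(X = PM)
p_male          <- n_male / n_total  # P(X = Male)
p_male_given_pm <- n_male_pm / n_pm  # P(X = Male | X = PM)

# Product Rule for dependent events: P(PM and Male) = P(PM) * P(Male | PM)
p_pm_and_male <- p_pm * p_male_given_pm

# What the product would be if the two events were independent
p_if_independent <- p_pm * p_male
```

Because \(P(X = Male \ | \ X = PM)\) differs from \(P(X = Male)\), the two events are dependent, and the independent-events product does not match the direct count of 15 out of 400.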

Recap

  • Before we can do statistical inference we need to be able to calculate with probabilities (…we just have a sample of the population…)
  • Sum-Rule : What is \(P(A \ or \ B)\)?
  • Product-Rule : What is \(P(A \ and \ B)\)?
  • Independence?
  • Conditional probability: \(P(A \ | \ B)\)
  • Work on this on the assignment!
  • Tip: Venn diagrams often help!

Example (in)dependent events:

Example (in)dependent events (Venn diagram):

Dig deeper

Probability Distributions

Probability distributions

"Can be thought of as providing the probability of occurrence of different possible outcomes in an experiment."

What would the probability distribution of a coin flip look like?

toss <- sample(c("heads", "tails"), 10000, replace = TRUE)
prob_dist <- table(toss) / length(toss)
barplot(prob_dist)

Binomial distribution

A coin flip has a binomial distribution (two terms).

What might this coin flip stand for?

  • A Xhosa test item scored correct/incorrect.
  • A depression questionnaire item with true/false statement.
  • … (two terms)
  • Now what about a full test or questionnaire?

Binomial distribution

In psychology, we are often not interested in the precise ones and zeros in a series (mistakenly), but in a sum score. Here's a distribution of the sum scores of a Xhosa exam with 40 items.

How does such a distribution arise? The quincunx (or bean machine) simulates this.

Binomial distribution

Left or right: coin flip. Exit: sum score.

Binomial distribution

A demonstration in R (use R console, not RStudio):

install.packages("animation")
library(animation)
balls <- 500
layers <- 15 + 2
ani.options(interval = 0.05, nmax = balls + layers - 2)
quincunx(balls = balls, layers = layers)

Binomial distribution

Quincunx shows the frequency distribution.

What differences do you expect if:

balls <- 20
balls <- 50
balls <- 1000

Will the probability distribution change?
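The question can also be explored without the animation. The sketch below assumes that each ball makes 15 independent left/right decisions with probability 0.5 (the variable `rows` is an assumption about the board, not a parameter of `quincunx()`), and compares the observed proportions with the binomial probabilities.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
rows <- 15   # assumed number of left/right decisions per ball

# The probability distribution is fixed, whatever the number of balls
theory <- dbinom(0:rows, size = rows, prob = 0.5)

# Largest gap between observed proportions and theory, per number of balls
max_gap <- sapply(c(20, 50, 1000), function(balls) {
  exits <- rbinom(balls, size = rows, prob = 0.5)
  freq  <- as.numeric(table(factor(exits, levels = 0:rows))) / balls
  max(abs(freq - theory))
})
print(max_gap)
```

Only the frequency distribution depends on the number of balls; with more balls it tends to settle closer to the unchanged probability distribution.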

Binomial distribution

Let's give it a closer look.

You take a Xhosa exam with 15 questions. Answer the following questions, and if possible use the quincunx.

  • The sum score of the test is 8. In how many possible ways can you reach that sum score? (1)
  • What is the probability of sum score 8? (2)
  • What is the probability of precisely this series (with sum score 8): 000010110011111?
  • What is the probability of sum score 8 or less? (3)
  • Which lowest sum scores have a probability of 10% or less? (4)
  • What are the two factors that determine these probabilities? (binomial theorem!)

Binomial distribution

You can do this in R

choose(15, 8)  # (1) number of ways to score an 8
dbinom(8, 15, .5)  # (2) probability of sum score 8
pbinom(8, 15, .5)  # (3) probability of sum score 8 or less
qbinom(.1, 15, .5)  # (4) 10% quantile: smallest sum score whose cumulative probability reaches 10%

Or simulate the quincunx by sampling 500 sum scores from an infinite number of Xhosa exams

rbinom(n = 500, size = 15, prob = .5)

And you can do the calculations yourself. You'll learn it in Vasishth and do it in assignment 1.
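As a sketch of what doing the calculations yourself might look like, the binomial formula can be written out and compared with the built-in functions. It assumes, as in the calls above, 15 items and a probability of 0.5 per item; the variable names are illustrative.

```r
n <- 15   # number of questions
k <- 8    # sum score of interest
p <- 0.5  # probability of a correct answer per question

# Probability of one specific series with 8 correct answers,
# such as 000010110011111
p_series <- p^k * (1 - p)^(n - k)

# Probability of sum score 8: number of such series times p_series
p_sum_8 <- choose(n, k) * p_series

# Probability of sum score 8 or less: add up the scores 0, 1, ..., 8
scores     <- 0:k
p_sum_le_8 <- sum(choose(n, scores) * p^scores * (1 - p)^(n - scores))
```

The results should agree with `dbinom(8, 15, .5)` and `pbinom(8, 15, .5)`. The two ingredients are the number of ways to reach a score and the probability of each single series.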

Probability distributions

Dig deeper

In The Joy of Stats, the late Hans Rosling talks about probability distributions and shows us how exciting even descriptive statistics can be.

Explore many probability distributions on Wikipedia.

Play with interactive visualizations of both discrete and continuous probability distributions on Seeing Theory.

Watch a clear tutorial on probability density functions by Joel Schneider.

Cooling Down

Did we learn something?

Assuming you know nothing more about Alice, which of 1-5 is most likely? Or does 6 hold?

  1. Alice is a rock star or she works in a bank.
  2. Alice is quiet and works in a bank.
  3. Alice is a rock star.
  4. Alice is honest and works in a bank.
  5. Alice works in a bank.
  6. There is no definite answer.

Think it through (1 minute) and write down your answer.

Did we learn something?

Assuming you know nothing more about Alice, which of 1-5 is most likely? Or does 6 hold?

  1. Alice is a rock star and works in a bank.
  2. Alice is quiet and works in a bank.
  3. Alice is quiet and reserved and works in a bank.
  4. Alice is honest and works in a bank.
  5. Alice works in a bank.
  6. There is no definite answer.

Think it through (1 minute) and write down your answer.

Where are we?

Today we learned …

  • how to calculate probabilities using the sum- and product-rules,
  • and what probability distributions are.

Next we'll learn …

  • how probability distributions enable us to test hypotheses,
  • the significant role of the central limit theorem,
  • and what confidence intervals are.

Where are we?

descriptive statistics inferential statistics probability sum rule mutually exclusive events independence product rule conditional probability venn diagram discrete probability distribution continuous probability distribution binomial distribution quincunx binomial theorem normal distribution gamma distribution cauchy distribution central limit theorem sample mean standard deviation sampling distribution standard error law of large numbers variance confidence interval 1.96 standard error null hypothesis p-value p-value distribution test statistic t-value z-value student's t-test one-sample t-test two-sample t-test degrees of freedom one- and two-tailed tests statistical significance type 1 and type 2 errors significance level family-wise error rate multiple comparisons problem false discovery rate statistical power observed / predicted power lindley’s paradox effect size prediction / association least squares linear regression linear equation regression coefficients polynomial regression logistic regression explained variation errors and residuals model selection occam’s razor saturated model mean squared prediction error bias-variance trade-off overfitting / underfitting adjusted r-squared cross-validation information criterion statistical inference frequentist inference bayesian inference parametric statistics nonparametric statistics multicollinearity heteroscedasticity

Assignment 1

What

  • assignment can be challenging
  • intention/wording of questions can be challenging
  • feedback: only correct/incorrect and the grade
  • answer sheet: examine the (in)correct answers!
  • statquiz: subscribe!

Why

  • learn by doing
  • practice makes perfect: it exploits the testing effect and gives superior retention
  • best preparation for exam
  • creating quiz questions requires you to have full understanding
  • we learn what is still poorly understood
  • quizzes are fun

Assignment 1

See syllabus and assignment.

Make sure to explain …

  • in your own words
  • what you do,
  • how you do it,
  • and why you do it.

Deadline tomorrow 17:00.

Accelerate your learning

Learn from your peers

  • working together is motivating
  • together you can correct your mind bugs
  • explaining helps you learn

But beware of plagiarism and the exam.

R Markdown Tutorial

What?

R language + Markdown language

  • (Interactive) documents
  • Presentations (you're looking at one)
  • Websites (such as Alexander's)
  • Books, journal manuscripts

Why?

  • Automated reporting
  • Reproducible research
  • Look like a pro

You will use it for

  • Assignments

How to create & compile

  • Update R and RStudio
  • Run install.packages("rmarkdown")
  • Run library(rmarkdown)
  • New File > R Markdown… > Document
  • Save
    • file.Rmd
  • Knit to HTML
    • file.html (submit this one)

How to author

  • Header 1: # Assignment 1 becomes header level 1
  • Header 2: ## Question 3 becomes header level 2
  • Header 3: ### 3a becomes header level 3
  • Italic: _this_ becomes this
  • Bold: __this__ becomes this
  • R code inline
  • R code block

Let's try!

Additional resources