October 4, 2017

Assignment 1

Pub quiz prize

Ever needed to use the sample function,
without a computer within reach?
And in the dark?

  • The author of the best question* wins an analogue light-emitting sample function!
  • *Largest spread across answer options

No correspondence will be entered into regarding the result.

Pub quiz

  • Make pairs
  • Log in to kahoot.it or download the app
  • Enter the game pin

Today

Lecture

  • Warming up
  • Sampling distributions
  • Confidence intervals
  • Cooling down

Tutorial

  • Assignment 2

Warming Up

Where are we?

Last time we learned …

  • how to calculate probabilities using the sum and product rules,
  • and what probability distributions are.

Today we'll learn …

  • how probability distributions enable us to test hypotheses,
  • the significant role of the central limit theorem,
  • and what confidence intervals are.

What is (statistical) inference?

"Statistical inference makes propositions about a population, using data drawn from the population with some form of sampling."

  1. How can we calculate how likely our observed data is?
    • sampling distributions
    • central limit theorem
  2. How do we quantify the expected variation in our estimation procedure?
    • confidence intervals

Sampling Distributions

Sample mean

Imagine that the mean perseverance score of psychology students is 20. In a sample of PML students, a mean of 26 is found.

  • How do we know how (un)likely this sample mean is in relation to the population of psychology students?

Sampling distribution

Which sample means can we expect? We take (or simulate) many samples from the population and compare their means to our PML sample mean.

  • This gives us the sampling distribution of the sample means. It shows us what we can expect under the null hypothesis (PML students are as perseverant as other psychology students).
  • How likely is our observation? Is it possible to visualize the probability of our sample mean or a more extreme mean?

Sampling distribution

You can visualize the probability as follows:

  • How can you quantify this probability?

  • Calculate the area under the curve (Udo told you all about it).
  • Why an area (our mean or a more extreme one)? See the sketch below.

Sampling distribution

The area is difficult to calculate. Why?

  • We need the distribution of sample means under the null hypothesis. But which distribution is that?
  • You can't simply take that many samples (it's expensive and time-consuming)
  • Now what? How can we learn how the null hypothesis is distributed?

Central limit theorem

From which distribution does our sample mean originate?

  • Central limit theorem: sample means are approximately normally distributed (if the sample size is large enough). Even if the population is not normally distributed itself!
  • Think it through.
  • Who can dissect and explain this definition?

Central limit theorem

Example 1: the binomial distribution (e.g., a Xhosa item).

m <- numeric(10000) # preallocate the vector of sample means
for (i in 1:10000) {
        x <- rbinom(100, 1, .5) # one sample of 100 binary responses
        m[i] <- mean(x)
}
hist(m, prob = TRUE, main = "sampling distribution")

Central limit theorem

Example 2: the gamma distribution (e.g., response time).

m <- numeric(10000) # preallocate the vector of sample means
for (i in 1:10000) {
        x <- rgamma(100, 2) # one sample of 100 response times
        m[i] <- mean(x)
}
hist(m, prob = TRUE, main = "sampling distribution")

Central limit theorem

Milestone 1: If our sample is large enough, we know that its mean originates from an (approximately) normal distribution.

  • Can we now calculate how (un)likely our sample mean is?

Central limit theorem

No! We don't know the mean and the standard deviation of this distribution.

  • It makes a difference :)

Central limit theorem

How do we determine the mean?

How do we determine the standard deviation?

  • The central limit theorem also holds for the standard deviation!
  • I won't show this, but you will.

Central limit theorem

Milestone 2: If our sample is large enough, we know that its standard deviation originates from an (approximately) normal distribution.

  • We can therefore use this standard deviation (\(s\)) for the sampling distribution (see chapter 3.5).
  • The standard deviation of the sampling distribution is called the standard error (\(SE\)): \(SE=\frac{s}{\sqrt{n}}\)
  • Why do we divide by \(\sqrt{n}\)? Think it through! (A simulation check follows below.)

Statistical inference

Now that we know the exact distribution of our null hypothesis, a normal distribution with a mean of 20 and standard error \(SE\), we can calculate the area under the curve, and thus the likelihood of our mean.

That is, using data drawn from the population with some form of sampling, we can make a proposition about a population.

  • The area under the curve? Yes, it's the p-value! More about that next class! (A numeric sketch follows below.)

Sampling distributions

"A sampling distribution is the probability distribution of a given statistic based on a random sample."

Sampling distribution of the …

  • median
  • standard deviation

Standard error is the standard deviation of such a sampling distribution.

Dig deeper

Watch a great animation that once again explains it very clearly (source: NYTimes / CreatureCast):

Dig deeper

Watch another demonstration or fool around with it in R (use the R console, not RStudio) (source: Vistat):

library(animation)
ani.options(interval = 1)
par(mar = c(3, 3, 1, 0.5), mgp = c(1.5, 0.5, 0), tcl = -0.3)
lambda <- 4
f <- function(n) rpois(n, lambda) # a Poisson population
clt.ani(FUN = f, mean = lambda, sd = sqrt(lambda)) # sd of Poisson(lambda) is sqrt(lambda)

Or buy your favorite distribution.

Warming Up Part 2

Estimate my height

  • Write down your estimate (independent measurements).
  • What is your best collective guess?
  • Let's say: you win €10 if your average is correct and you lose €1 if it's incorrect.
    • Would you take this bet?
    • What if your collective guess might deviate 2 cm from my true height?
    • Or 10 cm?
  • What information can we use to get some estimate of the precision of all your estimates?
  • How could you make the collective guess more reliable?

Confidence intervals

(Sub) Outline

  • Two related topics:
    • How do we calculate a CI?
    • How should we (not) interpret a CI?
  • We can simulate (in R): the height example

Estimate my height

##  [1] 179 190 204 177 187 189 195 186 208 187 192 198 184 178 206 165 197
## [18] 188 198 192 209 176 204 208 188 163 193 182 196 191

What is your best guess of \(\mu\)?

mean(s) = 190.2867181

Estimate my height

There is randomness in our estimate: how can we quantify it?

Confidence interval (CI)

  • \(95\%\ \mathrm{CI}: P(\bar{x} - 1.96 \cdot SE \le \mu \le \bar{x} + 1.96 \cdot SE) = 0.95\)
  • \(SE=\frac{s}{\sqrt{n}}\)
  • If \(\alpha = 0.05\):
    • left border: \(\bar{x} - 1.96 \cdot SE\)
    • right border: \(\bar{x} + 1.96 \cdot SE\)

1.96??

qnorm(.025, 0, 1) # left border in standard normal distribution
## [1] -1.959964
pnorm(-1.96)
## [1] 0.0249979

Confidence interval

  • A CI quantifies the expected variation
  • If we were to repeat the experiment, then 95% of the time the true population mean (\(\mu\)) would fall within the constructed interval
  • Let's assume we draw a sample of size 30 from a population with \(\mu = 188\) and \(\sigma = 10\)
  • Give me a 95% CI surrounding the sample mean, based on this population

Let's simulate

If we were to repeat the experiment, then 95% of the time the true population mean (\(\mu\)) would fall within the constructed interval

mean(s) - 1.96 * (sd(s) / sqrt(length(s))) # left border
## [1] 186.0866
mean(s) + 1.96 * (sd(s) / sqrt(length(s))) # right border
## [1] 194.4868

Let's simulate

bounds <- matrix(NA, 100, 3)

for(i in 1:100) {
  
  mysample <- rnorm(30, 188, 10) # some sample
  se_mysample <- sd(mysample)/sqrt(length(mysample))
  
  left_border <- mean(mysample) - 1.96 * se_mysample
  right_border <- mean(mysample) + 1.96 * se_mysample
  in_interval <- 188 > left_border & 188 < right_border # 1 Yes; 0 No
  
  bounds[i,] <- c(left_border, right_border, in_interval)
  
}

Let's simulate

  • The CI is different in every experiment!
  • \(\bar{x}\) is only an estimate of \(\mu\), and the SE only an estimate of \(\sigma/\sqrt{n}\)
head(bounds, 2)
##          [,1]     [,2] [,3]
## [1,] 183.5443 191.7547    1
## [2,] 181.6433 189.4397    1
head(bounds[bounds[,3]==0,], 2)
##          [,1]     [,2] [,3]
## [1,] 190.0145 195.6054    0
## [2,] 188.1520 195.2818    0

Let's simulate

table(bounds[,3])
## 
##  0  1 
##  9 91

Let's visualize

plot(1, xlim=c(1, 100), ylim=c(180, 200), type='n', axes = FALSE, 
     ylab = 'CI', xlab = 'Experiment')
axis(1)
axis(2)
abline(h=188, col='blue', lty =3)

for (i in 1:100) {
  if(bounds[i,3] == 1) {
    lines(x = c(i,i), y = c(bounds[i,1:2]), col = 'darkgreen')
  } else {
    lines(x = c(i,i), y = c(bounds[i,1:2]), col = 'darkred')  
  }
}

Let's visualize

(Figure: the 100 simulated CIs; intervals that miss \(\mu = 188\) are drawn in dark red.)

Let's simulate

What if we increase the sample size (e.g., to 300)?

Do we get fewer CIs that miss the true mean? Does the width of the CI change?

Tip: \(0.95 = P(\bar{x} - 1.96 \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96 \cdot \frac{s}{\sqrt{n}})\)

Let's simulate

What if we increase the sample size (e.g., to 300)?

s <- rnorm(300, 188, 10)
mean(s) - 1.96 * (sd(s) / sqrt(length(s))) # left border
## [1] 187.536
mean(s) + 1.96 * (sd(s) / sqrt(length(s))) # right border
## [1] 189.7576

Tip: \(0.95 = P(\bar{x} - 1.96 \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + 1.96 \cdot \frac{s}{\sqrt{n}})\)

Let's simulate

Revisit this 1.96??

x <- rnorm(30, 188, 10)
mean(x)
## [1] 187.7537
sd(x)/sqrt(length(x)) # standard error
## [1] 1.609613
188.8 - 1.96 * 1.687 # left border (mean 188.8 and SE 1.687 come from an earlier run)
## [1] 185.4935
qnorm(.025, 188.8, 1.687) # left border, directly as a normal quantile
## [1] 185.4935

Interlude: Kahoot!

  1. Get your phone (or laptop)
  2. Go to Kahoot
  3. Make an alias

Interlude: Kahoot!

Statements from: Hoekstra, R., et al. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review.

Interlude: Kahoot!

  • If we were to repeat the experiment, then 95% of the time the true population mean (\(\mu\)) would fall within the constructed interval
  • \(\mu\) has one value (I have one true height). It is either inside or outside the CI. Probability refers to the frequent repetition of experiments

Are we all frequentists?

Recap

  • Based on the CLT we know that our best guess for \(\mu\) is \(\bar{x}\)
  • We have to choose some confidence level (\(1 - \alpha\))
  • We can construct a CI such that in X% of the repetitions …
  • … the true mean will fall within our constructed interval

t.test

x <- rnorm(30, 188, 10)
t.test(x)
## 
##  One Sample t-test
## 
## data:  x
## t = 125.9, df = 29, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  184.0789 190.1585
## sample estimates:
## mean of x 
##  187.1187
mean(x) + qt(.025, 29) * sd(x)/sqrt(length(x)) # left border: qt(.025, 29) is negative
## [1] 184.0789

Cooling Down

Where are we?

Today we learned …

  • that the central limit theorem is important because it enables us to calculate the probability of observing some \(\bar{x}\),
  • how to construct a confidence interval,
  • and that such an interval quantifies the uncertainty in our experimental procedure.

Next we'll learn …

  • how we can calculate p-values for clear hypotheses,
  • and which errors we can make in statistical inference.

Where are we?

descriptive statistics, inferential statistics, probability, sum rule, mutually exclusive events, independence, product rule, conditional probability, Venn diagram, discrete probability distribution, continuous probability distribution, binomial distribution, quincunx, binomial theorem, normal distribution, gamma distribution, central limit theorem, sample mean, sampling distribution, standard deviation, standard error, one-sample t-test, confidence interval, 1.96, null hypothesis, p-value, p-value distribution, test statistic, t-value, z-value, Student's t-test, two-sample t-test, degrees of freedom, one- and two-tailed tests, statistical significance, type 1 and type 2 errors, significance level, family-wise error rate, multiple comparisons problem, false discovery rate, statistical power, observed / predicted power, Lindley's paradox, effect size, prediction / association, least squares, linear regression, linear equation, regression coefficients, polynomial regression, logistic regression, explained variation, errors and residuals, model selection, Occam's razor, saturated model, mean squared prediction error, bias-variance trade-off, overfitting / underfitting, adjusted R-squared, cross-validation, information criterion, statistical inference, frequentist inference, Bayesian inference, parametric statistics, nonparametric statistics, multicollinearity, heteroscedasticity

Assignment 2

See syllabus and assignment.

Make sure to explain …

  • in your own words
  • what you do,
  • how you do it,
  • and why you do it.

Deadline: tomorrow, 17:00.

Accelerate your learning

Learn from your mistakes

  • Many students keep making the same mistakes (even on the exam).
  • Exploit the feedback effect: use the answer sheet to review your past assignment.