Google search

Popularity

Obama campaign

Impact

  • 2008 campaign: button labels tested: Sign Up, Sign Up Now, Join Us Now, Learn More
  • Learn More: conversion (sign-up) rate +18.6%
  • 2012 campaign: 500 tests, donation conversions +49%, sign-up conversions +161%
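
A percentage such as +18.6% is a relative lift: the variant's conversion rate compared with the control's. The sketch below computes that lift and a two-proportion z statistic. The visitor and conversion counts are hypothetical, chosen only so that the lift lands near the figure above; they are not campaign data.

```python
import math


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant's conversion rate over the control's."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference between two conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical counts (assumption): 5.00% control rate, 5.93% variant rate.
control_conversions, control_visitors = 5000, 100_000
variant_conversions, variant_visitors = 5930, 100_000

lift = relative_lift(control_conversions / control_visitors,
                     variant_conversions / variant_visitors)
z = two_proportion_z(control_conversions, control_visitors,
                     variant_conversions, variant_visitors)
print(f"lift = {lift:+.1%}, z = {z:.2f}")
```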

What about learning?

What's the role of A/B tests, i.e., online randomized field experiments, in the study of online learning?

Nienke Ruijs / Han van der Maas / Gunter Maris / Alexander Savi

What's next

A/B tests in online learning


  • an introduction
  • an illustration
  • lessons, challenges, opportunities

An introduction

Cases

Coursera

  • engagement: homework reminder vs previous activities reminder

Khan Academy

  • sneak preview of advanced material
  • mindset intervention for math problems

Duolingo

  • introduce the pronoun "it" early in the curriculum for Spanish learners

Mostly anecdotal. Exceptions: MOOCs and ASSISTments.

Cases

Why anecdotal?


  • many users required
  • demanding for fast-growing companies: priority, payoff, intervention
  • commercial companies

A missed opportunity?

Evidence-based

Ecologically valid

Double blind

Iterative

Non-invasive

Verification

An illustration

Evaluating interventions

Evaluating interventions in Math Garden to increase the return on investment of online learning.

Math Garden and the curious case of the capitalists

"Capitalists"…

…hiding in the data

Surface learning, deep learning, strategic learning.

Promoting deep learning

  • Many ways
  • Toil time: delaying the question mark
  • Question mark greyed out and inactive during toil time
  • Minimum toil time of 0, 3, 6, or 9 seconds?
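
The toil-time rule can be sketched as a small gate on the question mark button. This is an illustration only, not Math Garden's implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass

# The four minimum toil times (in seconds) named on the slide.
TOIL_TIMES_SECONDS = (0, 3, 6, 9)


@dataclass
class QuestionMarkButton:
    """Hypothetical model of a question mark button with a minimum toil time."""

    toil_time: float  # seconds after item onset during which the button is inactive

    def is_active(self, seconds_since_onset: float) -> bool:
        # Greyed out and inactive until the toil time has elapsed.
        return seconds_since_onset >= self.toil_time


button = QuestionMarkButton(toil_time=3)
print(button.is_active(1.5))  # still inside the toil time
print(button.is_active(3.0))  # toil time has elapsed
```

With a toil time of 0 the button is active from item onset, which corresponds to the status quo.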

Expectations

Status quo

  • fast guess = expensive
  • fast question mark = free

Toil time

  • fast guess = expensive
  • fast question mark = impossible

Expectations

  • generally fewer question mark responses
  • generally slower responses

Random assignment; 3 domains (addition, division, set); ~2 × 50,433 users; ~2 × 5,082,348 responses; 14 weeks.
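
The slides state only that assignment was random. One common way to implement it, shown here as an assumption rather than the study's procedure, is to hash each user id into one of the four conditions, so a user keeps the same condition across sessions. The salt and the id format are hypothetical.

```python
import hashlib
from collections import Counter

CONDITIONS = (0, 3, 6, 9)  # minimum toil time in seconds


def assign_condition(user_id: str, salt: str = "toil-time-experiment") -> int:
    """Map a user id to a condition, deterministically and roughly uniformly."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(CONDITIONS)
    return CONDITIONS[bucket]


# Balance check on made-up user ids.
counts = Counter(assign_condition(f"user-{i}") for i in range(20_000))
print(sorted(counts.items()))
```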

Results: fewer question mark responses?

Three domains. Easy (0), medium (1), and difficult (2) items.

Results: fewer question mark responses?

domain_id  y.level  term         estimate  std.error  statistic  p.value
1          0        (Intercept)     6.965      0.004    512.009        0
1          0        cond_1_bw       1.290      0.009     28.320        0
1          0        cond_2_bw       1.510      0.010     39.957        0
1          0        cond_3_bw       1.500      0.012     33.195        0
1          1        (Intercept)    18.969      0.004    807.042        0
1          1        cond_1_bw       1.252      0.008     26.553        0
1          1        cond_2_bw       1.472      0.010     39.059        0
1          1        cond_3_bw       1.487      0.012     33.443        0
Domain 1, across difficulty levels. Multinomial logistic regression: correct/incorrect response (y.level) compared with a question mark response. Condition is backward difference coded.
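
In backward difference coding, each coefficient compares one level of a factor with the level before it. The sketch below builds the contrast matrix for four levels and checks the interpretation on made-up level means. It uses plain means, not the multinomial model behind the table, and the function name is hypothetical.

```python
import numpy as np


def backward_difference_contrasts(k: int) -> np.ndarray:
    """k x (k-1) contrast matrix; column j compares level j+1 with level j."""
    contrasts = np.empty((k, k - 1))
    for j in range(1, k):
        contrasts[:j, j - 1] = -(k - j) / k  # levels 1..j
        contrasts[j:, j - 1] = j / k         # levels j+1..k
    return contrasts


C = backward_difference_contrasts(4)

# Made-up means for four ordered conditions (assumption, not study data).
level_means = np.array([10.0, 12.0, 15.0, 14.0])

# One row per level: intercept column plus the contrast columns.
X = np.column_stack([np.ones(4), C])
coef = np.linalg.solve(X, level_means)

# coef[0] is the mean of the level means; coef[1:] are adjacent differences.
print(C)
print(coef)
```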

Results: slower responses?

Three domains. Panels: easy (0), medium (1), and difficult (2) items.

Results: slower responses?

domain_id  term         estimate  std.error  statistic  p.value
1          (Intercept)  8544.932      3.168   2697.040        0
1          cond_1_bw      31.548      8.927      3.534        0
1          cond_2_bw     172.111      8.981     19.164        0
1          cond_3_bw     -40.947      8.995     -4.552        0
Domain 1. Across difficulty levels. Backward difference.
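
Two things can be read off the table directly: the statistic column is the estimate divided by its standard error, and each condition estimate is small relative to the intercept. The sketch below recomputes both from the printed, rounded values, so small deviations in the last digit are expected.

```python
import math

# (term, estimate, std.error, statistic) copied from the table above.
rows = [
    ("(Intercept)", 8544.932, 3.168, 2697.040),
    ("cond_1_bw", 31.548, 8.927, 3.534),
    ("cond_2_bw", 172.111, 8.981, 19.164),
    ("cond_3_bw", -40.947, 8.995, -4.552),
]

intercept = rows[0][1]

# Recompute the test statistic as estimate / standard error.
recomputed = {term: est / se for term, est, se, _ in rows}

# Express each condition estimate as a share of the intercept.
share_of_intercept = {term: est / intercept for term, est, _, _ in rows[1:]}

for term, est, se, stat in rows:
    print(f"{term:12s} reported {stat:9.3f}  recomputed {recomputed[term]:9.3f}")
for term, share in share_of_intercept.items():
    print(f"{term:12s} {share:+.2%} of the intercept")
```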

Results: what about the capitalists?

Three domains (y-panels). Four conditions (x-panels). Across difficulty levels.

Lessons i, challenges ?, opportunities !

Lessons, challenges, opportunities

  • Browser incompatibility i
  • Manifest or latent learning ?
  • Heterogeneous treatment effects and adaptivity !
  • Big data, power, and exploration !
  • Tapping into the system !
  • Local minima and generalizability ?

Browser incompatibility i

Manipulation check

answer  condition  toil_time  response_in_seconds
?       3          9          8.968
?       3          9          8.995
?       3          9          1.147
?       3          9          1.810
?       3          9          8.979

In some browsers the question mark button was greyed out but still active, so we

  • looked for known incompatible browsers
  • retrieved browser information from the user agent string with ua-parser (non-deterministic)
  • removed users who used known incompatible browsers

Some illegal responses could not be identified.
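
A manipulation check of this kind can be sketched as a filter for question mark responses that arrive well before the toil time has elapsed. The rows are the five shown in the table above. The 0.1 second tolerance is an assumption, added because several responses in the table fall just under 9 seconds; it is not a value from the study.

```python
from typing import NamedTuple


class Response(NamedTuple):
    answer: str
    condition: int
    toil_time: float            # seconds
    response_in_seconds: float


rows = [
    Response("?", 3, 9, 8.968),
    Response("?", 3, 9, 8.995),
    Response("?", 3, 9, 1.147),
    Response("?", 3, 9, 1.810),
    Response("?", 3, 9, 8.979),
]


def is_illegal(response: Response, tolerance: float = 0.1) -> bool:
    """Question mark response given clearly before the toil time elapsed."""
    return (response.answer == "?"
            and response.response_in_seconds < response.toil_time - tolerance)


illegal = [r for r in rows if is_illegal(r)]
for r in illegal:
    print(r)
```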

Manifest or latent learning ?

Manifest

We used proxy measures for learning

  • question mark responses
  • fast responses

signal-to-noise ratio; learning

Latent

signal-to-noise ratio; learning

Figure: mean ability ratings across three domains and all difficulty levels.

Where to locate the learning?

Heterogeneous treatment effects and adaptivity !

Heterogeneous treatment effects

  • evaluate effects for subgroups
  • not trivial: e.g., is a response fast relative to the median response time (per person or per item?) or in absolute terms (e.g., under 3 seconds)?
  • explore!
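
The choice of definition matters because the same response can be fast under one rule and not under another. The sketch below contrasts an absolute threshold with a threshold relative to each person's median. The response times, the 3 second cutoff, and the 0.5 fraction are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median

# Made-up (person, response time in seconds) pairs.
responses = [
    ("anna", 2.5), ("anna", 6.0), ("anna", 7.0),
    ("ben", 2.0), ("ben", 2.5), ("ben", 3.5),
]


def fast_absolute(rt: float, threshold: float = 3.0) -> bool:
    """Fast if under a fixed number of seconds."""
    return rt < threshold


def person_medians(data):
    """Median response time per person."""
    by_person = defaultdict(list)
    for person, rt in data:
        by_person[person].append(rt)
    return {person: median(rts) for person, rts in by_person.items()}


def fast_relative(rt: float, person_median: float, fraction: float = 0.5) -> bool:
    """Fast if under a fraction of the person's own median."""
    return rt < fraction * person_median


medians = person_medians(responses)
absolute_flags = [fast_absolute(rt) for _, rt in responses]
relative_flags = [fast_relative(rt, medians[p]) for p, rt in responses]
print(absolute_flags)
print(relative_flags)
```

The same comparison could be made per item instead of per person by grouping on an item id.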

Adaptivity

  • multi-armed bandit
  • exploration-exploitation trade-off
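
A multi-armed bandit shifts traffic toward the better-performing conditions while the experiment is still running. The sketch below uses Thompson sampling with Beta posteriors, one of several bandit strategies and chosen here only for illustration. The success rates are made up.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit; return how often each arm was chosen."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Exploration: draw a plausible success rate for each arm from its
        # Beta(successes + 1, failures + 1) posterior.
        samples = [rng.betavariate(s + 1, f + 1)
                   for s, f in zip(successes, failures)]
        # Exploitation: play the arm whose draw is highest.
        arm = max(range(len(samples)), key=samples.__getitem__)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Hypothetical success rates for three conditions.
pulls = thompson_sampling([0.1, 0.2, 0.6], rounds=2000)
print(pulls)
```

Early rounds spread pulls across all arms; as evidence accumulates, most pulls go to the arm with the highest rate.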

Big data, power, and exploration !

Too much power?

Exploration

  • complex and dynamical system
  • exploration and cross-validation
  • all results from exploration set
  • validate on test set
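
Splitting into an exploration set and a test set can be sketched at the user level, so that all responses of one user end up on the same side. The 50/50 share, the seed, and the user ids are assumptions; the slides do not state how the split was made.

```python
import random


def split_users(user_ids, exploration_share=0.5, seed=2016):
    """Randomly split users into an exploration set and a test set."""
    shuffled = sorted(user_ids)            # fixed order before shuffling
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(len(shuffled) * exploration_share)
    return set(shuffled[:cut]), set(shuffled[cut:])


all_users = {f"user-{i}" for i in range(1000)}  # made-up ids
exploration_users, test_users = split_users(all_users)
print(len(exploration_users), len(test_users))
```

Hypotheses are then formed on responses from `exploration_users` only and checked once on responses from `test_users`.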

Tapping into the system !

Do users terminate their games?

Do they start playing less frequently?

Multiple-choice versus open-ended questions? Influence on choice for difficulty level? Many possible questions to answer.

Local minima and generalizability ?

Local minima

Generalizability

The role of A/B tests in online learning

Strengths: ecological validity, non-invasiveness, …

Weaknesses: large scale required, local minima, …

Threats: ?

Opportunities:

  • intervening in the complex dynamical system of learning
  • optimizing the return on investment of online learning
  • enabling extensive adaptivity

Thank you

Email: savi@uva.nl
Slides: www.alexandersavi.nl


Savi, A. O., Ruijs, N. M., Maris, G. K. J., & van der Maas, H. L. J. (2016). The role of A/B tests in the study of large-scale online learning. Manuscript in preparation.