The role of A/B tests in the study of online learning

Google search

Popularity

7,000 tests on the search algorithm in 2011 alone
Facebook, Twitter, Wikipedia, Etsy, Netflix, Tinder, OkCupid, Booking.com, et cetera

Obama campaign

Impact

2008 campaign: Sign Up, Sign Up Now, Join Us Now, Learn More
Learn More: conversion (sign up) rate +18.6%
2012 campaign: 500 tests, donation conversions +49%, sign up conversions +161%

What about learning?

What's the role of A/B tests, i.e., online randomized field experiments, in the study of online learning?

Nienke Ruijs / Han van der Maas / Gunter Maris / Alexander Savi

What's next

A/B tests in online learning

an introduction

an illustration

lessons, challenges, opportunities

An introduction

Cases

Coursera

engagement: homework reminder vs previous activities reminder

Khan Academy

sneak preview of advanced material
mindset intervention for math problems

Duolingo

introduce the pronoun "it" early in the curriculum for Spanish learners

Mostly anecdotal. Exceptions: MOOCs and ASSISTments.

Why anecdotal?

many users required

demanding for fast growing companies: priority, payoff, intervention

commercial companies
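Why "many users required"? A rough sketch of the usual sample size approximation for comparing two conversion rates with a two-sided z-test. The rates, the significance level, and the power below are illustrative assumptions, not figures from any of the cases above.

```python
import math
from statistics import NormalDist


def users_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per condition for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical rates: a small effect needs far more users than a large one.
small_effect = users_per_arm(0.10, 0.11)
large_effect = users_per_arm(0.10, 0.15)
print(small_effect, large_effect)
```

Under these assumptions, detecting a one percentage point difference takes on the order of ten thousand users per condition, and a five point difference takes several hundred.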

A missed opportunity?

Evidence-based

Ecologically valid

Double blind

Iterative

Non-invasive

Verification

An illustration

Evaluating interventions

Evaluating interventions in Math Garden to increase the return on investment of online learning.

Math Garden and the curious case of the capitalists

"Capitalists"…

…hiding in the data

Surface learning, deep learning, strategic learning.

Promoting deep learning

Many ways
Toil time: delaying the question mark
Question mark greyed out and inactive during toil time
Minimum toil time of 0, 3, 6, or 9 seconds?
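How users were assigned to the four toil times is not stated on the slides. One common approach, sketched here as an assumption, is to hash a user identifier so that every user lands in the same condition on every visit. The function name, the experiment label, and the identifier format are hypothetical.

```python
import hashlib

TOIL_TIMES = (0, 3, 6, 9)  # seconds, the four conditions listed above


def assign_toil_time(user_id, experiment="toil-time"):
    """Map a user to one condition, deterministically and roughly uniformly."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(TOIL_TIMES)
    return TOIL_TIMES[bucket]


# Hypothetical user ids: the same id always yields the same toil time.
assignments = {uid: assign_toil_time(uid) for uid in range(4000)}
counts = {t: list(assignments.values()).count(t) for t in TOIL_TIMES}
print(counts)
```

Including the experiment label in the hash keeps assignments in one experiment independent of assignments in another.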

Expectations

Status quo

fast guess = expensive
fast question mark = free

Toil time

fast guess = expensive
fast question mark = impossible

Expectations

generally fewer question mark responses
generally slower responses

Results: fewer question mark responses?

domain_id	y.level	term	estimate	std.error	statistic
1	0	(Intercept)	6.965	0.004	512.009
1	0	cond_1_bw	1.290	0.009	28.320
1	0	cond_2_bw	1.510	0.010	39.957
1	0	cond_3_bw	1.500	0.012	33.195
1	1	(Intercept)	18.969	0.004	807.042
1	1	cond_1_bw	1.252	0.008	26.553
1	1	cond_2_bw	1.472	0.010	39.059
1	1	cond_3_bw	1.487	0.012	33.443

Results: slower responses?

domain_id	term	estimate	std.error	statistic
1	(Intercept)	8544.932	3.168	2697.040
1	cond_1_bw	31.548	8.927	3.534
1	cond_2_bw	172.111	8.981	19.164
1	cond_3_bw	-40.947	8.995	-4.552

Results: what about the capitalists?

Lessons (i), challenges (?), opportunities (!)

Lessons, challenges, opportunities

Browser incompatibility i
Manifest or latent learning ?
Heterogeneous treatment effects and adaptivity !
Big data, power, and exploration !
Tapping into the system !
Local minima and generalizability ?

Browser incompatibility i

Manipulation check

answer	condition	toil_time	response_in_seconds
?	3	9	8.968
?	3	9	8.995
?	3	9	1.147
?	3	9	1.810
?	3	9	8.979

In some browsers the question mark button was greyed out but still active, so we

looked for known incompatible browsers
retrieved browser information from the user agent string with ua-parser (non-deterministic)
removed users who used known incompatible browsers

Some illegal responses could still not be identified.
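The manipulation check itself can be sketched as a simple rule: a question mark response that arrives well before the toil time has elapsed should have been impossible. The rows are copied from the table above. The 0.1 second tolerance is an assumption made here so that responses just under 9 seconds count as timing jitter; the slides do not say how such responses were treated.

```python
# Rows from the manipulation check table:
# (answer, condition, toil_time, response_in_seconds)
ROWS = [
    ("?", 3, 9, 8.968),
    ("?", 3, 9, 8.995),
    ("?", 3, 9, 1.147),
    ("?", 3, 9, 1.810),
    ("?", 3, 9, 8.979),
]

TOLERANCE = 0.1  # seconds; hypothetical allowance for timing jitter


def is_illegal(answer, toil_time, response_in_seconds, tolerance=TOLERANCE):
    """True for a question mark response given clearly before toil time ended."""
    return answer == "?" and response_in_seconds < toil_time - tolerance


flags = [is_illegal(a, t, r) for a, _condition, t, r in ROWS]
print(flags)
```

With this rule only the 1.147 and 1.810 second responses are flagged. A rule like this catches illegal responses directly in the logs, independent of browser detection.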

Manifest or latent learning ?

Manifest

We used proxy measures for learning

question mark responses
fast responses
…

signal-to-noise ratio; learning

Latent

signal-to-noise ratio; learning

Where to locate the learning?

Heterogeneous treatment effects and adaptivity !

Heterogeneous treatment effects

evaluate effects for subgroups
not trivial: e.g., fast response relative to median response time? person or item? or absolute (e.g., 3 seconds)?
explore!
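To see why defining a "fast response" is not trivial, the three candidate definitions above can be computed side by side. This is a sketch on made-up data; the person names, item names, response times, and the 3 second cutoff are all illustrative.

```python
from collections import defaultdict
from statistics import median


def fast_response_flags(responses, absolute_cutoff=3.0):
    """Flag each (person, item, seconds) response as fast under three definitions.

    Returns one (absolute, vs_person_median, vs_item_median) tuple per response.
    """
    by_person = defaultdict(list)
    by_item = defaultdict(list)
    for person, item, seconds in responses:
        by_person[person].append(seconds)
        by_item[item].append(seconds)
    person_median = {p: median(t) for p, t in by_person.items()}
    item_median = {i: median(t) for i, t in by_item.items()}
    return [
        (
            seconds < absolute_cutoff,
            seconds < person_median[person],
            seconds < item_median[item],
        )
        for person, item, seconds in responses
    ]


# Hypothetical responses: "ann" is a quick responder, "bob" a slow one.
responses = [
    ("ann", "i1", 2.0), ("ann", "i2", 4.0), ("ann", "i3", 10.0),
    ("bob", "i1", 8.0), ("bob", "i2", 9.0), ("bob", "i3", 20.0),
]
flags = fast_response_flags(responses)
for row, flag in zip(responses, flags):
    print(row, flag)
```

In this toy data, bob's 8 second response is fast only relative to his own median, while ann's 4 second response is fast only relative to the item median. The choice of definition changes which subgroup a response falls into.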

Adaptivity

multi-armed bandit
exploration-exploitation trade-off
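A minimal sketch of the exploration-exploitation trade-off, using epsilon-greedy, one of the simplest multi-armed bandit strategies. It is not claimed to be what any platform above uses. The success rates, the number of rounds, and epsilon are illustrative assumptions.

```python
import random


def epsilon_greedy(true_rates, rounds=3000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy allocation and return the pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    successes = [0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed success rate so far.
            means = [s / p if p else 0.0 for s, p in zip(successes, pulls)]
            arm = means.index(max(means))
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
    return pulls


# Hypothetical success rates for three variants of an intervention.
pulls = epsilon_greedy([0.2, 0.5, 0.8])
print(pulls)
```

Unlike a fixed A/B split, the allocation shifts toward the better-performing variant while the experiment is still running, at the cost of less data on the weaker variants.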

Big data, power, and exploration !

Too much power?

Exploration

complex and dynamical system
exploration and cross-validation
all results from exploration set
validate on test set
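The exploration and test sets can be sketched as a split at the user level, so that no user contributes to both. The 50/50 fraction, the seed, and the user ids are assumptions for illustration; the slides do not specify how the sets were formed.

```python
import random


def split_users(user_ids, exploration_fraction=0.5, seed=2016):
    """Split unique users into disjoint exploration and test sets."""
    ids = sorted(set(user_ids))       # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * exploration_fraction)
    return set(ids[:cut]), set(ids[cut:])


# Hypothetical user ids.
exploration, test = split_users(range(1000))
print(len(exploration), len(test))
```

Hypotheses are generated freely on the exploration set and then checked once on the held-out test set.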

Tapping into the system !

Do users terminate their games?

Do they start playing less frequently?

Multiple-choice versus open-ended questions? Influence on choice for difficulty level? Many possible questions to answer.

Local minima and generalizability ?

Local minima

Generalizability

The role of A/B tests in online learning

Strengths: ecological validity, non-invasiveness, …

Weaknesses: large scale required, local minima, …

Threats: ?

Opportunities:

intervening in the complex dynamical system of learning
optimizing the return on investment of online learning
enabling extensive adaptivity
…

Thank you

Email savi@uva.nl
Slides www.alexandersavi.nl

Savi, A. O., Ruijs, N. M., Maris, G. K. J., & van der Maas, H. L. J. (2016). The role of A/B tests in the study of large-scale online learning. Manuscript in preparation.
