#BbWorld15 + DevCon Keynote – Predicting Learner Outcomes with Learning Analytics

Speaker: Ryan Baker, Associate Professor of Cognitive Studies at Teachers College, Columbia University, Program Coordinator of TC’s Masters of Learning Analytics, and President of the International Educational Data Mining Society

In recent years there has been increasing interest in predicting a range of learner outcomes from the types of data now available in systems like Blackboard. In this talk, I will discuss my research laboratory’s work to predict a range of outcomes, from course success, to preparation for future learning, to long-term educational attainment and participation in scientific communities of practice. I will discuss the feature engineering necessary to take the best advantage of data mining methods, and will discuss core considerations to take into account when developing predictive analytics, to ensure the validity and actionability of predictions. I will discuss these themes in the context of successful projects to predict learner outcomes, from undergraduates to K-12 to adult learners.

(Live Streaming Available)


We all want our students to succeed!

  • But how do we get there? How do we best help students? We have some simplistic solutions such as academic advisors talking to students who have already failed, deans talking to a student with a behavior problem, or using “at-risk” or “first year” student demographics, but these all come late and don’t always make it to the right students.
  • In recent years, gathering more data has given us a new opportunity.
  • We have deeper integration of systems into universities – where data feeds into predictive models and predictive models feed into intervention practices.
  • Learning analytics is possible because we have more and better data and more and better methods.
  • Online use of LMSs is now standard and widespread. This provides access to data on discussion forums, assignments, eTextbooks, and overall online learning activities. These data are sitting in our LMSs waiting to be tapped.
  • International Educational Data Mining Society
  • Prediction, structure discovery, relationship mining, distillation of data for human judgment, and discovery with models are all parts of data gathering and analytics (EDM/LA; Baker & Siemens, 2014).
  • We need predictions that are important, timely, actionable, and multi-dimensional.
  • What is important?
    • Course completion
    • Course learning
      • Not just tests, but also robust learning
      • Knowledge retention
      • Transfer to new contexts
      • Preparation for future learning
    • Graduation
    • Successful professional career
    • Donation to the alma mater
  • What are the models that predict course completion?
  • Demographics are predictive yet potentially problematic. They are more helpful for testing models than for, say, “profiling” students.
  • Sources of useful data:
    • Discussion Forums with Amount of Participation and the Topics being Discussed
    • Other Communication with Instructor (Barber & Sharkey, 2012)
    • Grades on Early Assignments (Olama et al., 2014; Wolff et al., 2014; Baker et al., 2015)
    • Performance and Behavior within Online Learning (Aleven et al., 2004; Pardos et al., 2013; literally thousands of papers)
    • Use of eTextbooks and other Online Resources (Wolff et al., 2014; Brooks et al., 2014; Baker et al., 2015)

It’s more difficult to get data from in-class, in-person courses. In online courses it’s easier to collect data that can feed models.
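To make the idea of data feeding a model concrete, here is a minimal sketch of a course-completion predictor built from three of the data sources listed above (forum participation, an early assignment grade, and eTextbook use). It is not the speaker's model: the training rows are made up for illustration, the feature names and scaling are assumptions, and plain logistic regression stands in for whatever method a real project would choose.

```python
import math

# Hypothetical, made-up rows for illustration only:
# (forum_posts, early_grade_0_to_1, opened_etextbook) -> completed course (1) or not (0)
TRAIN = [
    ((12, 0.90, 1), 1),
    ((8, 0.75, 1), 1),
    ((5, 0.80, 1), 1),
    ((6, 0.60, 1), 1),
    ((1, 0.40, 0), 0),
    ((0, 0.55, 0), 0),
    ((2, 0.30, 1), 0),
    ((0, 0.20, 0), 0),
]


def scale(row):
    """Put features on roughly the same 0-1 scale (forum posts capped at 10)."""
    posts, grade, opened = row
    return (min(posts, 10) / 10.0, grade, float(opened))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, epochs=2000, lr=0.5):
    """Fit logistic regression weights and bias by batch gradient descent."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in data:
            x = scale(row)
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - label
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(model, row):
    """Return the estimated probability of course completion for one student."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, scale(row))))


model = fit(TRAIN)
engaged = predict(model, (9, 0.85, 1))      # active in forums, strong first grade
disengaged = predict(model, (0, 0.25, 0))   # no forum posts, weak first grade
print(f"engaged: {engaged:.2f}, disengaged: {disengaged:.2f}")
```

A model this small is only useful for showing the shape of the pipeline. A real one would need far more data and validation on held-out students before anyone acts on its predictions.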

  • Data are useful because they are actionable:
    • Student who has not opened textbook yet –> encourage to get started
    • Student games the system rather than learning in online homework –> connect behavior to course grade, discuss implications with students, give extra learning activities
    • Student bombs first assignment –> get them extra academic support
  • Blackboard provides resources in Learn 9.1, Learning Analytics in Learn SaaS, and more advanced systems such as the Bb Analytics products, which provide LMS data, demographic SIS data, and performance over time.
  • The more powerful models are able to predict outcomes.
  • Soomo Learning is an eTextbook provider with data and activity monitoring of students.
  • The data are typically easy to gather; the hard part is figuring out how to engineer all of this data into something useful.
  • Turning the mess into meaning (feature engineering) sometimes consists of simply extracting features from data, and other times involves building a model of some complex behavioral construct and then putting that model into your predictive model.
  • Standardized data formats such as TinCan, Caliper, and the LearnSphere format make this easier, but validation of the data is essential.
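The simpler kind of feature engineering described above, together with the data-to-action examples earlier in the list, can be sketched as follows. Everything here is hypothetical: the event log layout, the event type names, and the `failing_grade` threshold are assumptions made for illustration, not a Blackboard, TinCan, or Caliper schema. The sketch validates raw rows, turns them into per-student features, and maps those features to two of the interventions mentioned in the talk.

```python
from collections import defaultdict

# Hypothetical raw LMS event log: (student_id, event_type, value).
EVENTS = [
    ("s1", "etextbook_open", None),
    ("s1", "forum_post", None),
    ("s1", "assignment_grade", 0.85),
    ("s2", "forum_post", None),
    ("s2", "assignment_grade", 0.35),
    ("s3", "assignment_grade", 0.70),
    ("s3", "etextbook_open", None),
    ("s3", "etextbook_open", None),
]

KNOWN_EVENTS = {"etextbook_open", "forum_post", "assignment_grade"}


def validate(event):
    """Reject malformed rows before they reach the feature step."""
    student_id, event_type, value = event
    if not student_id or event_type not in KNOWN_EVENTS:
        return False
    if event_type == "assignment_grade":
        return isinstance(value, (int, float)) and 0.0 <= value <= 1.0
    return True


def extract_features(events):
    """Aggregate valid events into one feature row per student."""
    feats = defaultdict(
        lambda: {"etextbook_opens": 0, "forum_posts": 0, "first_grade": None}
    )
    for event in events:
        if not validate(event):
            continue
        student_id, event_type, value = event
        row = feats[student_id]
        if event_type == "etextbook_open":
            row["etextbook_opens"] += 1
        elif event_type == "forum_post":
            row["forum_posts"] += 1
        elif row["first_grade"] is None:
            # Keep only the first assignment grade seen for each student.
            row["first_grade"] = value
    return dict(feats)


def suggest_actions(row, failing_grade=0.5):
    """Map features to interventions; the 0.5 threshold is an assumption."""
    actions = []
    if row["etextbook_opens"] == 0:
        actions.append("encourage to get started with the textbook")
    if row["first_grade"] is not None and row["first_grade"] < failing_grade:
        actions.append("offer extra academic support")
    return actions


features = extract_features(EVENTS)
for student_id, row in sorted(features.items()):
    print(student_id, row, suggest_actions(row))
```

The harder kind of feature engineering, such as detecting a student gaming the system, would replace a simple counter with its own behavioral model, whose output then becomes one more column in the feature row.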

We can build models that predict student outcomes and these models can be used for interventions that make a difference!

  • We need to continue our work to make predictive models that are better, that cover more actionable factors, and that cost less.
  • We need better validation of the models we have.
  • We need practices around how to best use the models.
  • We need students to succeed!