Friday, November 28, 2014

DALMOOC episode 8: Bureau of pre-learning

I see a lot of WTF behavior from learners. This is bad... or is it?
Oh hey!  It's week 6 in DALMOOC and I am actually "on time" this time!  Even if I weren't, it would be perfectly OK, since there are cohorts starting throughout the duration of the MOOC (or so I suspect), so whoever is reading this: Hello!

This week the topic of DALMOOC is behavior detectors (a type of prediction model).  Behavior detectors are models that infer what a learner is doing from the data collected in the system, or set of systems, that we discussed in previous weeks (like the LMS, for example).  Some of these are off-task behaviors, such as playing Candy Crush during class or doodling when you're supposed to be solving for x. Other behaviors are gaming the system, disengaged behaviors, careless errors, and WTF behaviors (without thinking fastidiously? time fun? you decide ;-) ). WTF behavior is working in the system but not on the task specified.  As I was listening to the videos this week I was thinking about gaming behaviors‡, and it seems to me that not all gaming behavior is bad.  If I am stuck in a system, I'm more apt to game it so that I can move on and try to salvage any learning, rather than just stay stuck and say eff-it-all.  I wonder what others think about this.

Some related problems to behavior detectors are sensor-free affect detection of boredom, fun, frustration, or delight.  Even with sensors, I'd say that I'd have problems identifying delight. Maybe my brain looks a certain way in an MRI machine when I get a sense of delight, but as a human I find this a concept that would be hard to pin down.

Anyway - another thing discussed this week is Ground Truth. The idea is that all data is going to be noisy, so there won't be one "truth", but there is "ground truth". I guess the idea here is that there is no one answer to life, the Universe and everything, so we look at our data to determine an approximation of what might be going on.   Where do you get data for this? Self-reports from learners, Field Observations§, text analysis, and video coding. The thing I was considering (and I think this was mentioned) is that self-reporting isn't that great for student behaviors; after all, most of us don't want to admit that we are gaming the system or doing something to subvert it. Some people might just admit it because they don't care, or because they think that your exercise is stupid and they will let you know, but most, I think, would care what others think, and might have some reverence for the instructor, which would prevent them from accurately self-reporting.

One of the things that made me laugh a bit was an example given of a text log file where the system told the learner that he was wrong, but in a cryptic way. This reminds me of my early MS DOS days, when I was visiting relatives who had Windows 3.1 (for workgroups!) and I was dumped from the GUI to a full window DOS environment.  I didn't know any commands, so I tried natural language commands...and I got the dreaded "error, retry, abort", and typing any of those three words (or a combination of them) did not work. Frustration! I thought I had broken the computer and no one was home!

Another thing that came to mind with these data collection methods is the golden triangle (time, quality, cost).  Not every method of collecting data is equal to the others. For instance, video coding is the slowest, but it is replicable and precise.

Moving along, we talked a bit about Feature Engineering (aka rational modeling, aka cognitive modeling), which is the art of creating predictor variables. This is an art because it involves lore more than well-defined principles. This is also an iterative process.  Personally I was ready to write this off, but the art and iteration aspect is something that appeals to me, rather than just cold hard black-boxes. The idea with this is that you go for quantity at first, not quality, and then you iterate forward, further defining your variables.  Just like in other projects and research, you can build off the ideas of others; there are many papers out there on what has worked and what hasn't (seems like advice I was also given at my EDDE 801 seminar this past summer).  Software you can use for this process includes Excel (pivot tables for example) and OpenRefine (previously Google Refine). A good thing to remember is that feature engineering can over-fit, so we're going back to last week where we said that everything over-fits to some extent.
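
To make the "quantity first" idea a bit more concrete, here is a minimal sketch of what a first pass at feature engineering might look like, doing in Python roughly what a pivot table would do. Everything in it is an assumption for illustration: the log format, the student names, the 3-second cutoff for a "fast" action, and the feature names are all made up, not something from the course.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical interaction log: (student, action, seconds_spent, correct)
LOG = [
    ("ana", "attempt", 2.0, False),
    ("ana", "attempt", 1.5, False),
    ("ana", "hint", 1.0, False),
    ("ana", "attempt", 20.0, True),
    ("ben", "attempt", 30.0, True),
    ("ben", "attempt", 25.0, False),
    ("ben", "attempt", 40.0, True),
]

# Assumed cutoff: actions faster than this are flagged as "fast"
FAST_SECONDS = 3.0


def build_features(log, fast_seconds=FAST_SECONDS):
    """Collapse raw log rows into one set of candidate predictors per student."""
    by_student = defaultdict(list)
    for student, action, seconds, correct in log:
        by_student[student].append((action, seconds, correct))

    features = {}
    for student, rows in by_student.items():
        attempts = [r for r in rows if r[0] == "attempt"]
        features[student] = {
            "n_actions": len(rows),
            "mean_seconds": mean(r[1] for r in rows),
            # share of actions that were suspiciously quick
            "frac_fast": sum(r[1] < fast_seconds for r in rows) / len(rows),
            # share of actions that were hint requests
            "hint_rate": sum(r[0] == "hint" for r in rows) / len(rows),
            # share of attempts that were correct (0.0 if there were none)
            "frac_correct": (
                sum(r[2] for r in attempts) / len(attempts) if attempts else 0.0
            ),
        }
    return features


if __name__ == "__main__":
    for student, feats in build_features(LOG).items():
        print(student, feats)
```

None of these variables is guaranteed to predict anything; the point is to generate a pile of candidates cheaply, see which ones hold up, and then refine or drop them in the next iteration.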

Finally we have diagnostic metrics. My eyes started glazing over a bit with this.  I think part of it was that I didn't have my own examples to work with, so it was all a bit abstract (which is fine). I am looking forward to the spring 2015 Big Data in Education MOOC to go a bit more in depth with this.  So what are the diagnostic metrics mentioned? (might need a more detailed cheat-sheet for these)
  • ROC -- Receiver Operating Characteristic curve; good for a two-value prediction (on/off, true/false, etc.)
  • A' -- related to ROC; the probability that if the model is given one example from each category, it can identify which is which.  A' is more difficult to compute than kappa and only works with two categories, but it is easy to interpret statistically.
  • Precision -- probability that a data point classified as true is really true
  • Recall -- probability that a data point that is really true gets classified as true
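
Since these were abstract for me, here is a small sketch of precision, recall, and A' on made-up data. The labels, the confidence values, and the 0.5 cutoff are all assumptions for illustration. A' is computed here the pairwise way: take every (positive, negative) pair and count how often the positive example got the higher confidence, with ties counting as half.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for two equal-length lists of booleans."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def a_prime(actual, confidence):
    """Share of (positive, negative) pairs the model ranks correctly."""
    pos = [c for a, c in zip(actual, confidence) if a]
    neg = [c for a, c in zip(actual, confidence) if not a]
    if not pos or not neg:
        raise ValueError("need at least one example of each category")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: was the student really gaming the system (True/False),
# and how confident was a hypothetical detector that they were?
ACTUAL = [True, True, True, False, False, False, False, False]
CONFIDENCE = [0.9, 0.8, 0.4, 0.7, 0.6, 0.2, 0.1, 0.3]
PREDICTED = [c >= 0.5 for c in CONFIDENCE]  # assumed cutoff of 0.5

if __name__ == "__main__":
    print(precision_recall(ACTUAL, PREDICTED))
    print(a_prime(ACTUAL, CONFIDENCE))
```

Note that precision and recall depend on where you put the cutoff, while A' only looks at how the confidences are ordered.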

We also covered metrics for regressors, such as:
  • Linear Correlation -- if X's values change, do Y's values change as well?  Correlation is vulnerable to outliers.
  • R-squared -- correlation squared. Also a measure of what percentage of the variance in the dependent measure is explained by a model.  Its usage depends on which community has really adopted it.
  • Mean Absolute Error (MAE) -- tells you the average amount by which the predictions deviate from the actual values
  • Root Mean Squared Error (RMSE) -- does the same but penalizes large deviations more heavily
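
Here is a minimal sketch of these metrics on made-up numbers (the data and the function names are mine, purely for illustration). The two sets of predictions are chosen so that they have the same MAE: one is off by 1 everywhere, the other is perfect except for one big miss.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mae(actual, predicted):
    """Average absolute deviation of predictions from actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Like MAE, but squaring the errors makes big misses count for more."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


ACTUAL = [1.0, 2.0, 3.0, 4.0]
OFF_BY_ONE = [2.0, 3.0, 4.0, 5.0]    # every prediction misses by 1
ONE_BIG_MISS = [1.0, 2.0, 3.0, 8.0]  # three perfect, one misses by 4

if __name__ == "__main__":
    for name, pred in [("off by one", OFF_BY_ONE), ("one big miss", ONE_BIG_MISS)]:
        r = pearson_r(ACTUAL, pred)
        print(name, "r =", r, "r^2 =", r ** 2,
              "MAE =", mae(ACTUAL, pred), "RMSE =", rmse(ACTUAL, pred))
```

Both prediction sets have an MAE of 1.0, but the RMSE is 1.0 for the first and 2.0 for the second, which is the "penalizes large deviations" part. The off-by-one predictions also correlate perfectly with the actual values even though every single one is wrong, so correlation on its own doesn't tell you how far off the predictions are.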

Finally, there are different types of validity (this brings me back to my days in my first research methods course):
  • Construct validity -- Does your model measure what it says it measures?
  • Predictive validity -- Does your model predict the future as well as the present?
  • Substantive validity -- Do the results matter? (or as Pat Fahy would say "so what?" )
  • Content validity -- Does the test cover the full domain it's meant to cover?
  • Conclusion validity -- Are conclusions justified based on the results?

So, that was week 6 in a nutshell.  What stood out for you all?

† Image from movie Minority Report (department of precrime)
‡ Granted, I need to go and read more articles on gaming behaviors to know all the details, this was just an initial reaction.
§ There is a free Android app for Field Observations that they've developed.