Tuesday, November 25, 2014

DALMOOC episode 7: Look into your crystal ball

Whooooa! What is all this?


Alright, we're in week six of DALMOOC, but as usual I am posting a week behind.  In previous weeks I was having a ton of fun playing with Gephi and Tableau. Even though the source material wasn't that meaningful to me, I was having fun exploring the potential of these tools for analytics. This week we got our hands on RapidMiner, a free(mium) piece of software that provides an environment for machine learning, data mining, and predictive analysis.

Sounds pretty cool, doesn't it?  I do have to say that the drag-and-drop aspect of the application makes it ridiculously easy to quickly put together some blocks to analyze a chunk of data. The caveat is that you need to know what the heck you are doing (and obviously I didn't ;-) ).  I was having loads of issues navigating the application: I somehow managed to not get some of the windows that I needed in order to input information, and I couldn't find the functions that I needed...  Luckily one of my colleagues, who actually works on machine learning, was visiting and was able to give me a quick primer on RapidMiner - crisis averted.  I did end up attempting the assignment on my own, but I wasn't getting the right answer.  With other things to do, I gave up on the optional assignment ;-)

With that software experience this past week, what is the use of prediction modeling in education? Well (if you can get your software working ;-) ), the goal is to develop (and presumably use) a model which can infer something (a predicted variable) from some combination of other aspects of the data that you have on hand (a.k.a. predictor variables).  Sometimes this is used to predict the future, and sometimes it is used to make inferences about the here and now. An example of this might be using a learner's previous grades in courses as predictors of future success.  To some extent this is what SATs and GREs are (and I've got my own issues with these types of tests - perhaps something for another post).  The key thing here is that there are so many variables involved in predicting future success. It is not just about past grades, so take that one with a grain of salt.

Something that goes along with modeling is regression: you use this when the thing you want to predict is numerical in nature. Examples of this might be the number of student help requests, how long it takes to answer questions, how much of an article was read by a learner, predicted test scores, etc. A regressor is a model that predicts a number from other variables.  To train the model, you use data for which you already know the answers (a training set) and let the algorithm build the model from it.

There are different types of regression.  Linear regression is flexible (surprisingly so, according to the video), and it's a speedster.  It's often more accurate than more complex models (especially once you cross-validate). It's also feasible to understand your model (with some caveats).
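
To make the idea a bit more concrete for myself, here is a minimal sketch of a one-predictor linear regression in plain Python. To be clear, this is not what RapidMiner does under the hood and it is not from the course; the data (hours studied vs. test score) is entirely made up, and the names fit_line and predict are just ones I picked for illustration.

```python
# Toy training data, made up for illustration only:
# hours studied (predictor variable) -> test score (predicted variable).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 70.0, 74.0]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new predictor value."""
    slope, intercept = model
    return slope * x + intercept


# "Training" is just fitting the line to data where we already know the answers.
model = fit_line(hours, scores)

# Use the model on a value it has not seen (6 hours of study).
print(predict(model, 6.0))
```

The "feasible to understand" part shows up here too: the whole model is two numbers, a slope (how many points each extra hour is worth in this toy data) and an intercept.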

In watching the videos last week, I got some of the regression algorithms conceptually, from a logic perspective, but some just seemed to go right over my head.  I guess I need a little more experience here to really "get it" (at least in an applied sense).

Another way to create a model is classification: you use this when the thing you want to predict (the label) is categorical. In other words, it is not a number but a category, such as right or wrong, or will drop versus will persevere through the course. Regardless of the model you create, you always need to cross-validate it at the level you intend to use it for (e.g. new students? new schools? new demographics?), otherwise your model might not be giving you the information you think it's giving you.
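
Here is a rough sketch of what cross-validating "at the student level" could look like, again in plain Python. Everything in it is an assumption on my part: the records are made up, the single feature (fraction of videos watched) is hypothetical, and the classifier is a deliberately simple nearest-centroid one that I swapped in to keep things short, not anything the course prescribed. The point is only that each student's rows are held out together, so the model is always tested on students it has never seen.

```python
# Toy records, made up for illustration:
# (student_id, fraction_of_videos_watched, label)
records = [
    ("s1", 0.9, "persevere"), ("s1", 0.8, "persevere"),
    ("s2", 0.2, "drop"), ("s2", 0.3, "drop"),
    ("s3", 0.7, "persevere"), ("s3", 0.6, "persevere"),
    ("s4", 0.1, "drop"), ("s4", 0.55, "drop"),
]


def train_centroids(rows):
    """Return the mean feature value for each label seen in the training rows."""
    sums, counts = {}, {}
    for _, x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def leave_one_student_out(rows):
    """Hold out each student in turn, train on the rest, return overall accuracy."""
    students = sorted({sid for sid, _, _ in rows})
    correct = 0
    for held_out in students:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        centroids = train_centroids(train)
        correct += sum(classify(centroids, x) == label for _, x, label in test)
    return correct / len(rows)


accuracy = leave_one_student_out(records)
print(accuracy)
```

If the plan were to use the model on new schools instead of new students, the same loop would hold out one school at a time rather than one student.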

This week, for me, was yet another reminder that I am not a maths person.  Don't get me wrong, I appreciate the elegance of mathematics, but I honestly don't care about optimizing my algorithms through maths.  I'd like to just know that these certain x-algorithms work for these y-scenarios, and I want easy ways to use them :)  Anything beyond that, for me, is overkill.  This is probably why I didn't like my undergraduate education as much as I've enjoyed my graduate education:  I wanted to build things, but my program was focusing on the nitty gritty and engine performance :)




SIDENOTES
  • Alternative episode title: Outlook hazy, try again later
  • Neural Networks have not been successful methods (hmmm...no one has told this to scifi writers ;-) sounds cool, even though they are inconsistent in their results)
