Monday, September 10, 2012

AngryMath: Udacity Statistics 101


The prospect of massive-scale online schooling seems to be all the rage at the moment. Recent competing initiatives include Khan Academy, OpenCourseWare, Udacity, Coursera, and edX (the latter ones sponsored by top-name schools such as Stanford, Harvard, or MIT, or else founded by ex-faculty members). The idea of universal and free access to college programs from top researchers has fired the imagination of many in the blogosphere, and some have predicted the imminent collapse of traditional universities in the face of this "tsunami".

As a college educator myself, I felt compelled to survey one of these courses, so as to assess their general quality, advantages, and disadvantages. (Perhaps there would be some techniques that I could fold into my own courses.) This summer, Sebastian Thrun's Udacity unveiled a new course, Introduction to Statistics, taught by Thrun himself, which I felt would be ideal for my purposes, since my current job largely consists of teaching statistics at one of the community colleges in the City University of New York (and my master's degree is in Mathematics & Statistics). Having enrolled, I proceeded through the entirety of the course, watching all of the lecture videos and taking all of the web-based quizzes and the final exam.

In brief, here is my overall assessment: the course is amazingly, shockingly awful. It is poorly structured; it evidences an almost complete lack of planning for the lectures; it routinely fails to properly define or use standard terms or notation; it necessitates occasional massive gaps where "magic" happens; and it results in nonstandard computations that would not be accepted in normal statistical work. In surveying the course, some nights I personally got seriously depressed at the notion that this might be standard fare for the college lectures encountered by most students during their academic careers.

Below I will try to pick out a "Top 10" list of problems with the course. These are not comprehensive, but I feel that they do give a basic sense for the issues involved. Generally, the lectures and the overall sequence feel like they haven't been planned out in advance (and as a result, they don't connect together very well). One lecture is interrupted by a visitor walking into Thrun's office as he records it, and this is left in the video itself (Unit 17.8). Other lectures use a data set of students' guesses about Thrun's weight for a hypothesis test on his actual weight, which, not being a population parameter, is totally incorrect and "an abuse" (as he admits himself in Unit 32.1); yet this semi-accidental data set was convenient to access, and so was apparently considered acceptable.

But probably the best example of the lack of planning is how radically off-syllabus the course went from its initial advertising. Now, I've taught courses where things didn't go entirely according to plan (maybe a lecture went a half-day long), but never in all my years of teaching has a course so massively diverged from the initial plan or course description. Below you can compare the starting advertised syllabus (before any lectures were posted) to the revised final syllabus (after the lectures were actually produced). You'll see that they are remarkably different.

Initial syllabus:


  1. Visualizing relationships in data: Seeing relationships in data and predicting based on them; dealing with noise

  2. Processes that generate data: Random processes; counting, computing with sample spaces; conditional probability; Bayes Rule

  3. Processes with a large number of events: Normal distributions; the central limit theorem; adding random variables

  4. Real data and distributions: Sampling distributions; confidence intervals; hypothesis tests; outliers

  5. Systematically understanding relationships: Least squares; residuals; inference

  6. Understanding more complex relationships: Transformation; smoothing; regression for two or more variables, categorical variables

  7. Where to go next: Statistics vs machine learning; what to study next; where statistics is used; Final exam



Corrected syllabus:


  1. Visualizing relationships in data: Seeing relationships in data and predicting based on them; Simpson's paradox

  2. Probability: Probability; Bayes Rule; Correlation vs. Causation

  3. Estimation: Maximum Likelihood Estimation; Mean, Median, Mode; Standard Deviation, Variance

  4. Outliers and Normal Distribution: Outliers, Quartiles; Binomial Distribution; Central Limit Theorem; Manipulating Normal Distribution

  5. Inference: Confidence intervals; Hypothesis Testing

  6. Regression: Linear regression; correlation

  7. Final Exam


Now, I've become fairly "religious" about the text of mathematics: reading the details correctly, and writing with precision, being absolutely paramount. (And I've found that for my remedial students, this fairly simple-sounding skill is a nearly insurmountable stumbling block.) When I saw the Udacity interface, I was initially excited; instead of a lecturer standing in front of a chalkboard, the frame is focused on the writing surface, which gives us the opportunity to highlight and be careful about the writing (this being similar to Khan Academy, etc.). But soon I became keenly disappointed at how poor and unclear the written presentation was.

There are at least two related issues. The first is that new terms and symbols are almost never given written definitions. Personally, I find that discussions and questions usually return to the definitions of terms, so setting those out carefully is the first and most important task. Here, new terms are casually described in the audio track, but they are neither technically careful nor visible to the viewer. I think this is exacerbated by the course's commitment to not following any textbook or other written source: after the first encounter, there is no capacity to search, index, or reference back to terms or definitions that you might need later on (and this holds as well for specialized symbols for sums, products, conditionals, logical operators, etc., that tend to materialize for the first time in the middle of a problem).

But the second issue is that the algebraic manipulations themselves are uniformly sloppy and disjointed; some bits of the work will be written down, the next bit discussed verbally, then another unrelated scrap written down, etc. There are unfixed typos in words and equations. Statements and tables go unlabeled, so when a problem is done you can't tell from looking at it what the point was. Notation varies unpredictably: at different points in the course, the symbols μ, x-bar, and E(x) are all used for the sample mean without introduction or warning. Usually formulas are absent until given in summary at the end of a section, and then disparaged as being "confusing and complicated" (Unit 9.10) or "really clumsy" (Unit 9.15), which I think is a great pedagogical loss for learning to read and write math properly. At one point you get to see the assistant instructor write that "0.1 = 0.06561" (Problem Set 2.6), which to me is an unforgivable, cardinal sin. In many cases one would have to rely on the discussion forums for a fellow student to present a clear and complete piece of written math for any of the example problems.

The pattern of lectures goes like this: A short video nugget will be shown (perhaps 2-5 minutes), which leads to a web-based quiz question (prompting for retries until success), and then a brief video explanation of the answer. In general, I like this idea of frequent questioning and I do the same thing in my own classes: regular check-ins for myself and my students that we've successfully communicated the ideas at hand.

But a couple of things make this wonky here. One is that, obviously, the communication is not really two-way; neither Thrun nor the system is really "listening" to take note of when a presentation has misfired and needs clarification. Another is that the quiz regime timing seems forced and frequently not at a point when there is really a legitimate new idea to check in on. I would guess that as much as half the time a question is actually asked before students have been given the tools to answer it, being used as a means of introducing a new section. Things like, "Don't get disturbed if you don't know the answer" (Unit 1.4), or "I'd be amazed if you got this correct!" (Unit 9.13), are heard frequently. These kinds of questions seem inherently unfair and, I can only imagine, discouraging to many students.

Astoundingly, the Udacity Introduction to Statistics course manages to go almost its entire length without ever mentioning or making any distinction between the population and sample in a study. I say I'm "astounded" because in my classes (and any others I've surveyed or looked at), this is the key idea in introductory inferential statistics. It's the very first thing that is mentioned in my class (or the book), and it's the very last thing on the last day, too. It's the entire reason why inferential statistics is necessary in the first place. In fact, the very word "statistics" means measures for one (sample) and not the other (population), but you'll never learn that from this class.

As a result, Thrun goes the entire course using the symbols μ and σ to indicate the mean and standard deviation of both a random variable (population) and a limited data set (sample), whereas normally they indicate only the former. He'll switch between the two essentially without notice, saying something like "the observed standard deviation" (Unit 25.3), or "our empirical mean" (Unit 25.4). The x-bar notation appears late in the course, mid-way through a problem statement, and is then used to indicate the mean of a population in a hypothesis test, which is exactly reversed from normal usage (Problem Set 5.5). And the customary (unbiased) formula for sample standard deviation is entirely missing from the course, necessitating annotated instructor comments to point out that the results you get from this class would not be acceptable in any other venue (Unit 27.3).
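For readers who want to see what is at stake numerically, here is a minimal sketch of the two competing formulas. The data set is made up purely for illustration, and the function names are hypothetical; nothing here is taken from the course itself.

```python
import math
import statistics

# Hypothetical data set, invented for this illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sd_divide_by_n(values):
    """Standard deviation treating the values as a whole population (divisor n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def sd_divide_by_n_minus_1(values):
    """Customary sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

population_style = sd_divide_by_n(data)
sample_style = sd_divide_by_n_minus_1(data)

print(f"divisor n:     {population_style:.4f}")
print(f"divisor n - 1: {sample_style:.4f}")
```

Python's standard library draws the same distinction: `statistics.pstdev` uses the divisor n, while `statistics.stdev` uses n - 1. The gap between the two shrinks as the sample grows, but for the small data sets typical of an introductory course it is easily visible.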

A similar astounding absence: The entire sequence of Udacity's Introduction to Statistics passes without ever calculating any values for normal curves. Again, since the course is committed to being independent of any outside resource (no textbook, no tables, no statistical software suite), the result is that calculating probabilities or values for normal distributions is simply impossible and never occurs. Students don't have any opportunity to develop an intuition for normal-curve probabilities. The Empirical Rule (the 68/95/99.7% rule-of-thumb for standard deviations) is never mentioned. When the time comes to compute confidence intervals, Thrun is forced to give the direction, "just multiply this value over here with 1.96, the magic number!" (Unit 24.19), not having any way to explain where this comes from, nor even mentioning at the time that this is specific to a 95% confidence level.

Thrun spends a surprising amount of time developing the actual formula for a normal curve, but no calculations are made with it and its utility in an introductory course is highly questionable. The absence is doubly weird because at one point he asserts, "That's the purpose of the normal distribution for the sake of this class... we just do it for the normal distribution where things are relatively easy to compute" (Unit 20.15).
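To show where a value like 1.96 comes from, here is a minimal sketch using the `NormalDist` class from Python's standard `statistics` module (available in Python 3.8 and later). The helper names `central_area` and `critical_value` are my own, chosen for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def central_area(z):
    """Probability that a normal value falls within z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

def critical_value(confidence):
    """Number of standard deviations that captures the given central probability."""
    tail = (1.0 - confidence) / 2.0
    return standard_normal.inv_cdf(1.0 - tail)

# The Empirical Rule: areas within 1, 2, and 3 standard deviations.
empirical_rule = {k: central_area(k) for k in (1, 2, 3)}

# Multipliers for two common confidence levels.
z_95 = critical_value(0.95)
z_99 = critical_value(0.99)

for k, area in empirical_rule.items():
    print(f"within {k} sd: {area:.4f}")
print(f"95% multiplier: {z_95:.4f}")
print(f"99% multiplier: {z_99:.4f}")
```

The point of the sketch is that 1.96 is not magic: it is simply the cutoff that leaves 2.5% of the area in each tail of the standard normal curve, and a different confidence level gives a different multiplier.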

Another bizarre gap: what one would think to be the keystone to inferences for a mean, the Central Limit Theorem (the fact that the distribution of possible sample-mean values automatically takes on a normal shape with large sample size) is never clearly stated, nor its importance explained. There is an optional programming unit with the name in the title (Unit 19), which does generate a bell-shaped histogram of a few thousand randomized sample means, and ends by stating that how this relates to the Central Limit Theorem will be discussed in the next unit. The next unit is on the Normal Distribution, but it still neglects to actually state the CLT, and instead winds up engaging in a rather baroque discussion to wit, "it's a transition from a discrete space of finitely many outcomes to a space of infinitely many outcomes" (Unit 20.14). There's a later point where Thrun says, "Remember the Central Limit Theorem? Remember what it said?" (Unit 25.2), and weirdly, this is the first time he actually outright (if very briefly) states it. This is cursorily tied into how confidence intervals work (blink and you'll miss it), and also said to relate to "1.96 the magic number" in an unverifiable way (Unit 25.2-3). It's enormously unclear, and I think a distressing misstep.
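The connection between the Central Limit Theorem and the 1.96 multiplier can be checked with a small simulation. The sketch below is not the course's programming exercise; it is an illustration under my own assumptions (fair six-sided die rolls, samples of size 30, 5000 trials, a fixed random seed), with hypothetical names throughout.

```python
import math
import random
import statistics

# Mean and standard deviation of a single fair die roll.
DIE_MEAN = 3.5
DIE_SD = math.sqrt(35 / 12)

def simulate_sample_means(sample_size, trials, rng):
    """Return the mean of each of `trials` samples of `sample_size` die rolls."""
    means = []
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(statistics.fmean(rolls))
    return means

rng = random.Random(2012)  # fixed seed so the run is repeatable
sample_size = 30
means = simulate_sample_means(sample_size, 5000, rng)

# The CLT predicts sample means cluster around the population mean with
# spread sigma / sqrt(n), in an approximately normal shape.
standard_error = DIE_SD / math.sqrt(sample_size)
within = sum(abs(m - DIE_MEAN) <= 1.96 * standard_error for m in means)
coverage = within / len(means)

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"sd of sample means:   {statistics.stdev(means):.3f}")
print(f"predicted sd:         {standard_error:.3f}")
print(f"share within 1.96 se: {coverage:.3f}")
```

Although a single die roll is nowhere near normally distributed, roughly 95% of the simulated sample means should land within 1.96 standard errors of 3.5, which is exactly the fact that a 95% confidence interval for a mean relies on.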

Throughout the course, lectures and exercises veer rapidly between utterly trivial and nigh-impossible. I think this is a reflection of the one-way communication channel, such that Thrun can't have any awareness of what counts as easy and what counts as hard to the students. Frequently the "problem sets" at the end of a section will have work that is dramatically different than anything shown in the lectures. The first half-dozen units of the class are fairly long and obvious presentations of reading different tables and charts and linear relationships. Then at some point he switches into a remarkably difficult "complete the proof" exercise demonstrating that the sample mean is in fact the correct Maximum Likelihood Estimator for the population mean (Problem Set 3.1; not that he uses the terms sample/population). Granted, this is "optional", but the course hasn't had any proofs at all to that point, the overall strategy of the proof isn't declared, and it involves numerous calculus concepts. Even my graduate text in statistical inference (Casella/Berger) felt compelled to present and explain that proof in its entirety. (Later, when he revisits this same exercise again in Unit 23, Thrun actually does finally explain the technique, which I presume to be a response to earlier complaints in this regard.)
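For a sense of how much calculus that exercise packs in, here is a sketch of the standard textbook argument. It assumes independent observations x_1, ..., x_n from a normal distribution with unknown mean μ and known standard deviation σ; this is the usual setting for the result, not necessarily the exact form posed in the problem set.

```latex
\begin{align*}
L(\mu) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}}
          \exp\!\left(-\frac{(x_i-\mu)^{2}}{2\sigma^{2}}\right)
          && \text{(likelihood of the data)} \\
\ln L(\mu) &= -\frac{n}{2}\ln\!\left(2\pi\sigma^{2}\right)
          - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2}
          && \text{(take logarithms)} \\
\frac{d}{d\mu}\ln L(\mu) &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu) = 0
          && \text{(set the derivative to zero)} \\
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
          && \text{(solve for } \mu\text{)} \\
\frac{d^{2}}{d\mu^{2}}\ln L(\mu) &= -\frac{n}{\sigma^{2}} < 0
          && \text{(so this is a maximum)}
\end{align*}
```

Each line uses a tool (products of densities, logarithm rules, differentiation, a second-derivative check) that the course had not introduced by that point.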

Similar whiplash will be experienced at other points in the course. For example, one student wrote in the discussion forums for the course (regarding a different problem), "Questions such as this one and the one before it 'Many Flips' are counter productive. The previously explained course material was mostly very smooth and gradual. Reaching 'Many Flips' felt like crashing into a reinforced concrete wall." That's a perfect description of what I think the experience will be for many first-time students.

The course ends with a web-based final exam with 16 questions in the same vein as the section quizzes that have appeared all along. Upon completion, the student is able to print out a PDF "certificate of accomplishment" saying that they've taken this course from Udacity, with one of several success levels (Highest Distinction for all 16 questions correct, High Distinction 13/16, Accomplishment 10/16, or Completion 8/16).

Now obviously, a somewhat delicate issue is that this is a completely worthless, faux-certification for a number of reasons. Obvious ones would be: (1) Udacity has no accreditation, oversight, or recognition from any outside body, and (2) the questions are all fixed and the answers are probably posted somewhere online in full. But even more importantly, and what really surprised me, was: (3) the fact that you can re-submit all of your answers as many times as you like until they are confirmed correct (just like the quizzes; and some are even multiple-choice). Another would be: (4) the final exam is just remarkably easy; could this be a response to recent criticisms that only a tiny percent of students who register for courses like these ever complete them? If this is a PR problem for Udacity, then obviously they can reduce the difficulty of a course to whatever level generates a desired completion rate.

Recently, the blog "Gödel's Lost Letter and P=NP" by Georgia Tech's Richard Lipton had a lengthy post considering a perceived security problem with programs like Thrun's at Udacity: namely, that a student could freely register multiple accounts and keep taking the final exam until they achieved an acceptable score. But this overlooks the rather blatant fact that no one need go to such lengths, since the system already allows you to re-submit each individual exam item as many times as you like until success. Apparently Thrun's own response to Lipton's concern was to propose tracking of IP addresses to identify duplicate students, which bizarrely suggests a complete lack of awareness of how his own final exams work. ("Well Thrun told me about it in person when I visited his company this winter. They also can track IP addresses and they can see what is going on with their students."; "Cheating or Mastering?", August 21, 2012)

As if the content-based problems noted above weren't enough, running throughout Thrun's presentations is a routine, suspiciously hard-sell call for how stellar the class was and how much you, the viewer, have learned. Personally, I found this to be both grating and a thou-doth-protest-too-much lampshading of the flaws of the course. (You might think that I'm being too harsh, but puncturing this kind of stuff is, after all, the raison d'être of the AngryMath blog.) He says: "You now know a lot about scatter plots!" (Unit 3.12) (yeah, lots). "Isn't this a lot of fun? Isn't statistics really great?" (Unit 6.16) (surely someone thinks otherwise). "You are a very capable statistician at this point!" (Unit 32.12) (hyperbole at best). "When people say this is a contradiction... just smile [in disagreement] and say you took Sebastian's Stats 101 and you understand." (Unit 22.5) (yeah, I'll get right on that).

Finally, here's a core problem that multiplies and exacerbates all the others. In normal college teaching, a truly dedicated instructor will go through a never-ending process of constant refinement and improvement for their courses, based on two-way interaction and feedback from live students. (I know I do; I've taught my introductory statistics course several dozen times and I still sit down and note possible improvements after almost every single class session.)

So in theory, any of the problems that I've noted above could be revisited and fixed on future pass-throughs of the course. But will that happen at Udacity, or any other massive online academic program? I strongly suspect not. Likely, the entire attraction for someone like Thrun (and the business case for institutions like his) is to be able to record basic lectures once and then never have to revisit them again. Or in other words: All the millions of students using these ventures will be permanently experiencing the shaky, version-1.0 trial run of a new course, when the instructor is him- or herself just barely figuring out how to teach it for the first time, and without the benefit of two-way feedback or any refinements.

Based on my review of the Udacity Introduction to Statistics course, I see some compelling strategic advantages for live in-class teachers that will not soon be washed away by massive online video learning. Chief among them are the presence of actual two-way communication between teacher and students, such that the instructor can modify, expand, and respond to questions when appropriate (in regards to clarity of presentation, quiz questions, missing pieces, and rationalizing difficulty levels); and the ability to engage in a cycle of constant improvements and refinements every time the course is taught by a dedicated teacher. Also, I feel that written text is ultimately more useful than videos, being more elegant and precise, easier to search and index for key terms and examples, suffering fewer technical problems, easier to update, and generally being truer to the form of mathematical written presentation in the first place. In addition to these, Thrun's lectures at Udacity have a stunning number of critical flaws (in regards to planning, sequencing, clarity, writing, and missing major topics) that leave me amazed if any actual intro-level student manages to make their way through the whole class.

Perhaps the upshot here is a restatement of the old saw: "You get what you pay for." (Udacity being currently free, with a mission-statement to remain that way.) Or else another: "Don't take a class from a world-famous researcher, because they don't really have time or interest for teaching." Obviously, Sebastian Thrun is not just a teacher-by-online-video; he's also a Google Vice-President and Fellow, a Research Professor of Computer Science at Stanford, former director of the Stanford AI Laboratory, head of teams competing in DARPA challenges, and leader of the development of Google's self-driving car program. How much time or focus would we expect him to have for a freshman-level introductory math course? (Not much; in one lecture he mentions that he's recording at 3AM and compares it to his "day job" at Google.) Some of these shortcomings may be overcome by a more dedicated teacher. But others seem endemic to the massive-online project as a whole, and I suspect that the industry will turn out to be an over-inflating bubble that bursts at some point, much like other internet sensations of the recent past.

Source: http://www.angrymath.com/2012/09/udacity-statistics-101.html
