Universities & the student data trail

The Guardian has an interesting article out right now about universities collecting and using data on students to improve retention, among other goals.

By plotting library usage against academic achievement they discovered that students who did not use the library were more than seven times more likely to drop out of their degree than those who did.

Lots of interesting ethical issues here. Hopefully we will be able to discuss this sort of practice in my Predictive Analytics (Data Mining) class this Fall (ISC 420).

From the Email Way-Back Machine

I’ve been harvesting emails from a technical mailing list for a project I’m working on. The list started in 1995. I noticed that back in the 90s, before we really thought about Y2K issues, lots of people were still using email clients that wrote 2-digit years in the Date: header. (For example, “23 May 95” instead of “23 May 1995”.) After a while this behavior drops off, and everyone uses 4-digit years.
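For anyone who wants to try this on their own archive, here is a minimal sketch of how the year field could be classified. It assumes each message's date string looks roughly like the “23 May 95” example above; the names `DATE_RE`, `year_digits`, and `tally` are hypothetical and are not taken from my actual harvesting scripts.

```python
import re
from collections import Counter

# Day, abbreviated month, and year tokens, e.g. "23 May 95" or "23 May 1995".
# Assumption: the date string follows this day-month-year layout.
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"(\d{2,4})\b"
)

def year_digits(date_text):
    """Return how many digits the year token has, or None if no date is found."""
    match = DATE_RE.search(date_text)
    return len(match.group(3)) if match else None

def tally(date_texts):
    """Count messages by the number of digits in their year token."""
    return Counter(year_digits(text) for text in date_texts)
```

Calling `tally` on the date strings for one calendar year gives the 2-digit versus 4-digit split for that year, which is the kind of number a chart like the one below needs.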

At what point did the developers of email software “get the memo” to change to a 4-digit year?

Google chart showing the prevalence of 2-digit years in one email list, 1995-2013. The biggest change was from 1995 to 1996, when email software switched to 4-digit years.

This shows that the 1995-1996 timeframe was the big change for switching to a 4-digit year in most cases. There were still a few stragglers after 1997, but fewer and fewer each year. (And 13 messages were still written incorrectly in 2000: 10 of them had “100” as the year, and the other three listed “00”.)
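I don't know for certain why those ten messages said “100”, but one plausible explanation is software that stored the year as "years since 1900" (as C's `struct tm` does) and printed that number directly. The sketch below illustrates that arithmetic under that assumption; both function names are hypothetical.

```python
def years_since_1900_field(year):
    """Year as 'years since 1900', printed as-is (assumed buggy behavior)."""
    return str(year - 1900)

def two_digit_field(year):
    """Year truncated to its last two digits, zero-padded."""
    return f"{year % 100:02d}"

for year in (1995, 1999, 2000):
    print(year, years_since_1900_field(year), two_digit_field(year))
```

Under this assumption the first function yields a normal-looking 2-digit year through 1999 and then rolls over to three digits in 2000, while the second yields “00”.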

View original data