YouTube lifecycle data available (preview)
By Mat Morrison September 6th, 2006
In Stories
A few weeks ago, I posted a note about a project I’m carrying out as part of my ongoing interest in memetics, contagious messaging and so on.
Geek alert: The rest of this story may not be wildly exciting to normal people.
It’s taken me a while to get everything together: the data, the analysis tools and so on. Partly because it’s a hobby (which means that I enjoy doing the tinkering myself when I have the time, rather than letting someone better qualified do it), and partly because I didn’t know whether there was any value in the exercise, it’s taken me longer to pull together, and is in a clunkier format, than I might have wished. Something that I had imagined as being fairly simple and elegant is (in fact) tied together with string and masking tape at this stage (well, it is a prototype).
What went wrong?
- Bad data: I used a brilliantly versatile piece of software called Anthracite (Mac only) to write the web scraper. I’m not really using it in the optimal manner, and that led to some anomalies and artefacts that needed to be edited out by hand. Dull, repetitive work.
- Bad data: The machine that was doing the scraping was all set up to do it automatically at the same time every night. While I was away one weekend, a power cut did for that. I lost two days of data, and had to throw away a third day to maintain accuracy during the analysis phase (otherwise the increment for that day would have been artificially high, because it would have included the views from the missing days). Again, I had to do this by hand for 130 records.
- Insufficient skill and power: I thought I’d do all the analysis in MS Access, but I couldn’t find a way to do what I needed. In the end, Access just became a way of collating information.
In the end, it’s taken Anthracite, a text editor, Access, Perl and Excel (and an eight-stage process) to do what I thought I could do in Anthracite and Access alone. But I’ve done it. If I were to do it again, I’d probably just use Perl (or more to the point, get someone else to do it).
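For anyone attempting something similar, the gap handling described above (turning nightly cumulative view counts into daily increments, and discarding the first day after a break in scraping) can be sketched in a few lines. This is a minimal illustration in Python, not the Perl and Excel process actually used here; the function name, the data layout and the sample numbers are all assumptions made up for the example.

```python
from datetime import date, timedelta


def daily_increments(snapshots):
    """Convert cumulative view counts into per-day increments.

    snapshots: dict mapping a date to the cumulative view count
    scraped on that date (hypothetical layout, for illustration).

    A day is kept only if the previous calendar day was also scraped.
    The first observation and the first day after any gap are dropped,
    because their increments would cover more than 24 hours.
    """
    increments = {}
    for day in sorted(snapshots):
        previous = day - timedelta(days=1)
        if previous in snapshots:
            increments[day] = snapshots[day] - snapshots[previous]
    return increments


# Made-up counts for one video, with two nights (the 3rd and 4th) missing.
example = {
    date(2006, 8, 1): 100,
    date(2006, 8, 2): 250,
    date(2006, 8, 5): 900,
    date(2006, 8, 6): 950,
}

result = daily_increments(example)
for day, views in sorted(result.items()):
    print(day.isoformat(), views)
```

In this made-up example the 5th is discarded as well as the two missing nights, since its increment (650) would really be three days of views rolled into one.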
What have I got? Well, the interesting thing is, I don’t know yet. I’ve got raw data on 130 YouTube videos. I’ve tracked the views they received daily for up to 20 days of their lifecycle. That’s about it at this stage.

Some rough and ready charting suggests that most videos’ peak views occur during the second 24 hours of their life, and drop off during the third. But I’m not interested in most videos; I want to identify those that achieve some kind of epidemic success.
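The "which day does a video peak?" question can be checked without charting. The sketch below, again in Python and again with invented video names and figures, finds the day of life on which each video got the most views and tallies the results. It assumes each video's daily views are already in a list ordered from its first day; that layout is an assumption for the example, not the format of the published data.

```python
from collections import Counter


def peak_day(daily_views):
    """Return the 1-based day of life with the highest daily views.

    If two days tie, the earlier one is returned.
    """
    best_index = max(range(len(daily_views)), key=lambda i: daily_views[i])
    return best_index + 1


def peak_day_histogram(videos):
    """Count how many videos peak on each day of life."""
    return Counter(peak_day(views) for views in videos.values())


# Invented daily view counts, one list per video, starting at day 1.
videos = {
    "video_a": [120, 480, 210, 90],
    "video_b": [300, 650, 400, 150],
    "video_c": [50, 60, 75, 900],
}

histogram = peak_day_histogram(videos)
for day, count in sorted(histogram.items()):
    print(f"day {day}: {count} video(s)")
```

Videos whose peak falls well after the second day (like the invented `video_c`) would be the candidates for a closer look at non-standard lifecycles.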
I’ll continue to look at this data. If it looks as though there’s anything interesting emerging (particularly about non-standard lifecycles), I’ll post it here. Meanwhile, I make the raw (edited) data available here: YouTube Lifecycle data. If anyone wants to look at the tools and scripts I pulled together, please just ask, and I’ll send them to you.
1 Ben // Sep 21, 2006 at 11:29 am
Hi Mat,
interesting project! I found a similar tool that you might find useful: http://www.vidstats.com/
(I found it a bit patchy with pulling up data on any username)
2 Mat Morrison // Sep 21, 2006 at 4:09 pm
Yep, Ben: I think that rather eclipses my puny efforts in most ways. *sigh*
3 What we can learn from the real evangelists? | Mediaczar // Jul 15, 2008 at 10:17 am
[...] (and criticised) study by Tubemogul on the short shelf life of online video reminded me of some research into views on YouTube videos I did back in 2006. I only looked at about 130 random YouTube videos for the first 20 days of their [...]