It’s been uncommonly quiet on this site for the last few weeks, as we’ve been busy with various projects, and overseas for a range of research activities. This may well prove to be the calm before the storm, though – we have a number of very exciting new social media research outcomes in the pipeline, and our preliminary analysis of Twitter use in the latest Labor leadership spill a couple of weeks ago was just the start.
To rekindle interest in new methodological advances in Twitter research, here’s a new method which my colleagues Darryl Woodford, Troy Sadkowsky and I came across on Tony Hirst’s fabulous blog OUseful.info a few months ago. As Tony points out, a Microsoft Research paper pursues a similar approach, so this method may have been developed almost simultaneously by a number of researchers. What we’re about to do is quite possibly more difficult to explain than to demonstrate, so let’s start with an illustration which shows the followers of @KRuddMP, the account of former and now reinstated Australian Prime Minister Kevin Rudd.
(We gathered these data before Rudd’s return to the top job, so any effects of the latest Labor leadership change won’t show up here just yet – we’ll examine these new changes in a future post, once the dust has settled.)
The following graph draws on two key points. First, the list of current followers for any one account that the Twitter API returns is in (almost) perfect chronological order; in Rudd’s case, it runs from follower number one to follower number 1.2 million or so (at the time we gathered the list). Following Tony’s example, we call each follower’s position in the overall list their follower accession number. Second, through a different API request we can determine when each of these followers first joined Twitter, practically down to the second: that’s their account creation date. If we plot those two parameters against one another, this is what we get:
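For anyone wanting to replicate this, pairing accession numbers with creation dates could be sketched in Python as below. This is not our own tool; it assumes the follower IDs have already been retrieved and put into oldest-first order, and that creation dates arrive as strings in the format of the Twitter API’s `created_at` field (e.g. "Sat Oct 18 03:12:45 +0000 2008"). The function and variable names are hypothetical.

```python
from datetime import datetime

# Assumed format of the API's `created_at` strings (an assumption, not
# taken from our own tools), e.g. "Sat Oct 18 03:12:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def accession_pairs(follower_ids, created_at_by_id):
    """Pair each follower's accession number (1 = oldest follower)
    with that follower's account creation date.

    follower_ids: follower IDs, already ordered oldest follower first.
    created_at_by_id: dict mapping follower ID -> created_at string.
    """
    pairs = []
    for accession_number, user_id in enumerate(follower_ids, start=1):
        created = datetime.strptime(created_at_by_id[user_id], CREATED_AT_FORMAT)
        pairs.append((accession_number, created))
    return pairs

# Hypothetical example: two followers and their creation dates.
pairs = accession_pairs(
    [101, 102],
    {101: "Sat Oct 18 03:12:45 +0000 2008",
     102: "Thu Nov 13 09:00:00 +0000 2008"},
)
```

Plotting the second element of each pair against the first gives the scatter described above.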
It’s immediately obvious that a curve emerges which looks suspiciously like a growth curve for Rudd’s follower base. As Twitter tells us, Rudd joined on 18 October 2008 – quite possibly without using the @KevinRuddPM username at the time; that handle was revealed only on 13 November that year, it seems, and changed to @KRuddMP after he lost the leadership to Julia Gillard in June 2010 (the existing follower base remains unaffected by such renamings of a Twitter account). Soon after, he began to accumulate followers, and his numbers swelled rapidly from mid-2009 onwards.
But why this pronounced curve, when (in principle) Twitter accounts of any age could become followers at any time? Well, here’s the clever bit of logic that Tony Hirst came up with:
- None of the follower accounts could become followers before they themselves were created.
- Some of the follower accounts will have become followers quite shortly after they were created – within minutes or hours. (This is even more likely in recent times, since Twitter began recommending high-profile accounts that new users may find interesting to follow.)
- Thus, the lower edge of the graph above is a pretty good approximation of the size of the follower base at any one moment – and we’re able to trace over time how an account’s follower base has developed.
In other words: from time to time, an account will follow @KRuddMP very shortly after it was itself created. Any account that appears after it in the follower list must have followed @KRuddMP later still, so we can calculate an approximate join date for those later followers even if their own creation dates lie well in the past. If we go through the list of followers from number one to number 1.2 million, note the most recent account creation date we have encountered so far, and assign it as the join date to every subsequent follower until an even more recent creation date appears, we are effectively tracing @KRuddMP’s follower growth curve. (See – I did say this was easier to illustrate than to explain…)
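The procedure just described is a running maximum over creation dates, taken in accession order. A minimal Python sketch, assuming the creation dates are already parsed into `datetime` objects and ordered oldest follower first (the function name is hypothetical):

```python
from datetime import datetime
from itertools import accumulate

def estimate_join_dates(creation_dates):
    """Estimate when each follower began following the target account.

    creation_dates: account creation datetimes, ordered by accession
    number (oldest follower first). Nobody can follow before their account
    exists, and everyone later in the list followed after everyone earlier,
    so the latest creation date seen so far is a lower bound (and usually
    a good approximation) for the current follower's join date.
    """
    return list(accumulate(creation_dates, max))

# Hypothetical creation dates for four followers, oldest follower first.
dates = [datetime(2008, 11, 1), datetime(2009, 1, 5),
         datetime(2007, 3, 2), datetime(2009, 2, 1)]
estimates = estimate_join_dates(dates)
# The third follower's 2007 account inherits the 5 January 2009 estimate.
```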
Incidentally, this works even if the account has been renamed, as long as the target account itself remains the same. That is the case here: we know that @KevinRuddPM changed its name to @KRuddMP following the 2010 ALP leadership spill. Interestingly, a pronounced vertical line also emerges: most of Rudd’s follower accounts were created after February 2009 or so, while the space to the left of that line is comparatively empty. This isn’t unique to Rudd, however: most Twitter accounts in the world were created after that time, as the site emerged as a mass medium, so the darker shading to the right of that divide simply shows when Twitter itself became a popular social media platform.
There is one significant caveat, however: gathering an account’s list of followers, we only gather information about those followers who at present still remain as followers. If there was a large group of followers who left at some point before we gathered our data, we will not find any evidence of that exodus; we can only see when those followers who have stuck with the target account until now began to follow it. (Also, if someone followed, unfollowed, and then re-followed an account, we will only be able to see their presence since the last time they re-followed.)
There’s also a smaller issue with the Twitter API which seems to result in some recent followers being placed (erroneously) at the start of the list: in Rudd’s case, a few accounts created in 2012 show up amongst his first few hundred followers, which is obviously impossible. So, we’ll ignore the first few followers in the list from now on. On that basis, if we isolate the growth curve only, here’s what we’re left with:
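Why does a handful of misplaced accounts matter? Because the estimate is a running maximum, a single recent account near the top of the list drags every later estimate forward to its creation date. A small Python illustration with made-up dates; the `skip_first` parameter is a hypothetical way of ignoring the first few followers:

```python
from datetime import datetime
from itertools import accumulate

def estimate_join_dates(creation_dates, skip_first=0):
    """Running-maximum join-date estimate that ignores the first
    `skip_first` followers (hypothetical parameter), whose position in
    the API's list may be unreliable."""
    return list(accumulate(creation_dates[skip_first:], max))

# Made-up example: a 2012 account misplaced among the earliest followers.
dates = [datetime(2008, 10, 20), datetime(2012, 5, 1),
         datetime(2008, 11, 2), datetime(2009, 3, 4)]

distorted = estimate_join_dates(dates)             # every later estimate becomes 2012
trimmed = estimate_join_dates(dates, skip_first=2)  # distortion removed
```

The trade-off is that trimming also discards any genuine early followers in that range, which matters little when only a few hundred out of 1.2 million are dropped.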
But something’s not quite right yet. The Rudd graph above shows a very rapid growth in followers which seems to begin as abruptly as it ends. Between late June 2009 and late January 2010, @KevinRuddPM (as it was known then) picked up a whopping 700,000 new followers, for no obvious reason; its follower accession rate was substantially lower both before and after this time. Here’s the account’s estimated follower growth per day:
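Turning the estimated join dates into a per-day growth series is a simple count by calendar day. A Python sketch, assuming the running-maximum estimates from above as input (names hypothetical). Note that every follower between two "anchor" accounts inherits the same estimated date, so daily figures are lumpy approximations rather than exact counts:

```python
from collections import Counter
from datetime import datetime

def new_followers_per_day(join_estimates):
    """Count estimated new followers per calendar day.

    join_estimates: estimated join datetimes (e.g. running-maximum output).
    Returns a dict mapping date -> count, in chronological order.
    Days on which no follower was assigned a join date are simply absent.
    """
    counts = Counter(ts.date() for ts in join_estimates)
    return dict(sorted(counts.items()))

# Hypothetical estimates for three followers.
example = [datetime(2009, 7, 1, 10), datetime(2009, 7, 1, 12),
           datetime(2009, 7, 3, 9)]
per_day = new_followers_per_day(example)
```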
A closer look at some of the accounts which joined during this time reveals some unusual patterns. Many of the new follower accounts began to follow Rudd essentially within minutes of being created; many follow only a handful of other accounts, and usually those of celebrities such as @PaulaAbdul and @CindyCrawford; many never bothered to change their Twitter user avatar away from the default ‘egg’; many posted only a handful of times, often promoting mail-order pharmaceuticals or ‘guaranteed followback’ schemes; and many haven’t tweeted again since 2011 or so. In short: these clearly aren’t genuine followers, but spam accounts which are attempting to give themselves the veneer of respectability by following a range of leading Twitter users (with an apparent preference for accounts whose identity has been verified by Twitter).
Update: Andrew Richardson (@Andrew303) has helpfully pointed out that Twitter introduced the first iteration of its Suggested User List in early 2009, featuring a handful of major / important / notable Twitter users whom new users were all but forced to follow as part of the sign-on process. Rudd was one of the very few Australians on that (global) list, and – obviously – benefitted significantly. I strongly suspect that Twitter spammers also followed these accounts disproportionately in order to look ‘legit’.
Here’s a graph which separates the very newly created accounts (created an hour or less before they followed) that followed @KevinRuddPM / @KRuddMP each day from the rest of his new followers. The result is striking: between late June 2009 and late January 2010, the influx of such new followers is massive, peaking at over 5,000 followers per day. (And remember that we’re only seeing those of these spam accounts which still exist today; many others may have been deleted by Twitter for spamming in the meantime.)
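The split can be sketched in Python by comparing each follower’s estimated join date with its creation date. The one-hour threshold comes from the text; the function name and the use of running-maximum estimates as the "follow time" are assumptions. One caveat: any follower that sets a new maximum has a gap of exactly zero by construction, so it always lands in the "new account" group.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_daily_counts(creation_dates, join_estimates,
                       threshold=timedelta(hours=1)):
    """Per day, count followers whose account was at most `threshold` old
    when they (by estimate) followed, versus all other new followers."""
    daily = defaultdict(lambda: {"new_accounts": 0, "other": 0})
    for created, joined in zip(creation_dates, join_estimates):
        key = "new_accounts" if joined - created <= threshold else "other"
        daily[joined.date()][key] += 1
    return dict(sorted(daily.items()))

# Hypothetical data: two brand-new accounts and one 2008 account.
created = [datetime(2009, 7, 1, 10, 0), datetime(2008, 1, 1),
           datetime(2009, 7, 1, 10, 30)]
joins = [datetime(2009, 7, 1, 10, 0), datetime(2009, 7, 1, 10, 0),
         datetime(2009, 7, 1, 10, 30)]
split = split_daily_counts(created, joins)
```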
To be perfectly clear about this (and to forestall any conspiracy theorists or lazy journalists who might want to turn this into a story about “Kevin Rudd’s fake followers”): I’m not suggesting here that Rudd ‘invited’ or even ‘bought’ these spam followers in order to boost his numbers and appear more liked. Rather, the opposite is most likely true: these spam accounts followed Rudd precisely because he is a prominent Twitter user, with a verified account – following his account makes them look more legit. Update: plus, as Andrew has pointed out, many new users followed Rudd simply because Twitter itself told them to, as part of the sign-on process.
The kinks at the start and end of this period of spam follower influx are very pronounced; if we eliminate the spammers (and assume there are no genuine reasons for a significant increase and slowdown in Rudd’s follower accession rate at these points), the corrected graph should show a much smoother growth curve. A little trial and error reveals that we can remove the majority of suspicious followers by disregarding any accounts which followed Rudd within 90 minutes of their own creation; applied to the entire dataset of over 1.2 million followers, this removes more than half of Rudd’s followers.
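The 90-minute filter might be sketched in Python as follows. We assume here (the post does not spell this out) that join dates are estimated on the full list first and only then filtered: the quick-following accounts are exactly the ones that anchor the running maximum, so removing them beforehand would loosen the estimate for everyone else. Names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import accumulate

SPAM_WINDOW = timedelta(minutes=90)

def drop_suspected_spam(creation_dates, window=SPAM_WINDOW):
    """Return (created, estimated_join) pairs for followers kept after
    removing accounts that followed within `window` of being created.

    creation_dates: creation datetimes, oldest follower first. Join dates
    are estimated on the full list before any filtering.
    """
    joins = accumulate(creation_dates, max)
    return [(created, joined)
            for created, joined in zip(creation_dates, joins)
            if joined - created > window]

# Hypothetical data: gaps of 0, ~18 months, 0, 75 minutes and 120 minutes.
created = [datetime(2009, 7, 1, 10, 0), datetime(2008, 1, 1),
           datetime(2009, 7, 1, 12, 0), datetime(2009, 7, 1, 10, 45),
           datetime(2009, 7, 1, 10, 0)]
kept = drop_suspected_spam(created)
```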
Tracking Events through the Accession Timeline
Based on this approach, then, it now becomes possible to trace the impact of events in the target account’s timeline on its follower base, at least where they’ve had a positive impact, since we can’t identify unfollowings. After dropping any users who followed Rudd within 90 minutes of creating their accounts (removing most spam accounts, but probably also a number of genuine followers), we’re left with about 570,000 followers, whose accession is distributed as follows:
At first glance, this may not look like much: from early 2009 onwards, Rudd’s follower accession curve looks fairly steady and uneventful. But the devil is in the detail. Switching to a view which shows the estimated number of new followers per day, we can clearly identify several notable spikes in Rudd’s follower accession numbers. These are closely aligned with various recent events, including the 2010, 2012, and 2013 leadership spills (and in the case of 2012, Rudd’s resignation as Foreign Minister which preceded it). There’s also a notable lull in follower accession following the 2010 election which (eventually) returned Julia Gillard as Prime Minister, indicating perhaps a momentary honeymoon for Gillard and a corresponding lack of interest in Rudd’s return as PM. Interestingly, we can also make out a sharp spike on the day that Australian media reported that @KRuddMP had surpassed the 1 million followers mark – followers beget followers, it seems.
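The post doesn’t prescribe a way of finding such spikes. One simple, hypothetical approach is to flag days whose count is well above the median of the preceding weeks; the 14-day window and factor of 3 below are arbitrary illustrative choices, not values we used.

```python
from datetime import date, timedelta
from statistics import median

def flag_spikes(daily_counts, window=14, factor=3.0):
    """Flag days whose new-follower count exceeds `factor` times the
    median of the preceding `window` days (hypothetical thresholds).

    daily_counts: (date, count) pairs in chronological order, including
    zero-count days. A floor of 1 avoids flagging every day after a
    stretch of zeros.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = median(count for _, count in daily_counts[i - window:i])
        day, count = daily_counts[i]
        if count > factor * max(baseline, 1):
            spikes.append(day)
    return spikes

# Hypothetical data: a steady 100 new followers a day, with one spike.
start = date(2010, 6, 1)
example = [(start + timedelta(days=i), 100) for i in range(30)]
example[23] = (example[23][0], 900)
spike_days = flag_spikes(example)
```

Flagged days would then still need to be matched against the news of the day by hand.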
It’s not in isolation that this approach is most powerful, though. So, in a follow-up post we’ll look at the comparative Twitter follower accession rates of a greater number of leading Australian politicians, to see how they may reflect the waxing and waning fortunes of the various political parties. Stay tuned!
Finally: I’m sorry to say this, but contrary to what we usually do at Mapping Online Publics we won’t be able to share the tools that we’ve used to create these accession graphs – for the moment, they’re built to run on very specific infrastructure and they wouldn’t be much use to anyone outside of the project team. With a little knowledge of the Twitter API, and a lot of patience in gathering the data, you should be able to replicate this approach reasonably easily, though. Again, I recommend Tony Hirst’s excellent site OUseful.info for more on this method.