For this blog, I have used data sets which include the first million and recent million IDs discussed in recent posts, in addition to new data from our CCI Data Scientist Troy Sadkowsky covering IDs between 1,000,000,000 and 1,000,999,999 (1 million IDs) and between 1,001,000,000 and 1,011,000,000 (10 million IDs). This data covers both the first few months of Twitter operation, as well as periods in early 2011, late 2012 and early 2013, as seen below:
Because these accounts are of different ages, for some comparisons it is useful to look at ratios, that is, the statuses, followers, favorites etc. on a per-day basis. There is still a potential bias here towards newer accounts, in that they are perhaps likely to be more active having recently decided to join the platform, so that is something to be aware of in viewing the data that follows. In this post I do not intend to speculate too much on what the data means (excepting one specific example, which we will come to), as much of it needs further work to refine. Hopefully, however, this first look at the data we are obtaining is interesting nonetheless.
Statuses Per Day
For each of the charts below, what is shown is the statuses per day for all accounts created on a particular day. The volume of account registrations (which we estimated at 1 million IDs per 8 hours in a recent post) means that the more recent datasets only cover accounts created over a period of a few days. In Tableau, these ratios are generated by the formula: ([statuses_count]/((**creation date**-([created_at]/1000))/86400)), where the creation date is the unix time at which the data was collected and 86400 is the number of seconds in a day. Given that, at the time, this process took some hours to run, there is an element of imprecision here; however, in the grand scheme of things the impact should be minimal.
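As a rough illustration of the per-day ratio described above, here is a minimal Python sketch. It assumes that created_at is stored in milliseconds (which the division by 1000 suggests), that the collection time is a unix timestamp in seconds, and that a day is 86,400 seconds. The function name and the sample account are hypothetical and are not taken from the dataset.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400  # 24 * 60 * 60


def per_day(count, created_at_ms, collected_at_s):
    """Average `count` per day of account age.

    created_at_ms  -- account creation time in milliseconds (assumed unit)
    collected_at_s -- unix time in seconds at which the data was collected
    """
    age_days = (collected_at_s - created_at_ms / 1000) / SECONDS_PER_DAY
    if age_days <= 0:
        raise ValueError("collection time must be after account creation")
    return count / age_days


# Hypothetical account: created 1 Jan 2013, collected 100 days later,
# with 250 statuses posted in that time.
created_ms = int(datetime(2013, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
collected_s = created_ms / 1000 + 100 * SECONDS_PER_DAY
print(per_day(250, created_ms, collected_s))  # 2.5
```

The same function applies to followers, friends and favorites by passing the relevant count in place of the status count.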
Followers Per Day
These visualisations show an increased number of followers per day in new accounts; however, that is to be expected given that there is a finite number of Twitter users, and newer accounts will show an increased velocity of followers.
Followers Per Day (Less than 2000 – Excludes 13 accounts)
This is a slightly more interesting visualisation than the one above, as it excludes accounts, including @BarackObama, which heavily skewed the scale. Highlighted here is that newer accounts often have a lower rate of followers/day than those a few months old. Also interesting is that a number of the early accounts maintain a high followers/day ratio, which perhaps speaks largely to the identity of those early adopters, such as news organisations and Starbucks, which was discussed in a previous post.
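The exclusion used for this chart amounts to a simple cap on the followers/day ratio. A minimal sketch, assuming each account is a dictionary with a followers_per_day field (a hypothetical name) and using made-up rows in place of the real data:

```python
def split_by_rate(accounts, cap=2000.0):
    """Return (kept, excluded) accounts, split on followers per day."""
    kept = [a for a in accounts if a["followers_per_day"] < cap]
    excluded = [a for a in accounts if a["followers_per_day"] >= cap]
    return kept, excluded


# Made-up rows for illustration only; not values from the dataset.
sample = [
    {"screen_name": "example_news", "followers_per_day": 150.0},
    {"screen_name": "example_celebrity", "followers_per_day": 12000.0},
    {"screen_name": "example_user", "followers_per_day": 0.4},
]
kept, excluded = split_by_rate(sample)
print(len(kept), len(excluded))  # 2 1
```

Keeping the excluded rows, instead of discarding them, makes it easy to report how many accounts were cut from the chart.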
Friends Per Day
This data has a similar caveat to that given above, in that newer accounts are likely to follow more people in their early days than once the account has aged. Nonetheless, as with the followers chart above, the drop-off on the more recent accounts is interesting, which perhaps suggests that following a large number of users is not an immediate process, but comes after having established a presence on Twitter for a period of months.
Friends vs. Followers
Having established that baseline data, I also mapped these ratios against each other. For example, here is Followers Per Day vs. Friends Per Day, colour-coded by data set (the darker the colour, the newer the account; the exact breakdown can be seen on the colour key). I’ve again cut out extremities in both the followers and friends counts to show the majority of the user base:
Statuses vs. Favorites
One of these charts stood out, however, and sparked my interest for future work. Anecdotally, several people have commented on the recent increase of ‘favoriting’ on Twitter, with some users appearing to adopt the feature as equivalent to the Facebook Like. The chart below, which shows Statuses Per Day graphed against Favorites Per Day, shows an interesting pattern in the bottom left corner. In this chart, light blue is the ‘new data’ (11 million IDs from c. December ’12), red is the ‘first million’ IDs, and brown is the ‘recent million’ (i.e. March ’13).
As you can see, there appear to be a significant number of users who favorite as many as 30 tweets a day but don’t post new content, thus mimicking the ‘lurker’ behaviour identified on other platforms. Additionally, the data may show a rise in favoriting: while there is a chunk of early users who are averaging over 10 favorites/day (and we can’t tell for sure whether this is a lot of favoriting recently or a steady rate over the years), amongst more recent users we are seeing favorite counts of up to 70/day, suggesting this may be a more recent phenomenon.
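One way to pull such accounts out for closer study would be a simple threshold test on the two per-day ratios. The sketch below is only an illustration: the function name and both thresholds are assumptions chosen for the example, not values derived from the data.

```python
def looks_like_lurker(statuses_per_day, favorites_per_day,
                      max_statuses=0.1, min_favorites=10.0):
    """Flag accounts that favorite often but post little or nothing.

    Both thresholds are illustrative assumptions, not findings.
    """
    return statuses_per_day <= max_statuses and favorites_per_day >= min_favorites


print(looks_like_lurker(0.0, 30.0))  # True: favorites heavily, never posts
print(looks_like_lurker(5.0, 30.0))  # False: also posts regularly
print(looks_like_lurker(0.0, 1.0))   # False: barely favorites at all
```

Because the ratios are lifetime averages, a test like this cannot distinguish a steady favoriting habit from a recent burst, which is the same limitation noted above for the early accounts.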
And, as the first chart above shows, there’s a lot more data to collect and analyse!