Processing Twitter — Darryl Woodford, 8 April 2013
The First Million IDs on Twitter

Following on from Friday’s post, in which we looked at a number of recent accounts on Twitter, this post considers the first million Twitter IDs.

When did they join?

IDbyTime

As the above graph shows, with account creation date along the horizontal axis and ID along the vertical, a smattering of accounts was registered from late March to July 2006, the first (ID #12) at approximately 20:50 on 21 March. Note that the data covers only accounts that are still active; no data can be retrieved for those that are no longer active. The registration rate rose slightly between November and December 2006, and late December ’06 to early January ’07 saw a sharp increase, corresponding to US publicity. This tapered off to a steadier rate until March ’07, at which point a second publicity-driven spike took the IDs past the 1 million mark. Of these 1 million IDs, only 48,546 accounts remain active.

Where in the world (is Twitter user x)?

map

This map shares some of the limitations discussed on Friday, namely that it is built from each account’s time zone. Interestingly, though, it does not show the same bias toward alphabetically prominent time zones, such as Amsterdam, that is present with new users. Also interesting is the prominence of Italy: about 4% of the active accounts come from there, far more than from any other non-English-speaking country, and more than from Australia. The US is dominant, with well over 50%, but its neighbour to the north, Canada, accounts for only 0.3%, again a sharp contrast to the data on newly created accounts. Other hotbeds of early Twitter activity (those with over 1,000 of the 40,000 accounts) are limited to Australia and the United Kingdom.

What have they been up to since 2006?

StatusbyID

The above chart shows total statuses posted by each account, and reveals some interesting patterns. Whilst a few users have 400,000-plus tweets, most have managed to restrict themselves to 200,000 or fewer across the past 7 years, with the bulk clustered below 50k. As with the join-times graph earlier, there are a number of missing IDs (and thus 0 statuses) around the middle of the chart. The below chart, in which users are placed into ‘bins’ of 100,000 IDs, again shows a fairly consistent average status count across the bins, suggesting that those early users whose account is still active (i.e. hasn’t been deleted) have tweeted more or less the same number of times as those who joined during one of the publicity cycles.

StatusbyID-bin
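For readers who want to try the binning themselves, here is a minimal sketch of how it might be done. It assumes the data is available as (user ID, status count) pairs and that bins start at ID 0; the function name and the sample rows are made up for illustration and are not the code or data behind the chart.

```python
from collections import defaultdict

BIN_WIDTH = 100_000  # bin size described in the post


def mean_statuses_by_bin(users, bin_width=BIN_WIDTH):
    """Return {bin_start: mean status count} for (user_id, statuses) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for user_id, statuses in users:
        # Integer division maps every ID to the lower edge of its bin.
        bin_start = (user_id // bin_width) * bin_width
        totals[bin_start] += statuses
        counts[bin_start] += 1
    # Bins with no surviving accounts simply do not appear in the result.
    return {b: totals[b] / counts[b] for b in sorted(totals)}


# Made-up sample rows, for illustration only (not the real data).
sample = [(150, 1_000), (42_000, 3_000), (710_500, 500), (799_999, 1_500)]
bin_means = mean_statuses_by_bin(sample)
print(bin_means)
```

Averaging per bin, rather than summing, keeps sparsely populated bins comparable with crowded ones, which matters here given how few IDs in the middle of the range are still active.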

Followers and Followees

followersbyID

Here, I have removed eight data points from the visualisation, which shows the number of followers by account ID: users with 29.3m (Barack Obama), 16.5m (Twitter themselves), 7.67m (New York Times), 7.5m (CNN), 3.5m (Starbucks), 3.2m (BBC World), 3.17m (Mashable) and 2.6m (TechCrunch) followers. One incidental observation from removing these is that Twitter themselves have an ID in the 783,000 range, while Starbucks are in the 31,000 range; clearly, creating a corporate identity for themselves was not an early priority of the Twitter developers!
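Setting aside a handful of very large accounts so the rest of the plot stays readable can be sketched as follows. This is only an illustration: the cutoff value, the function name and the sample rows are assumptions, since the post does not say how the eight accounts were selected beyond their follower counts.

```python
def split_outliers(rows, cutoff):
    """Split (user_id, followers) rows into (kept, removed) around a follower cutoff."""
    kept, removed = [], []
    for row in rows:
        # Rows above the cutoff are set aside rather than discarded,
        # so they can still be reported in the text.
        (removed if row[1] > cutoff else kept).append(row)
    return kept, removed


# Made-up rows for illustration; IDs and counts are not real data.
rows = [(101, 850), (2_002, 12_400), (30_003, 3_500_000), (700_004, 29_300_000)]
CUTOFF = 2_500_000  # hypothetical; the post does not state the cutoff used

kept, removed = split_outliers(rows, CUTOFF)
print(len(kept), "plotted;", len(removed), "listed separately")
```

Returning the removed rows alongside the kept ones makes it easy to list the excluded accounts next to the chart, as done above.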

Again, we see a large number of IDs in the centre of the graph with little activity, replicating the previous data, with two more populous clusters to either side. Here, though, the later users show a marked increase in connectedness over those on the left side of the graph. The very early adopters (perhaps those who were in some way connected to a member of the development team), while tweeting regularly, may therefore be less connected than those tech aficionados who joined during the early phase of publicity.

followingbyID

The above diagram shows the number of accounts each user is following. Two accounts have been cut from it for visualisation purposes: one following 665,279 users (Barack Obama) and the other following 229,915. Otherwise, we see a pattern similar to that for statuses and followers: the missing IDs in the middle, with users to the left and right distributed much as in the followers graph above, reinforcing the suggestion that later IDs seem to be more connected than the early users.

Overall then, an interesting distribution of early Twitter users, which is in some ways similar to and in some ways different from the more recent users discussed on Friday. Now just to fill in the missing hundreds of millions!

About the Author

Darryl Woodford

Darryl Woodford is a Postdoctoral Research Fellow in the ARC Centre of Excellence for Creative Industries & Innovation, based at Queensland University of Technology. His research includes works on the video game and gambling industries.
