Update: revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, which is outlined here.
This is the final instalment of my four-part introduction to the metrify.awk script for generating detailed metrics for specific Twapperkeeper/yourTwapperkeeper hashtag archives. Over the last couple of posts, we’ve mainly dealt with overall stats for the hashtag, as well as for specific, definable percentiles of more or less active users. Finally, now, it’s time to look more closely at patterns within the overall userbase.
For this, we’re using the final (and by far the largest) data table which metrify.awk generates. To produce a full table, by the way, the skipusers=1 command-line argument must not be specified this time around – otherwise the only per-user data which metrify.awk will output is each user’s number of tweets. With skipusers off, on the other hand, we get a great deal more – but a word of warning: for large datasets, processing times can also increase quite considerably. For each user, metrify.awk tracks which user percentile they’ve been assigned to, how many tweets they’ve sent and received (in the form of public @replies or retweets – note that this does not include any non-hashtagged tweets, which would not be included in the original dataset, of course), as well as how these sent and received tweets break down into our by now familiar categories:
- original tweets
- genuine @replies
- unedited retweets
- edited retweets
as well as
- @replies received
- genuine @replies received
- retweets received
- unedited retweets received
- edited retweets received
(with these metrics again provided both as a total number, and as a percentage of all tweets sent or @replies received, respectively). Again, with the exception of URLs, these will add up to the total:
- edited retweets + unedited retweets = retweets
- retweets + genuine @replies = @replies
- original tweets + genuine @replies + retweets = total number of tweets
as well as
- unedited retweets received + edited retweets received = retweets received
- retweets received + genuine @replies received = @replies received
But wait, there’s more – we can also calculate the ratio between these incoming @replies and the tweets sent by the user, to get a sense of the resonance of their activities:
- @replies received : total tweets sent
- genuine @replies received : total tweets sent
- retweets received : total tweets sent
- unedited retweets received : total tweets sent
- edited retweets received : total tweets sent
So, let’s see what these data tell us. In the first place, let’s look more closely at that small group of highly active users: here’s a graph for the top 150 most active participants (i.e. slightly more than the top 1%):
We see immediately that even amongst this top group, there’s a very pronounced long tail distribution: just two #auspol users (you know who you are) contributed more than 10,000 tweets each, and a total eight contributed more than 5,000 tweets each. Beyond those hyper-active few, we’re quickly dropping down towards the just over 500 tweets achieved by each of the users at the end of that top 150 (and further as we move into the second and third percentile groups). Additionally, the graph above also shows a breakdown of those tweets into original tweets, genuine @replies, and retweets – and remarkably, the lead user here achieved their position mainly by sending copious amounts of @replies…
The total activity distribution across all 14,133 active #auspol participants, by the way, looks like this:
An extreme activity distribution if ever I’ve seen one!
But of course, tweeting a lot is only one side of the coin on Twitter: if nobody is reading (and responding), the user’s influence may still not be particularly great. So, instead of tweets sent, we can also examine the @replies received (showing the top 150 users again):
This gives us a much better idea of who’s central to the conversation, I think: these are the users receiving the largest amount of @replies and retweets (and in this case, mostly genuine @replies, which is remarkable in its own right). It should be noted that – since we’re looking at @replies received here – this list may also include users who are only mentioned, but never actively participated in the hashtag; in the case of #auspol, this includes accounts like @JuliaGillard and @TonyAbbottMHR, for example, both of whom are present in the top 50 @reply recipients.
For those users, of course, it’s impossible to calculate the ratio of @replies received to tweets sent (since they didn’t send any) – but for the rest, that ratio may also be valuable, as an indication of what we might call resonance. A user receiving a great number of @replies (whether genuine @replies or retweets) for a comparatively small number of tweets could be said to have substantial resonance; a user tweeting a great deal, but receiving few @replies in return for their efforts, has relatively little resonance.
There are plenty of different ways to examine such resonance, using the different metrics which metrify.awk provides us with; as one example, here I’ve plotted the ratio of genuine @replies (i.e. non-retweets) received per sent tweet against the total number of tweets for the fifty most active users:
For the very lead users, then, their resonance rating isn’t actually all that great: the top user receives a genuine @reply roughly only for every second tweet they’ve sent, and for the next most active users this gradually increases to a 1:1 ratio. A handful of others, on the other hand, break through the parity barrier, receiving (on average) more than one genuine @reply for each tweet they’ve sent. Remarkably, though, one user in the top fifty even received an average of more than two genuine @replies for each of the over 2000 tweets they contributed to #auspol!
(Again, I should stress here that we’re only counting those @replies which are contained in our dataset – which in this example means @replies which were themselves tagged with the #auspol hashtag. In the absence of comprehensive data on non-hashtagged Twitter traffic we have no way of knowing how much non-hashtagged follow-on communication may also have occurred – our measures of tweet resonance, therefore, only measure resonance within the hashtagged conversation.)
Phew – well, with these posts at least we’ve started to scratch the surface of the Twitter metrics which metrify.awk can generate for a given dataset. Exactly how any of these metrics may be used in any specific case depends on the research questions to be examined, of course. Go experiment – and let me know if there are other metrics which we could add to the script as well!