We’ve been neglecting the blog a little – not because there hasn’t been anything worth writing about, but rather because there’s been too much going on. So, before our big trip to Europe in August and September (more on that soon), it’s time to clear the backlog of updates. And what better way to start than with an early map of Australia? No, we’re not talking about ancient seafarers’ maps here (though there are some similarities): part of our aim with the Mapping Online Publics project has always been to develop a better understanding of the Australian Twittersphere – to go beyond the observation of individual hashtag conversations, and to examine the overall network of Australian Twitter users (similar to the work we’ve started, and are continuing, on the Australian blogosphere).
So, over the past few months we’ve worked with our project partners at Sociomantic Labs in Berlin to identify as many Australian Twitter users as we could find, and to trace their networks of followers and followees. The core problem in this is to define what constitutes an Australian user – here, we’ve been relying on a combination of the timezone they’ve set for themselves (e.g. ‘Brisbane, GMT+10’), the location they’ve stated in their profile, and other characteristics. This isn’t without its drawbacks, of course – some users may never have set their profile information; some have even deliberately set their details ‘wrongly’ (following the disputed Iranian elections, some users set their timezone to Tehran time, for example, to show sympathy and/or confuse Iranian authorities trying to find the accounts of local dissidents); some use non-standard descriptions of their location (Brisvegas, Brisneyland) or are in Australian cities whose names also occur elsewhere (there’s a Toronto, Texas, and Bolivia here, and any number of suburbs called Paddington). And some users are simply very confused – quite a few users with timezones set to GMT-10 should have chosen GMT+10, and vice versa…
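To make the idea of an ‘Australianness’ test a little more concrete, here is a minimal sketch in Python. It is not the code we or our partners actually run: the function name, the timezone labels, and the list of place hints are all assumptions made up for illustration, and the ‘other characteristics’ mentioned above are left out entirely.

```python
# Illustrative sketch only: these labels and hints are assumptions,
# not the criteria actually used in the project.
AUSTRALIAN_TIMEZONES = {
    "Adelaide", "Brisbane", "Canberra", "Darwin",
    "Hobart", "Melbourne", "Perth", "Sydney",
}

AUSTRALIAN_PLACE_HINTS = (
    "australia", "adelaide", "brisbane", "brisvegas", "brisneyland",
    "canberra", "darwin", "hobart", "melbourne", "perth", "sydney",
)


def looks_australian(timezone=None, location=None):
    """Return True if either profile field hints at an Australian user."""
    # A missing timezone (None) simply fails the membership test.
    tz_match = timezone in AUSTRALIAN_TIMEZONES
    # Free-text locations are matched case-insensitively on substrings.
    text = (location or "").lower()
    loc_match = any(hint in text for hint in AUSTRALIAN_PLACE_HINTS)
    return tz_match or loc_match
```

A rule this simple inherits all the problems described above: a user with an empty profile is missed, while anyone who has picked an Australian timezone is accepted even if their stated location is elsewhere. Ambiguous names such as Toronto or Paddington are deliberately not in the hint list, which trades some missed users for fewer false matches.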
Still, in any large human network, such messiness will always occur, and it’s not stopped us from finding a very large number of genuine Australian users already. We will be missing a few, no doubt, and include a few who aren’t Australian in any sense of the word – but that doesn’t stop us from identifying the overall shape of the network pretty well. To do so, what we’ve done is to start with seed lists of users drawn from some of the particularly ‘Australian’ hashtag communities we’ve examined in the past – such as #qldfloods, #ausvotes, and (the Australian version of) #masterchef; here, we can reasonably assume that the vast majority of participants are Australian(-based) users. We’ve checked them against our criteria, and identified the follower/followee networks of those who passed the test; from there, it’s a simple snowball process of moving further and further out: to followers, followers of followers, and so on.
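For readers who like to see the snowball process spelled out, the following Python sketch shows the general shape of such a crawl as a breadth-first traversal. It rests on two hypothetical helpers that stand in for the real data gathering: `get_neighbours(user)`, assumed to return a user’s followers and followees, and `is_australian(user)`, assumed to apply the profile criteria. Rate limiting, retries, and storage are ignored.

```python
from collections import deque


def snowball(seeds, get_neighbours, is_australian, max_users=None):
    """Breadth-first snowball crawl outwards from a list of seed users.

    get_neighbours and is_australian are caller-supplied stand-ins
    (assumptions for this sketch), not real Twitter API calls.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    accepted = set()

    while queue:
        user = queue.popleft()
        # Users who fail the test are dropped and not expanded further.
        if not is_australian(user):
            continue
        accepted.add(user)
        if max_users is not None and len(accepted) >= max_users:
            break
        # Queue every follower/followee we have not encountered yet.
        for other in get_neighbours(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    return accepted
```

On a toy network this can be tried with a plain dictionary, for instance `snowball(["a"], lambda u: graph.get(u, []), lambda u: u in known_australians)`. Because only users who pass the test are expanded, the crawl never reaches an Australian user whose sole connections are to accounts that fail it.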
That process is slow and data-intensive (not least given Twitter’s access rate limitations), and far from complete; so below I’m presenting no more than a progress update. Think of it as one of those early explorer maps: the east coast may be pretty accurately charted already, and bits of the northwest are already sketched as well, but we’re not sure yet whether we’re dealing with a number of islands or a continent, and whether there’s an inland sea after all.
Our work also begins to provide some answers to the question of how many Twitter users there are in Australia in the first place. There’s been some significant variation in the numbers we’ve seen in the past – from a few hundred thousand to more than two million – and nobody seems to have any clear idea as yet. Our work to date has found some 550,000 Twitter users who pass our ‘Australianness’ test, but the snowball crawl is still ongoing, and my best guess so far would be that we’ll end up between one and two million in total – we’ll update this further as we go, of course. I should also note in this context that our method will find only those users who have a connection to another Twitter user (i.e. who follow or are followed by someone) and who haven’t set their accounts to private, so there will be some undercounting here; at the same time, the accounts we’re missing this way are likely to be mostly those which aren’t being used much anyway.
But enough of the preamble – time to get to the maps. Of the 550,000 users we’ve identified so far, we have reliable data for about 450,000 (follower networks for the rest are still being retrieved), with over 4 million follower/followee connections between them. We’ve mapped them here (using Gephi’s Force Atlas 2 algorithm), with node size showing indegree (how many followers a user has) and colour showing outdegree (how many others the user follows). The map shows only the users themselves (positioned close to the people they’re connected with); drawing in all the connections between them would have just resulted in a massive yellow blob. First, here’s a zoomable map showing all 450,000 users (full PNG – 8000×6000, 7MB – available here):
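As a small aside on the two measures behind node size and colour: given a list of connections, both can be counted in a single pass. The Python sketch below assumes each connection is stored as a `(follower, followee)` pair, which is a convention chosen for this example rather than a description of our actual data format.

```python
from collections import Counter


def degree_counts(edges):
    """Count followers (indegree) and followees (outdegree) per user.

    Each edge is assumed to be a (follower, followee) pair.
    """
    indegree = Counter()
    outdegree = Counter()
    for follower, followee in edges:
        outdegree[follower] += 1  # the follower follows one more account
        indegree[followee] += 1   # the followee gains one more follower
    return indegree, outdegree
```

Since `Counter` returns 0 for missing keys, a user who follows others but has no followers of their own simply has an indegree of 0.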
Already, obvious clusters emerge from this depiction. They become clearer (and change their position slightly) when we display only those users who have more than five incoming connections (i.e. followers) – which leaves us with ‘only’ 150,000 users (full PNG – 8000×6000, 10MB – available here):
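The reduction itself is a simple threshold on follower counts. As a rough Python sketch, again assuming connections are stored as `(follower, followee)` pairs and using a made-up function name, it might look like this:

```python
from collections import Counter


def reduce_network(edges, min_followers=5):
    """Keep only users with more than min_followers incoming connections.

    edges is assumed to be a list of (follower, followee) pairs.
    Returns the kept users and the connections among them.
    """
    indegree = Counter(followee for _, followee in edges)
    kept = {user for user, n in indegree.items() if n > min_followers}
    # Only connections between two kept users survive the reduction.
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges
```

Note that this sketch counts followers in the full network and filters once; it does not re-apply the threshold to the reduced network, which would be a different (and stricter) operation.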
The effect of reducing the network down to the more strongly connected users is to bring out those clusters more. Removing the users who have very few followers essentially takes out the yellow fog between the clusters; by removing them, we’re not losing much information, as these rarely-followed users are close to being Twitter lurkers anyway.
To begin to get a sense of what may be behind the clusters we’re seeing, I’ve looked only at the usernames involved so far – that’s a very limited approach, of course, but it does already provide us with some indication of what’s going on here. For example, the clouds of users towards the top right, at some distance from the overall network, are dominated by accounts whose names are related to Adelaide (or South Australia), and to wine – with accounts such as @SouthAussieWine providing an obvious link between the two. On the opposite end of the map are a number of clusters related to sport in general, as well as specific sporting codes – including the various forms of football played in Australia, and clustered around a number of key accounts representing specific sportspeople (e.g. Shane Warne’s @Warne888) and sports media. Here’s an annotated version (full PNG – 8000×6000, 20MB – available here):
It’s important not to read too much into these maps just yet, however – for example, the whole top left quadrant of the map is taken up with news, journalism, and politics, but this may simply be a result of the seed lists of Australian users that we started with: given that the #ausvotes population was one of our starting points, that region of the Twittersphere may simply have been especially well mapped so far. That said, the strong representation of professional fields – journalism, media, public relations, etc. – does seem to fit the continuing perception of the Twitter population as centred around urban professionals in the 25-45 age bracket. By contrast, music and fashion are relatively disconnected from the bulk of the network, such as we can see it so far, as offshore clouds towards the bottom right – and there’s an even more distant satellite cluster of users whose focus is on teen culture: the teen magazine TV Hits and the Australian teenage band Short Stack are central to this community.
(There’s also a far outlier at the top of the graph – or in the bottom right corner of the full network graph, above – containing users from the Philippines, which may need an explanation. Here, we’re coming up against the limits of our methodology: as it turns out, Twitter does not offer Filipino users a choice of their own timezone – GMT+8 for Manila – so at least some of them seem to have set their account timezones to ‘Perth (GMT+8)’ instead; this, in turn, leads to some of them being misrecognised as Australians by our data gathering tools. Being largely disconnected from the main Australian network we’ve identified, they’re easily spotted as false positives, of course.)
So much, then, for a first look at the overall Australian Twittersphere. As we move forward from here, we’ll start to have a closer look at some of these clusters, to understand what the leading accounts are, and to determine some of the underlying metrics of the network (just how isolated or overlapping are these clusters, for example – do sports and arts have any overlaps, or teens and politics?). And of course we’ll be very interested to see whether the further connections data which will be gradually added as we continue to snowball through the network will fundamentally change the shape of the map, or whether in this first sketch we already see the overall outlines of the continent. Many of the outlying islands which are visible in the first map, but disappear in the reduced map, will also still need to be explored more fully. Exciting times!