Mapping the Australian Blogosphere Some More

My previous post outlined a few more steps I’ve taken in cleaning up our emerging dataset of links in the Australian blogosphere (current limitations of our data are also listed there). It’s time to take those cleaner data for a spin, then. Beyond mapping the interlinkages between our known blogs during the period of 17 July to 27 August 2010 (roughly coinciding with the Australian federal election campaign), as I did a couple of posts ago, I’ll now work off the cleaned dataset which contains only those links which:

  • originate from those sites in our list which we have confirmed to be (independent or professional) Australian blogs; and
  • point to sites which are more than merely functional (i.e. sites which aren’t on the destination filter list at the bottom of my previous post).
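As a rough illustration of those two criteria, here is a minimal sketch of the filtering step. It is not the code we actually run: the hostnames, the `CONFIRMED_BLOGS` and `DESTINATION_FILTER` sets, and the `clean_links` helper are all hypothetical stand-ins, and it assumes links are stored as simple (source URL, target URL) pairs.

```python
from urllib.parse import urlparse

# Hypothetical example data: none of these hostnames come from the real dataset.
CONFIRMED_BLOGS = {"blog-a.example", "blog-b.example"}
DESTINATION_FILTER = {"widgets.example", "stats.example"}


def host(url):
    """Return the lower-cased hostname of a URL, without a leading 'www.'."""
    name = (urlparse(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def clean_links(links):
    """Keep (source, target) pairs which originate from a confirmed blog
    and do not point to a site on the destination filter list."""
    kept = []
    for source, target in links:
        if host(source) in CONFIRMED_BLOGS and host(target) not in DESTINATION_FILTER:
            kept.append((host(source), host(target)))
    return kept


raw_links = [
    ("http://blog-a.example/post/1", "http://www.blog-b.example/about"),
    ("http://blog-a.example/post/1", "http://widgets.example/badge.js"),  # functional target
    ("http://unknown.example/page", "http://blog-a.example/"),            # unconfirmed source
    ("http://blog-b.example/2010/08", "http://news.example/story"),
]
cleaned = clean_links(raw_links)
```

Note that the sketch reduces every URL to its hostname, so all pages on a site collapse into a single node; that is an assumption made for the example, not a description of how our dataset is organised.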

What I’m especially interested in as I work with these network data is:

  1. which non-blog sites appear prominently in the network, and in what contexts; and
  2. which blog sites appear to serve as connectors between the various components of the overall network.

So, feeding the network data (close to 3.4 million links) into Gephi and filtering out any sites which receive fewer than ten incoming links from anywhere in the network, here’s what we get (PDF here):

Node colours in this graph indicate the number of incoming links a site has received (indegree); node sizes indicate betweenness centrality within the blogosphere (put simply, how important a node is in connecting parts of the overall network of blogs). Amongst the darker nodes in the network are some of the sites we’d expect to figure strongly in any larger network: in descending order of indegree ranking, [site list missing]. Most of them are linked to frequently by sites across the entire network, which is why a number of them are placed in the no man’s land at the centre of the map, between the main clusters of closely interconnected blogs.
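For readers who want to see what the indegree filter amounts to outside of Gephi, here is a minimal sketch. It assumes that indegree is counted once on the full network, that every link counts separately, and that the filter is applied in a single pass; whether Gephi treats repeated links in exactly this way is not something I’m asserting here. The function names and the toy data are made up for the example.

```python
from collections import Counter


def indegree(edges):
    """Count incoming links per site from (source, target) pairs."""
    return Counter(target for _source, target in edges)


def filter_by_indegree(edges, minimum=10):
    """Drop every site with fewer than `minimum` incoming links,
    together with all links that start or end at a dropped site."""
    counts = indegree(edges)
    kept_sites = {site for site, n in counts.items() if n >= minimum}
    return [(s, t) for s, t in edges if s in kept_sites and t in kept_sites]


# Toy network with a threshold of 2 rather than 10, to keep it readable.
edges = [
    ("a", "hub"), ("b", "hub"), ("c", "hub"),
    ("hub", "a"), ("b", "a"), ("a", "c"),
]
counts = indegree(edges)
filtered = filter_by_indegree(edges, minimum=2)
ranking = [site for site, _n in counts.most_common()]  # descending indegree
```

In this toy network only `hub` (three incoming links) and `a` (two) survive the filter, so `b` and `c` disappear along with their links.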

None of these sites are blogs, of course – and since this network graph is based purely on links originating from blogs, it does not contain any links originating from these sites. So, their betweenness rating remains very low – put simply, these sites are not part of the Australian blogosphere, and so by definition they can’t be important connectors between individual subsections of the blogosphere. That’s not to say that Google or other sites aren’t providing important pathways for people to find their way from one blog to another – but if we’re interested in tracking interconnections purely within the blogosphere, without leaving it at any one point, they can’t be counted.
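To make that point concrete, here is a small sketch of betweenness centrality for a directed, unweighted network, using Brandes’ well-known algorithm. This is my own illustrative implementation rather than Gephi’s code, the scores are left unnormalised, and the node names are invented. The `portal` node stands in for a site like Google: every blog links to it, but it links to nobody.

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalised betweenness centrality (Brandes' algorithm)
    for a directed, unweighted graph."""
    successors = {n: [] for n in nodes}
    for source, target in set(edges):  # ignore duplicate links
        successors[source].append(target)
    score = {n: 0.0 for n in nodes}
    for start in nodes:
        # Breadth-first search, counting shortest paths from `start`.
        order = []
        predecessors = {n: [] for n in nodes}
        paths = {n: 0 for n in nodes}
        distance = {n: -1 for n in nodes}
        paths[start], distance[start] = 1, 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if distance[w] < 0:  # first time we reach w
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != start:
                score[w] += dependency[w]
    return score


nodes = ["blog1", "blog2", "blog3", "portal"]
edges = [
    ("blog1", "blog2"), ("blog2", "blog3"),
    ("blog1", "portal"), ("blog2", "portal"), ("blog3", "portal"),
]
scores = betweenness(nodes, edges)
```

Here `portal` has the highest indegree but a betweenness of zero, because no path can pass through a node with no outgoing links, while `blog2` scores 1.0 for sitting on the only route from `blog1` to `blog3`.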

Genuine interconnectors within the blogosphere start to appear only somewhat further down the list. These are the larger nodes in various shades of orange – in descending order of betweenness ranking: [site list missing]. These, in essence, are those blogs which have managed to make a name for themselves beyond their immediate community of like-minded bloggers, and (for readers following the links between blogs) might serve as entry points into these clusters.

Here’s a version of the previous graph with node labels added (grab the PDF if you can’t make out some of the small print):

With the labels added, it also becomes a little easier to work out what the themes of the clusters are. Largely, they match the clusters which we’ve seen in our previous visualisation of these data – but as the previous graph contained only the known blogs, and this one also includes the destination sites which they commonly link to, the clusters are distributed a little differently here. (Once again, I should note that so far we’re only tracking a subset of the overall Australian blogosphere – so I’m not claiming that these clusters are all that exists in it!)

Towards the top left, there’s a large and internally divided group of news and politics blogs; below the centre, there are various food blogs, gradually transitioning into a looser cluster of parenting blogs as we move further outwards and down from the centre. To the right of the centre, around the prominent blog Meet Me at Mike’s (in case, like me, you read the URL and wondered whether this was a blog about meat), there is a range of blogs related to arts, crafts, and design. Further to the right is a tight cluster of fashion and interior design blogs, and between and slightly above these two clusters is a looser grouping of what appear to be style and possibly wedding blogs (note that I’m going only by the blog URLs for now – we’ll look more closely at the content of these blogs at a later stage).

It’s probably worth noting, too, that of the major generic destination sites, Facebook, Flickr, and Wikipedia are located more closely towards these latter clusters – Flickr in particular is very close to the food and parenting blogs, which makes sense if a common practice on these blogs is to post (Flickr-based) pictures of successful gourmet creations, for example. Google and Amazon also appear in this context (Amazon is closer to the arts and crafts blogs, interestingly).

By contrast, sites like ABC or Sydney Morning Herald are positioned much closer to the politics cluster, for obvious reasons – these bloggers frequently link to and critique the mainstream news reports of the day as part of their blog posts, of course. Additionally, these mainstream news sites are also closely connected with this cluster because they are heavily interlinked with the blogs and commentary sites run by mainstream news organisations – for the ABC, sites like ABC Unleashed, for the SMH, the National Times and the various Fairfax columnists’ blogs.

Once we allow Gephi’s clustering algorithm to assign node colours, this becomes even clearer (PDF here):

In fact, what this does nicely is single out the tight network of interlinked News Ltd. sites (in green) which sits within the wider news and politics cluster (in yellow). Many of these are likely to disappear from view, incidentally, as we fine-tune our algorithms to get rid of merely functional links that appear in page headers, footers, and sidebars – but for now, they’re here.

Also clearly visible in the graph is the difference between unidirectional links (in grey, curved) and mutual interconnection (in black, straight lines). Especially between the clusters, such mutual links point to the possibility for nodes to act as bridges between different communities within the blogosphere – and in that context, it may be significant that there are a few such mutual interconnections amongst the various clusters on the right and bottom (food, parenting, arts, crafts, design, style, decorating), but none between these themes and the news and politics blogs on the left of the graph. Anecdotally, political bloggers are often seen as relatively single-minded – even narrow – in their topical focus; if this division between politics and other pursuits remains so pronounced as we increase the proportion of the Australian blogosphere whose activities we track, and as our data capturing algorithms become more robust, then our findings would seem to support this perception.
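The distinction between one-way and mutual links is easy to check programmatically, too. Here is a minimal sketch which assumes that each site has already been assigned to a cluster; the `cluster_of` mapping, the site names, and both helper functions are hypothetical and exist only for this example.

```python
def mutual_links(edges):
    """Return the unordered pairs of sites which link to each other."""
    edge_set = set(edges)
    return {
        frozenset((a, b))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }


def bridging_pairs(edges, cluster_of):
    """Return the mutual links whose two endpoints sit in different clusters."""
    return {
        pair for pair in mutual_links(edges)
        if len({cluster_of[site] for site in pair}) == 2
    }


# Invented cluster assignments and links, for illustration only.
cluster_of = {
    "food1": "food", "craft1": "crafts",
    "pol1": "politics", "pol2": "politics",
}
edges = [
    ("food1", "craft1"), ("craft1", "food1"),  # mutual, across clusters
    ("pol1", "pol2"), ("pol2", "pol1"),        # mutual, within one cluster
    ("food1", "pol1"),                         # one-way only
]
mutual = mutual_links(edges)
bridges = bridging_pairs(edges, cluster_of)
```

In this toy example the only bridge is the food/crafts pair: the two politics blogs link to each other but stay within their own cluster, and the single link from `food1` to `pol1` is never reciprocated.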

Finally, then, here’s another cluster graph, again with the URLs added (PDF here):

And that’s it for now. As part of our next steps, it will be interesting to see how this network changes outside of election time – and naturally we’ll also want to further grow the overall number of blogs we track.

Feature image by the tartanpodcast.

