Twitter is widely used in Australia, but we still don’t know a great deal about the structure and dynamics of the Australian Twittersphere. Back in 2011/12, our research began to identify Australian Twitter users and map their follower/followee connections, in order to better understand the structure of the network and, from this, to determine some of the key themes and topics driving activity in the Australian Twittersphere. We’re currently in the process of substantially extending this work, and in this post I’m starting to share some first findings.
First things first: here’s our methodology for getting to this point. Over the course of several months in 2013, the tools developed by our data scientist Troy Sadkowsky used the Twitter API to access the publicly available profile information for each account then in existence; we simply pinged every user ID from 0 through to (at that point) upwards of 2 billion, and recorded the information returned. This resulted in data for some 750 million accounts – the size of the global Twitter userbase (or more precisely, account base) around September 2013. (We’ll share some analysis of the global trends in Twitter account sign-ups in a separate post in the near future.)
This comprehensive snapshot of global Twitter accounts provides us with an opportunity to go looking specifically for Australian users. To do so, we drew on three key elements of each user profile: the free-text profile description and location fields as entered by the account creator, as well as the profile timezone they chose from the pull-down menu of presets offered by Twitter. On the basis of the latter, we selected all users who had chosen one of the eight state-based Australian timezone options, while for the former two fields, we developed a long list of search terms relating to Australian towns, cities, and states, and to Australia itself, using a number of common variations. Any account that matched our criteria for “Australianness” in any of these three fields has been included in our selection.
To go through the full list of search terms would take up another post, but we worked with a list of the 50-odd largest cities in Australia, added in a handful of popular variations, included the state names and their abbreviations, and also used terms such as “Australia”, “Stralya”, “down under”, and others. Following a test run, we further refined these terms, to include popular misspellings (“Austalia”, “Tasmainia”) and remove false positives.
This turned out to be a somewhat time-consuming exercise: many place names in Australia are re-used from Europe (“Perth”, “Ipswich”) or duplicated in other new world countries (Brisbane, California; Victoria, British Columbia); some Australian place names also appear in popular media (some users claim to be from the “City of Townsville” or indeed the “Ciudad de Townsville” in homage to the Powerpuff Girls, or from Finding Nemo’s “42 Wallaby Way, Sydney”). Where possible we’ve filtered out any false positives which could be clearly identified.
In the end, this process of filtering the total dataset of over 750 million Twitter accounts left us with some 2.8 million accounts which we are confident in classifying as ‘Australian’ for the purposes of this study. For many of these, we are also able to assign a likely state and/or city, based on which of our search terms helped identify the account; here, we give greatest credence to the information contained in the location field of the Twitter profile, followed by description and timezone. Where we identified users only based on their timezone, we have assigned a state, but have refrained from assigning them to the state’s capital city.
Inevitably, some false positives will remain in our dataset, and some accounts will be miscategorised – “Sydneysider now living in Melbourne” or “Australian in New York” may lead to false location assignments, and descriptions like “Korean student in Brisbane” or even “Dreaming of travelling through Australia” would have matched our search terms, but do not relate to the accounts of Australian users in a narrow sense. However, given the size of the total dataset, our best-match approach using automated processes is the best option available to us, and I’d guess that some 90-95% of the accounts we’ve matched are genuine Australian users: either Australians in Australia, Australians elsewhere in the world, or non-Australians living in Australia.
The outliers from this population are likely to show up in our further analysis, too. There will also be some false negatives, of course: accounts which give no indication of their Australian connections anywhere in their location, description, or timezone details (including users who have filled in none of their profile details at all). It seems likely that the greatest number of these will be amongst the most recently registered accounts (whose owners may not yet have had a chance to fully customise their Twitter settings), so we’ll largely ignore this group for now – we’ll re-run our survey of the total Twitter userbase at some point in the future to examine how these accounts may have developed, as well as to gather data on the accounts which were created after our initial data-gathering exercise finished in September 2013.
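To make the selection rules above a little more concrete, here is a minimal Python sketch of how such a matching step could work. It is not the code we actually ran: the term lists are tiny hypothetical stand-ins for the real ones, the profile field names (`location`, `description`, `time_zone`) and the timezone preset labels are assumptions about the data format, and the function names are made up for illustration. It does follow the logic described above: check the location field first, then the description, then the timezone, and assign only a state (never a city) to accounts identified by timezone alone.

```python
import re

# Assumed labels for the eight Australian timezone presets, mapped to states.
AU_TIMEZONES = {
    "Sydney": "NSW", "Melbourne": "VIC", "Brisbane": "QLD", "Adelaide": "SA",
    "Perth": "WA", "Hobart": "TAS", "Darwin": "NT", "Canberra": "ACT",
}

# Hypothetical, heavily abbreviated search terms: (regex, state, city).
# More specific terms come first so that "Sydney, Australia" yields a city.
TERMS = [
    (r"\bsydney\b", "NSW", "Sydney"),
    (r"\bmelbourne\b", "VIC", "Melbourne"),
    (r"\bbrisbane\b", "QLD", "Brisbane"),
    (r"\bqueensland\b|\bqld\b", "QLD", None),
    (r"\btasmania\b|\btasmainia\b", "TAS", None),
    (r"\baustralia\b|\baustalia\b|\bstralya\b|\bdown under\b", None, None),
]

# Hypothetical exclusion patterns for known false positives.
FALSE_POSITIVES = [
    r"brisbane,\s*ca(lifornia)?\b",
    r"(city of|ciudad de) townsville",
    r"42 wallaby way",
]


def match_text(text):
    """Return (state, city) for the first matching term, or None."""
    text = (text or "").lower()
    if any(re.search(p, text) for p in FALSE_POSITIVES):
        return None
    for pattern, state, city in TERMS:
        if re.search(pattern, text):
            return (state, city)
    return None


def classify(profile):
    """Classify one profile dict, trusting location > description > timezone."""
    for field in ("location", "description"):
        hit = match_text(profile.get(field))
        if hit is not None:
            state, city = hit
            return {"australian": True, "state": state, "city": city, "source": field}
    state = AU_TIMEZONES.get(profile.get("time_zone"))
    if state is not None:
        # Timezone-only matches get a state but deliberately no city.
        return {"australian": True, "state": state, "city": None, "source": "time_zone"}
    return {"australian": False, "state": None, "city": None, "source": None}


print(classify({"location": "Brisbane, QLD"}))
print(classify({"location": "Brisbane, California"}))
print(classify({"time_zone": "Canberra"}))
```

Taking the first matching term within a field is a simplification; it is exactly the kind of rule that produces the “Sydneysider now living in Melbourne” misassignments mentioned above.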
By the end of August 2013, then, the Australian Twittersphere included some 2.79 million accounts, by our criteria. Per capita, using the Australian Bureau of Statistics’ figures for September 2013, this would translate to a 12% sign-up rate, though that figure must be viewed with some caution: some Twitter users will operate multiple accounts (e.g. for private and professional use), while in other cases several users will share the same group account. This is why we’re careful here to speak of 2.79 million accounts, rather than users.
This figure is in line with existing reports and guesstimates for the size of the Australian Twittersphere, if somewhat below the 4 million Australian accounts that Twitter, Inc. itself apparently boasted some months ago. Figures from the company itself should always be taken with a grain of salt, of course; they’re largely released for corporate promotion reasons, and may well reflect the total number of Australian-based accounts ever created, rather than the number of accounts which are still in existence at present (which is what we measured). On the other hand, there is also an unknown number of accounts which our methods would not identify as Australian, based on publicly available profile details, but which Twitter, Inc. (which would have identified the IP address from which a Twitter account was created) would classify as Australian. This may also explain some of the discrepancy in numbers.
Here’s how that population has grown month by month over the seven years covered by our dataset:
From a slow start over the first couple of years (which is similar outside of Australia, too), there’s finally a sudden and rapid rise in new registrations per month in early 2009, peaking at over 100,000 new account registrations each in March and April 2009.
(And there may well have been more than this: the 100,000+ accounts we see joining in these months are only those which were still in existence when we gathered our data in late 2013, of course.) From this early excitement, things slow down considerably towards the end of 2009 – and then trends start to point upwards again: the average number of new accounts joining per month during the following years is somewhere around 40-50,000.
Finally, there is a substantial increase in new registrations in August 2013; this may be partly related to the impending federal election, but probably also reflects the fact that Twitter’s spam bot-checking systems may not yet have had a chance to remove any offending new accounts.
We should also note, though, that what our data cannot (yet) tell us is the number of accounts which are being deleted each month, and how those deletions compare to the influx of new accounts. We’ll have a better indication of this after the next iteration of our survey, which will allow us to examine the discrepancies between the two datasets: accounts present in the September 2013 dataset but absent from the new iteration must have been deleted (by their owners, or by Twitter, Inc.) in the meantime.
State-by-state patterns vary quite considerably at times. There are unusual spikes in ACT and Queensland account registrations between April and September 2012, for example, which do not appear to be motivated by specific local events; ACT sign-ups per month rise from below 1,000 to over 4,000 accounts during that period. From a preliminary review of the accounts which joined during that time, it appears that a considerable number of them belong to fans of The Janoskians, One Direction, and other teen bands, so perhaps there was a concerted effort by some of these bands to get their fans on Twitter? Other spikes are clearly driven by more sinister motives.
The large spike in generically ‘Australian’ accounts in January 2013, for example, is caused almost entirely by a large number of spam bots being created at virtually the same time: of the 1,106 new accounts on 16 January 2013 alone, we counted 170 accounts claiming to be “Australia’s support member for the Global Information Network”; 153 offering “Australian Business for Sale listings”; 155 promoting “software and services in Singapore. Australia. China and Japan”; and 164 accounts claiming to be an “Independent Mortgage Broker in Australia” – that’s 642 accounts, or close to 60% of the ‘Australian’ accounts for that day. Clearly Twitter’s spam account filters still have some way to go.
But genuine events in the world also result in increased sign-ups. During the first quarter of 2011, for example, we see a considerable spike in new Queensland-based accounts on 11 and 12 January, as floodwaters threaten inner-city Brisbane, and during the following days; in Victoria, New South Wales, and other states the sign-up rate also increases notably. Similarly, as a devastating earthquake hits Christchurch, New Zealand, on 22 February, Australians also sign up in larger numbers than usual. The pattern does not repeat (other than perhaps in Queensland, once again) following the 11 March earthquake and tsunami on the east coast of Japan, however. The graph above also shows a considerable dip in new registrations on 18 February 2011 – this may well be due to an outage in Twitter’s account registration systems.
The geographical distribution of these accounts should necessarily be treated with a certain degree of caution, given the vagaries of correctly identifying cities and states from the free text provided by users in the location and description fields.
However, the patterns we’re able to determine from our best guess at the likely location of each user do reflect both the overall distribution of the Australian population and the relative likelihood (based on infrastructural and socioeconomic factors) of local residents joining Twitter that we would expect to see:
The major population centres are clearly leading the way. Sign-up rates per capita seem to be strongest in the state capitals and on the Gold Coast, but this may be an artefact of our approach, which focussed on identifying mentions of the 50-odd major population centres in Australia in the location and description fields of users’ Twitter profiles. Because of the greater national and international recognition of such centres, city users may state that they’re from state capitals while those from small rural and regional locations might just mention their state. In a further iteration of our work, we’ll check against a longer list of localities in Australia, and the patterns may well change.
We’re on more solid ground when we examine the sign-up rates for each state. This aggregates users who name specific cities with those who only specify a state, and accounts for some 2.4 million of our total 2.8 million identified accounts – about 420,000 accounts we identified as ‘Australian’ referenced only generic terms (“Australia”, “down under”, etc.), but did not include any more specific location details. For most states, the sign-up rate ranges between 8 and 11 per cent, with Queensland and (perhaps somewhat surprisingly) the Northern Territory taking the lead of this group. There are likely to be any number of factors which have resulted in these slight differences in Twitter adoption across the country; for Queensland, for example, the well-publicised utility of Twitter during recent natural disasters may well have contributed to an above-average take-up.
Incidentally, if the 420,000 accounts which we could not allocate to any specific state were distributed in proportion to the states’ population figures, this would boost each sign-up rate by another 1.8 percentage points.
But the major story here is of course the ACT, which records a whopping per capita take-up rate of 30%. We’ll have to look more closely into what factors are responsible for this pattern – but so far we have not seen any indications that an unusually large number of false positives have slipped through our net. There is, however, an unusually large number of accounts whose only identifying feature is their ACT timezone setting, and it is always possible that people from other UTC+10 timezones (for example in the northern hemisphere) might have chosen the ACT timezone rather than searching for their own options in the pull-down menu available on the Twitter site. Another factor that might drive the abnormally high number of accounts with some relation to the ACT is a combination of the socioeconomic make-up of the ACT population, and the fact that (as the seat of the federal government) there will be a very substantial number of organisational accounts, politicians, journalists, public servants, and other likely Twitter adopters in Canberra and surrounds. Additionally, there may also be a significant discrepancy between the number of formally registered ACT residents and the number of people who actually live and/or work in Canberra at least part of the time.
If we break down state numbers per city, the capital cities unsurprisingly account for the majority of Twitter accounts. There are also many accounts for which a city couldn’t be determined – these are accounts which merely chose an Australian timezone, which named only their state in the location or description field, or which stated a location other than the 50-odd most populous Australian cities we searched for.
Further, though, it is also notable that Queensland’s Twitter population appears to be most geographically dispersed: in addition to the Gold Coast (which is a major population centre in its own right, of course), it also boasts the widest range of other centres with Twitter userbases numbering above 1,000 accounts. This largely reflects the population distribution across various regional centres in central and far north Queensland, but may also point to the useful role Twitter now regularly plays during Queensland’s summer storm season.
So much for a first overview of the overall figures. Over the next months, we’ll delve much more deeply into the patterns which this massive dataset of Australian Twitter accounts reveals – and we’ll also develop a number of approaches to mapping the follower/followee networks of this Twitter population.
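For readers who want to check the headline percentages quoted in this post, here is a small Python sketch of the arithmetic. The account counts come straight from the text above; the population figure of roughly 23.2 million is an assumption (an approximate national estimate for September 2013, not a number taken from our dataset), and the function names are just illustrative. It also shows why spreading the unallocated accounts across the states in proportion to population lifts every state’s rate by the same amount: each state’s extra accounts divided by its own population always reduces to the unallocated total divided by the national population.

```python
# Assumption: approximate Australian population in September 2013.
AU_POPULATION = 23_200_000

TOTAL_ACCOUNTS = 2_790_000      # accounts classified as Australian
UNALLOCATED_ACCOUNTS = 420_000  # 'Australian' but without a state


def per_capita_rate(accounts, population):
    """Sign-up rate in per cent."""
    return 100.0 * accounts / population


def proportional_boost(unallocated, state_population, total_population):
    """Percentage-point boost for one state if unallocated accounts are
    shared out in proportion to state population."""
    extra_accounts = unallocated * state_population / total_population
    return 100.0 * extra_accounts / state_population


national_rate = per_capita_rate(TOTAL_ACCOUNTS, AU_POPULATION)

# The boost is independent of the state's size; two made-up populations
# are used here purely to demonstrate that.
boost_small = proportional_boost(UNALLOCATED_ACCOUNTS, 400_000, AU_POPULATION)
boost_large = proportional_boost(UNALLOCATED_ACCOUNTS, 7_000_000, AU_POPULATION)

# Spam share on 16 January 2013, from the four counts given in the text.
spam_accounts = 170 + 153 + 155 + 164
spam_share = 100.0 * spam_accounts / 1_106

print(f"National sign-up rate: {national_rate:.1f}%")
print(f"Boost per state: {boost_small:.2f} / {boost_large:.2f} percentage points")
print(f"Spam share on 16 January 2013: {spam_accounts} accounts, {spam_share:.0f}%")
```

Because multiple people may share an account and one person may run several, these rates describe accounts per head of population rather than the share of Australians who use Twitter.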