Media use in the #qldfloods
Jean Burgess, 22 January 2011 (Analysis, Crisis, Twitter)

As I’m sure you’re aware, last week was pretty rough for Queensland (and then New South Wales and Victoria), as devastating flash floods ripped through Toowoomba and the Lockyer Valley, quickly followed by extreme river flooding in Ipswich and Brisbane that saw thousands of homes inundated. As in any emergency situation or other ‘acute event’, public communication played a vital role during all phases of the flooding – from warning, to emergency, and – eventually – to recovery, relief and rebuilding.

In this and the related Media Ecologies project in the CCI, we’re trying to understand how public communication is constituted through the operation of the broader media ecology, including social media as well as the full range of other communication technologies and practices that individual citizens have at their disposal. So we’re throwing all the research tools we have in our kit (and developing some new ones) at analysing public communication during the floods – initially through the lens of social media, and particularly, Twitter.

Axel has already posted a first look at some overall patterns of Twitter activity during the most acute period of the event, and at the end of the post asked our readers to nominate research questions and ideas for us to investigate – thanks very much to those who’ve contributed ideas so far. There is much more to do of course, and we’re on the case. In this and subsequent posts, I’m focusing on some patterns in the uses made of various media platforms and sources by Twitter users during the flood.

Despite its relatively low take-up across the population, Twitter is a useful place to begin research into crisis communication. It allows individual citizens to share their own experiences and feelings in real time; and, as was dramatically highlighted during the 2010 Australian Federal election as well as in the past week, it also serves as a filtering and distribution space for a wide range of official and unofficial information and media sources, including sources (like radio and word of mouth) that aren’t even ‘online’. On a personal note, while marooned for several days on a temporary island with the luxury of electricity and an internet connection, but surrounded by flooded neighbours with neither, I also learned that Twitter can serve as a very useful conduit between the outside world, local radio, eyewitness accounts, and word of mouth.

In the remainder of this post, I’m going to look at the most-used media sources and platforms during the floods, and then drill down a little to get a sense of how these patterns shifted as the situation itself changed.

I’m working with a Twapperkeeper archive of tweets containing the #qldfloods hashtag, downloaded on 20 January and containing a total of 39096 tweets. (Unfortunately there seems to be a chunk of data missing from the 17th/18th, but I’m still hoping it’ll show up. Update: as of 25/01/11 the archive seems to have been backfilled.) The most logical way into understanding what other media were being used is to extract, count, and then have a good look at the links within the tweets themselves, so that’s what I’ve done. I first extracted all the URLs into their own column, then resolved all the shortened links to their target URLs, and then stripped those long URLs back to the domain name – preserving the links at each step. If anyone wants to understand, critique or replicate the process, see the technical details at the bottom of the post.
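The first of those steps (pulling each link out of the tweet text) can be sketched in the same awk toolkit the post uses. This is a hypothetical stand-in, not the original urlextract.awk: it assumes a plain-text file with one tweet per line (a real Twapperkeeper CSV export would need proper CSV parsing first), and it prints one row per link, with the link in front of the original line so that nothing is lost at this step.

```sh
#!/bin/sh
# extract_urls FILE
# Hypothetical sketch of the link-extraction step (not the original urlextract.awk).
# Assumes FILE holds one tweet per line. Prints one "url,original line" row per
# link, so a tweet with two links yields two rows.
extract_urls() {
  awk '
    {
      rest = $0
      # A link runs from http:// to the next space, tab or double quote.
      # Matching on a lower-cased copy also catches HTTP://; tolower() keeps
      # the length unchanged, so RSTART/RLENGTH index the original text too.
      while (match(tolower(rest), /http:\/\/[^ \t"]+/)) {
        print substr(rest, RSTART, RLENGTH) "," $0
        rest = substr(rest, RSTART + RLENGTH)   # continue after this link
      }
    }
  ' "$1"
}
```

Trailing punctuation, such as a full stop straight after a link, is kept as part of the URL here; a fuller version would trim it before the resolution step.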

It turns out that the 39096 tweets captured in the archive so far included 15674 links, which would mean around 40% of tweets contained a link if each link came from a different tweet. Even allowing for some tweets containing multiple links, that’s a pretty high percentage. To my mind, this might indicate that sharing and passing along information (rather than sharing personal opinions) was a pretty high priority for people using the #qldfloods hashtag.
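The 40% figure is simply links divided by tweets. A minimal sketch of that calculation, assuming (as above) a file with one tweet per line; both helper names are hypothetical:

```sh
#!/bin/sh
# percent_of PART WHOLE -> PART as a percentage of WHOLE, one decimal place
percent_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

# link_share FILE -> "tweets links percent" for a file with one tweet per line.
# Every http:// occurrence counts as a link, so a tweet with two links adds two.
link_share() {
  awk '
    {
      tweets++
      line = tolower($0)
      while ((i = index(line, "http://")) > 0) {
        links++
        line = substr(line, i + 7)   # skip past this "http://"
      }
    }
    END {
      if (tweets == 0) { print "0 0 0.0"; exit }
      printf "%d %d %.1f\n", tweets, links, 100 * links / tweets
    }
  ' "$1"
}
```

With the counts reported here, `percent_of 15674 39096` prints 40.1; the same call on the #ausvotes counts below (`percent_of 93268 514103`) prints 18.1.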

As a quick aside, I wondered how these numbers might compare to the data we have previously captured and analysed for a very different kind of national ‘event’ – the 2010 Federal election conversation using the #ausvotes hashtag.

Extracted = Tue Dec 7 (six months’ worth of data)
Tweet count = 514103
URL count = 93268
URLs as a ‘percentage’ of tweets = 18%

The time periods aren’t really comparable, but maybe this is a very faint first indication of how to understand the (significant) differences between different types of mediated ‘events’ – for one thing, surely there was a lot more room for, and expectation of, personal opinion, discussion and analysis during the election? I’ll leave that one hanging for now.

What were the most linked-to media resources?

Using a simple pivot table, I calculated the total count for each domain (and its variations), and produced a table showing the 50 most frequently linked-to domains:

[Table: the 50 most frequently linked-to domains in the #qldfloods archive]

In this list, I’ve removed URL-shortener domains. Even after resolving the URLs, a lot of shortened links were left behind, mainly because links at the ends of tweets were cut off as they were retweeted – something users should be careful of in future if they’re trying to pass on potentially life-saving information!
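A pivot table like this is essentially a count per key, so the same tally can be sketched in awk. The assumptions are all mine, not part of the original workflow: the input is urltruncate.awk-style output with the domain in the first column and a header row; folding a leading www. into the bare domain is a guess at grouping a domain "and its variations"; and the shortener list is illustrative, not the one actually used.

```sh
#!/bin/sh
# top_domains FILE [N]
# Count rows per domain, drop URL-shortener domains, print the N most frequent
# (default 50) as "count domain". Hypothetical helper, not the original pivot table.
top_domains() {
  awk -F, -v shorteners="bit.ly,ow.ly,tinyurl.com,goo.gl,is.gd" '
    BEGIN {
      n = split(shorteners, s, ",")
      for (i = 1; i <= n; i++) skip[s[i]] = 1
    }
    NR == 1 { next }                 # skip the header row
    {
      d = tolower($1)
      sub(/^http:\/\//, "", d)       # "http://www.abc.net.au" -> "www.abc.net.au"
      sub(/^www\./, "", d)           # treat www.x and x as the same domain
      if (!(d in skip)) count[d]++
    }
    END { for (d in count) print count[d], d }
  ' "$1" | sort -k1,1nr -k2,2 | head -n "${2:-50}"
}
```

Sorting on the count first and the domain second keeps tied domains in alphabetical order, so the list comes out the same on every run.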

The detailed analysis will come later (and please let me know in the comments if you’re curious about anything in the list), but a couple of observations:

1. One thing that strikes me, and surprised me a little, is the dominance of image hosting services overall, with twitpic taking top spot by a long margin, and yfrog, flickr and plixi further down. I’ve gone ahead and extracted all links to known image hosting services (as well as all the youtube links), and will be taking a closer look at this, but based on a quick browse it seems that first-hand flood images were posted and shared for a range of purposes, including what we might call ‘citizen journalism’ – from the spectacular (“where did Southbank go?”) to the informational (“current river level at X street”) and the mundane (“look at my muddy boots”).

2. Facebook – the dominance of Facebook links should give us all pause, I think. Far more popular than Twitter in Australia overall, it too was a platform for a range of media, information and interpersonal communication. As well as hosting images, donation appeals and so on, it was a primary (and as far as I could tell, very effective) channel of emergency communication for the Queensland Police Service. I think this has some pretty big implications.

3. The Queensland Government website features prominently – but primarily due to the high presence throughout the flood period of the official website for the Premier’s Appeal. And by the way, if you haven’t already donated
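Pulling out the image-hosting and YouTube links mentioned in point 1 amounts to filtering on the domain column. A rough sketch, assuming urltruncate.awk-style input (domain first, header row); the host list simply collects the services named above and is not necessarily the list actually used, and the helper name is hypothetical.

```sh
#!/bin/sh
# image_links FILE
# Keep the header plus every row whose domain is one of the listed hosts or a
# subdomain of one (e.g. farm6.static.flickr.com). Hypothetical helper.
image_links() {
  awk -F, '
    BEGIN {
      n = split("twitpic.com yfrog.com flickr.com plixi.com youtube.com", h, " ")
      for (i = 1; i <= n; i++) host[h[i]] = 1
    }
    NR == 1 { print; next }          # keep the header row
    {
      d = tolower($1)
      sub(/^http:\/\//, "", d)
      for (x in host) {
        # exact match, or d is a subdomain of x (ends in "." followed by x)
        if (d == x || substr(d, length(d) - length(x)) == "." x) { print; next }
      }
    }
  ' "$1"
}
```

Matching on the domain suffix rather than exact equality catches the subdomains that the truncation script deliberately preserves.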

But nothing hugely surprising – and given the fast-moving nature of a crisis event, not terribly useful either. We are probably more interested in questions like which sources of information and other media content were relied on most at the different phases of the flood event – from early warning, through the (protracted) emergency situation, and then on to the recovery and relief effort. And one day, a long way down the track, there will be memorialisation and collective memory, too.

So I’ve recalculated the counts for these domain names on a daily basis:

[Table: daily counts of links to each of these domains]
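Recalculating the counts per day just adds the day to the pivot key. A minimal sketch, assuming a two-column domain,day input with days written as YYYY-MM-DD so that they sort chronologically. In the original workflow explodetime.awk supplies the time breakdown, but its output layout isn't shown here, so both the input layout and the helper name are hypothetical.

```sh
#!/bin/sh
# daily_counts FILE
# Input: header row, then "domain,day" rows (day as YYYY-MM-DD).
# Output: "day,domain,count", ordered by day, then by count (highest first).
daily_counts() {
  awk -F, '
    NR == 1 { next }                 # skip the header row
    {
      d = tolower($1)
      sub(/^http:\/\//, "", d)
      sub(/^www\./, "", d)
      count[$2 "," d]++              # pivot key: day,domain
    }
    END { for (k in count) print k "," count[k] }
  ' "$1" | sort -t, -k1,1 -k3,3nr -k2,2
}
```

The output stays in "long" form (one row per day and domain); a spreadsheet pivot can turn it into the day-by-column layout if needed.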

Now we’re getting somewhere. By breaking this very simple data set down like this, I think we can see a couple of shifts occurring, and I’m prepared to make a few rough and simplistic guesses about these shifts based on a quick scan through the relevant data.

At the beginning of data collection, Australian Twitter users (who, as far as we know, are concentrated in metropolitan areas) were linking to news sources, user-uploaded images and videos, and relief-oriented websites in response to the flash floods that had just torn through Toowoomba and the Lockyer Valley. This activity then merged with news and information sources about the acute emergency as the flood waters rose in Ipswich and Brisbane. At the height of the flood for Brisbane users, the links were dominated by first-hand images; these subsided slightly by the 16th in favour of relief and non-profit websites, and a wider range of news outlets, as emergency information gave way to commentary and analysis.

From there, it’s easy enough to backtrack to the tweets and individual links associated with each domain name on a given day for much closer analysis, but I’m leaving it there for today. Once again, please let us know in the comments if there are particular aspects of the media mix used by the Twitter population during the floods that you’d like to know more about.
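Backtracking from one cell of the daily table to the rows behind it is a filter on two columns. A rough sketch, assuming a hypothetical domain,day,[original fields] layout with a header row; the helper name is also hypothetical.

```sh
#!/bin/sh
# links_on DOMAIN DAY FILE
# Print the rows behind one cell of the day-by-day table: every row whose
# domain (first column, http:// and www. stripped) and day (second column) match.
links_on() {
  awk -F, -v want="$1" -v day="$2" '
    NR == 1 { next }                 # skip the header row
    {
      d = tolower($1)
      sub(/^http:\/\//, "", d)
      sub(/^www\./, "", d)
      if (d == tolower(want) && $2 == day) print
    }
  ' "$3"
}
```

For example, `links_on twitpic.com 2011-01-13 links.csv` would list the twitpic rows for that day, ready for a closer reading of the tweets themselves.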

Notes on method

For those who need to know, here’s the script that truncates long URLs back to the domain name. It’s a brutal substitution hack of the existing urlextract.awk script, using the simplest regular expression I could come up with – basically, it says “match http:// followed by anything you like, and stop at the first space, question mark or forward slash”. So it isn’t perfect: for example, it leaves the www. (or lack thereof) intact, preserves subdomains, and only matches http:// links, not https://. But it works well (and fast) enough for this sort of exercise, I think.

Before running this, I prepared the data in the following way:

1. explodetime.awk – to enable me to create day-by-day pivot charts – script and instructions here.
2. urlextract.awk – to place each link in a separate column
3. urlresolve.awk – to resolve shortened links and other referrers to their target URLs – this literally took all day… Axel’s post here explains this link extraction and resolution process.
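explodetime.awk itself isn't reproduced here. To give a rough idea of what a day column involves, here is a hypothetical sketch that converts a Unix timestamp into a Brisbane calendar day. It assumes the timestamp sits in the last column of each line and applies a fixed UTC+10 offset (Queensland has no daylight saving), so that days break at local midnight rather than at 10am. Plain POSIX awk has no date functions, so the day-count-to-date step uses Howard Hinnant's civil_from_days arithmetic.

```sh
#!/bin/sh
# add_day FILE
# Prepend a "day" column (YYYY-MM-DD, Brisbane time) computed from a Unix
# timestamp in the last column. Hypothetical stand-in for explodetime.awk.
add_day() {
  awk -F, '
    # Days since 1970-01-01 -> "YYYY-MM-DD" (Hinnant civil_from_days, z >= 0 only)
    function civil(days,   z, era, doe, yoe, doy, mp, y, m, d) {
      z = days + 719468
      era = int(z / 146097)
      doe = z - era * 146097
      yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
      doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
      mp = int((5 * doy + 2) / 153)
      d = doy - int((153 * mp + 2) / 5) + 1
      m = (mp < 10) ? mp + 3 : mp - 9
      y = yoe + era * 400 + (m <= 2)
      return sprintf("%04d-%02d-%02d", y, m, d)
    }
    NR == 1 { print "day," $0; next }
    {
      # shift to UTC+10 before cutting the timeline into days
      print civil(int(($NF + 36000) / 86400)) "," $0
    }
  ' "$1"
}
```

Reading the timestamp from `$NF` rather than a fixed column means commas inside the tweet text don't throw off the field count, as long as the timestamp really is the last column.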


# urltruncate.awk - strip hyperlinks down to domain name
#
# This script takes a CSV archive of tweets in Twapperkeeper format
# and strips long hyperlinks down to their domain names.
#
# Run urlextract.awk and urlresolve.awk first.
# Expected data format:
# longurl,url,text,[other Twapperkeeper fields]
#
# Output format:
# domain,[original fields]
#
# Requires gawk: three-argument match() and IGNORECASE are gawk extensions.
#
# Released under Creative Commons (BY, NC, SA) by Jean Burgess and Axel Bruns - je.burgess@qut.edu.au / a.bruns@qut.edu.au

BEGIN {
	FS = ","		# CSV input: $1 is the resolved long URL
	IGNORECASE = 1		# match HTTP:// as well as http://
	getline header		# pass the header row through with a new first column
	print "domain," header
}

# Only rows whose long URL contains an http:// link
$1 ~ /http:\/\/[^ \/?]+/ {
	rest = $1
	# One output row per match: "http://" plus everything up to the first
	# space, "/" or "?", followed by the original line
	while (match(rest, /http:\/\/[^ \/?]+/, m)) {
		print m[0] "," $0
		rest = substr(rest, RSTART + RLENGTH)	# continue after this match
	}
}
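The script above relies on gawk extensions. For anyone using a different awk, here is a sketch of the same truncation in POSIX awk. It uses the same regular expression and the same "domain,[original fields]" output, and assumes the resolved long URL is in the first comma-separated column. The function name is hypothetical.

```sh
#!/bin/sh
# truncate_urls FILE
# POSIX-awk sketch of urltruncate.awk: prefix each row with every
# "http://domain" found in its first column (case-insensitive match).
truncate_urls() {
  awk -F, '
    NR == 1 { print "domain," $0; next }   # header gains a domain column
    {
      rest = $1
      # http:// up to the first space, "/" or "?"; matching on a lower-cased
      # copy replaces IGNORECASE, and the positions still fit the original
      while (match(tolower(rest), /http:\/\/[^ \/?]+/)) {
        print substr(rest, RSTART, RLENGTH) "," $0
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
  ' "$1"
}
```

Because only RSTART and RLENGTH are used, this version should behave the same under gawk, mawk or the BSD awk.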

About the Author

Jean Burgess is a Professor of Digital Media and Director of the Digital Media Research Centre (DMRC) at Queensland University of Technology. She is @jeanburgess on Twitter.


(5) Readers' Comments

  1. Pingback: Tweets that mention Mapping Online Publics » Blog Archive » Media use in the #qldfloods -- Topsy.com

  2. As just discussed via Twitter, it would be interesting to group individual websites into bundles of sources (main stream media news items, blog posts, photos) where applicable to see how the relations develop over time.

  3. Fascinating and very useful information, thank you. I’m quite surprised not to see mention of the way SM was used to rally volunteers, which for me was a standout, but doesn’t seem to feature?

    My understanding is that BCW put out a call on BCC Twitter & their related FB page simultaneously for volunteers to go to various depots to fill sandbags. Something like 6 or 7 hours later, something like 7,500 had turned up, and the “no more volunteers please!” tweet went out. 1800 sandbags were filled almost immediately it seemed, another 1800 the next day. Subsequently, about 22,000 (?) volunteers with printed form, ID, gumboots and brooms turned up to be bussed around in the immediate aftermath clean up. WOW! It was my impression that all these folks were identified and rallied ONLY through spectacular SMEM in cooperation by all gov agencies, as you say mainly through FB, but not only QPSMedia, I would say BCC was the game changer, which then spread virally both visibly (RT) and invisibly (FB status updates, text messages, phone calls and so on). Maybe it was mainly photos to twitpic at a time when other messages were mainly not on SM at all, or going as DMs? Could that be supported by your data?

    I’m basing this on periodic screen grabs of the BCC website/FB and QPSMedia FB page I took between 12th and 20th, only for personal interest, not work-related. I do know that the QPSMedia site was worked by at least one intern, and would be interested to know if anyone has any data about the knowledge and experience of SM operators and whether that has any impact on the success of the various sites. I only saw one ALL CAPS OVERREACTION!!!! which was quickly deleted and it was BCC... maybe they had interns as well?

    This is only representative of one person’s experience and personal interest, but standing in a poorly lit New York street at the height of the floods, and fraught with worry for immediate family so far away in affected areas, I have to say that GE.TT became my new best friend of all time for at least a few moments – for there, I found and listened to an update by Assistant Commissioner Peter Martin, uploaded within 15 minutes of a presser that I had unfortunately just missed, which was available to me almost instantly on the iPhone App, in the dark, within 5 secs of my going to look for it. I had it running while I put the key in the door, and went to pick up the landline (VOIP in our house). Twitter and FB are such a great aid to multi-tasking, it’s not a question of this or that, it’s all of it, with as many windows open as possible and multiple # streams, then disseminating to those who are 87, or 3 years old, and filtering it through more calmly!

    I take it the #hashtags arise organically, but the minute I saw QPS using #qldfloods, that was it. I had better info about another small town in another small continent at my fingertips than I did about what was happening down the road, where by coincidence another storm was taking place and the council had shut up shop and gone home. I guess I had the police scanner, but Brisbane’s was better quality, and clearer and cleaner.

    There is no data that can weight appropriately one tool over another as having more value or greater significance – to whom? To me in New York or you in the CBD wanting to know if the road you normally drive is still open. I think what is the most fascinating part to me about SMEM is not how the responders use it, which is mostly visible, but how much of it is passing between private/ public/ and back again, and how difficult it’s going to be to figure it all out.

    Thank you for your valuable work.

  4. Pingback: Mapping Online Publics » Blog Archive » Extracting images from Twapperkeeper archives

  5. Pingback: Mapping Online Publics » Blog Archive » Image sharing in the #qldfloods