Data journalism has been a media buzzword for a while. Although many of its tasks are now computerised, it remains a difficult discipline for reporters to master.
At the news:rewired conference, today in London, Helena Bengtsson, data projects editor with The Guardian, Martin Stabe, head of interactive news with the Financial Times, and Sophie Sparks, team leader EMEA for Tableau Public, shared some insider tips:
There are several ways to get your hands on data. There are licensed data sets from organisations like Bloomberg and Reuters, as well as various open feeds of public data. A newsroom can also create its own data set by asking its users questions, scraping the web for structured data, or connecting to APIs.
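The scraping route mentioned above can be sketched with nothing but the Python standard library. The HTML snippet, company names, and column layout here are invented for illustration; a real scraper would first fetch the page (e.g. with `urllib.request`) and would need per-site tweaks.

```python
# Hypothetical sketch: pull structured rows out of an HTML table
# using only the standard library's html.parser.
from html.parser import HTMLParser

SAMPLE_HTML = """
<table>
<tr><td>Acme Holdings</td><td>Panama</td></tr>
<tr><td>Globex Ltd</td><td>BVI</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, one list per <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.rows)  # [['Acme Holdings', 'Panama'], ['Globex Ltd', 'BVI']]
```

The same rows could then be written out as CSV and joined against other sources.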
During the Panama Papers research, Helena Bengtsson said she had to scrape many extra data sets to match against the leaked information. She compiled lists of businesses and people by scraping websites, financial reports, and rich lists.
Data cleaning and structuring
“People say tools now make data journalism easy to do, but as the Panama Papers showed it isn’t,” said Martin Stabe.
Computers have made a lot of stories possible, but at the same time there is still a limit to what they can do to help. A lot of the work still had to be done by hand, as Bengtsson explained.
“I would have loved to get the Panama Papers ten years from now, when we know more about data processing and language processing, because I think there are a lot of stories still there that aren’t found yet,” she added.
What many people don’t realise is that data in spreadsheets is often unstructured. A lot of the Panama Papers data needed to be cleaned up, and a choice had to be made about what to clean and what to leave aside, Bengtsson said.
“We weren’t able to structure all these emails, because then we would be busy till 2020,” she added.
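One small taste of what that hand-cleaning involves is normalising names so near-duplicates collapse into one canonical form. This is a minimal sketch; the sample values are invented, and real cleaning in a leak of this size involves many more rules and manual review.

```python
# Illustrative sketch of one common cleaning step: normalise messy
# name strings so that variants of the same person match.
import re

def normalise_name(raw: str) -> str:
    name = raw.strip().lower()
    # Flip "surname, forename" entries to "forename surname".
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    # Drop stray punctuation, then collapse runs of whitespace.
    name = re.sub(r"[^\w\s-]", "", name)
    name = re.sub(r"\s+", " ", name)
    return name

messy = ["SMITH, John ", "john  smith", "Dr. Ada Lovelace"]
print([normalise_name(n) for n in messy])
# ['john smith', 'john smith', 'dr ada lovelace']
```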
A question on many people’s lips was how to find a story in a data set as huge as the Panama Papers. Bengtsson explained that most of the stories were fishing expeditions: the team approached the data with certain search terms or methods and hoped a story would emerge.
One of the methods was matching long lists of MPs, wealthy individuals, and world leaders against the data set to find overlaps.
“To compare and check if there are similarities, my computer needed three days to go through the data,” she said.
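In its simplest form, that matching amounts to comparing a watch list of names against the names in the leak. The names below and the exact-match-after-normalisation strategy are purely illustrative; real matching at Panama Papers scale also needs fuzzy comparison, which is part of why it took days of machine time, and every hit still has to be verified by a reporter.

```python
# Toy sketch of list matching: normalise both sides, then intersect.
def normalise(name: str) -> str:
    return " ".join(name.lower().split())

watch_list = {"Jane Doe", "John Q. Public", "Erika Mustermann"}
leak_names = {"jane  doe", "Max Power", "erika mustermann"}

matches = {normalise(w) for w in watch_list} & {normalise(n) for n in leak_names}
print(sorted(matches))  # ['erika mustermann', 'jane doe']
```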
Before being able to visualise data, it is important to figure out the best way to present it. Not every visualisation works well for every type of data.
“It is important to understand the guidelines; don’t be ruled by them. Once you know them, you can bend the rules,” explained Sophie Sparks.
She gave five tips for visualising data:
- Bar charts show discrete blocks of data.
- Line charts show trends. “They are great at visualising continuous values like time.”
- Don’t overdo the slices in pie charts (two or three at most). “I really love pie charts, but some people really hate them.”
- Use a scatter plot to compare two continuous variables. “It is great to show the correlation between two variables.”
- Maps are great, but smaller countries are hard to see. “You often end up showing how big countries are and not necessarily anything to do with data.”
While creating visualisations, it is important to always keep the reader in mind, said Stabe.
“All of our referendum graphs went through an extensive design process,” he said. “Not only look and feel are important, but also what questions will the reader be asking.”
It is also important to think about how people will use it. Online, you cannot assume your visualisation will be seen as part of a larger whole.
“In a newspaper you have all your furniture around the story; online you have to consider if it can work on its own as well,” Stabe said.
Readers might end up taking screenshots to share and the visualisation has to be able to “exist without all the elements around them,” he added.