Yesterday, I went to All Souls College, Oxford, for a data visualisation workshop organised by the Digital Panopticon project.
The project – a collaboration between the Universities of Liverpool, Sheffield, Oxford, Sussex and Tasmania – is studying the lives of over 60,000 people sentenced at the Old Bailey between 1780 and 1875, to look at the impact of different penal punishments on their lives.
It aims to draw together genealogical, biometric and criminal justice datasets held by a variety of different organisations in Britain and Australia to create a searchable website that is aimed at anyone interested in criminal history – from genealogists to students and teachers, to academics.
This is a huge undertaking, and it is no wonder that the project aims to harness digital technologies in making the material accessible to a wide audience. But how could data visualisation techniques help?
The data visualisation workshop – #dpdataviz on Twitter – heard from three academics at Oxford who use data visualisation: Min Chen, Professor of Scientific Visualisation at the Oxford e-Research Centre; William Allen from the Migration Observatory; and Arthur Downing from All Souls College.
Min Chen looked at the four levels of data analysis and visualisation:
1. the disseminative level, or “This is…!”
2. the operational level, or “What?”
3. the analytical level, or “Why?”
4. the innovative level, or “How?”
He stressed the importance of considering the audience you are creating visualisations for; you need to identify who they are, and what you want them to get from the visualisation. Good data visualisations, to me, are the ones that you go, “Oh!” at; the ones that move away from simply telling you something to making you feel as though you are discovering something yourself.
The three speakers highlighted various ways in which complex data can be visualised – such as Downing’s use of network analysis – but they also pointed out the dangers of skewing figures with overly simplistic tools.
Allen, for example, pointed to an analysis of the terms used to describe immigration in the British press over a two-year period, showing how some diagrams – such as a bubble chart focused on a single term – could suggest that one word was used more than another by the media, when a different diagram might show that this wasn’t actually the case.
The workshop then went on to look specifically at the challenges facing the Digital Panopticon team, looking at how elements of the Old Bailey data – such as gender or occupation, the type of crime committed, and where a convicted person was transported to – could be visualised, and the drawbacks of tagged data.
For example, only a minority of entries refer to a defendant’s occupation – particularly in the 19th century – and job titles are recorded inconsistently (servant and servants being classed as separate jobs, for instance, and different spellings – e.g. tailor/taylor – also being counted as different occupations), which can skew any resulting visualisations.
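This kind of cleaning can be done programmatically before anything is visualised. As a minimal sketch – not the Digital Panopticon team’s actual method – here is how occupation labels might be normalised in Python, assuming a hand-built map of spelling variants (a real dataset would need a far larger one):

```python
from collections import Counter

# Hypothetical map of spelling variants to a canonical form;
# the real Old Bailey data would need a much longer list.
SPELLING_VARIANTS = {
    "taylor": "tailor",
}

def normalise(job):
    """Lowercase, strip whitespace, drop a simple plural 's',
    then map known spelling variants to one canonical label."""
    job = job.strip().lower()
    if job.endswith("s") and not job.endswith("ss"):
        job = job[:-1]  # 'servants' -> 'servant', 'taylors' -> 'taylor'
    return SPELLING_VARIANTS.get(job, job)

# Toy sample of raw tags as they might appear in the records
raw_jobs = ["Servant", "servants", "Tailor", "taylor", "Taylors"]
counts = Counter(normalise(j) for j in raw_jobs)
print(counts)  # servant and tailor each counted once per person
```

Even this crude approach collapses the servant/servants and tailor/taylor splits into single categories, so a bar chart of occupations would no longer undercount them.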
Another issue lies in the sheer amount of data that the Digital Panopticon team will be processing over the next couple of years. They are looking at the lives of thousands of individuals, using lots of different sources that will give them a huge amount of data on people’s health, height, weight, criminal record, social background, and so on. How do you present that in a way that is informative, accurate, yet easy to process and understand?
That is still an issue to be decided; but this workshop really showed how historians can use data visualisation to present their research in new ways, whilst remaining academically rigorous. And now I’m off to have a play with Tableau Public, one of the many data visualisation applications I learned about at the workshop. I’m hoping the results will make it onto this blog at some point in the not-too-distant future. 🙂
The Digital Panopticon project can be followed on Twitter – @digipanoptic