One of my final year modules was “Advanced Data Structures”. It provided an introduction to the subjects of:
- Relational Databases
- “Big Data”
- Artificial Intelligence -> Machine Learning -> Deep Learning
- NoSQL and OO Databases
- OLAP
I found these great, and am keen to learn more. The advent of large data sets and the ability to process them to find insights is fascinating.
Our second-term project for this module required us to analyse UK Road Traffic Accident (RTA) data for 2015. The data set was a sheet of ~250,000 incidents, each with 40+ data points. I was pleased enough with two of the resulting visualisations to present them here.
RTA Visualisation
Taking the incident data, it was possible to total the number of incidents in each cell of a 1 km resolution grid of the UK. These totals were imported into Unity, where a custom texture was created from them and rendered to a quad. The road infrastructure and population centres are clearly visible.
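The gridding step can be sketched roughly as follows. This is a minimal illustration rather than the code used in the project: it assumes each incident has an easting and northing in metres, and the function name and sample coordinates are made up for the example.

```python
from collections import Counter

CELL_SIZE_M = 1000  # 1 km grid resolution, as described above


def count_incidents_per_cell(points, cell_size=CELL_SIZE_M):
    """Total the incidents falling in each grid cell.

    points: iterable of (easting, northing) pairs in metres (an assumption
    about how the incident locations are stored).
    Returns a Counter mapping (column, row) cell indices to incident counts.
    """
    counts = Counter()
    for easting, northing in points:
        # Integer division maps a coordinate to the index of its 1 km cell.
        cell = (int(easting // cell_size), int(northing // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical sample coordinates, not taken from the real data set.
sample = [(530100, 180250), (530900, 180999), (531000, 180000), (400500, 300500)]
grid = count_incidents_per_cell(sample)
```

Each cell count could then be mapped to a pixel colour when building the texture, with one pixel per grid cell.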
RTA Time/Incident Count Heat Map
Ever wondered when the safest time to travel is? By extracting the data and totalling incidents for every half hour of the year, it was possible to produce the following heat map.
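The half-hour totals behind a heat map like this can be sketched as below. It is an illustration under assumptions, not the project's actual code: it assumes each incident has a date in DD/MM/YYYY form and a time in HH:MM form, and the function name and sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime


def half_hour_counts(records):
    """Total incidents per (day of year, half-hour slot).

    records: iterable of (date_text, time_text) pairs, assumed to be
    formatted as 'DD/MM/YYYY' and 'HH:MM'.
    Returns a Counter keyed by (day_of_year, slot), where slot runs 0..47.
    """
    counts = Counter()
    for date_text, time_text in records:
        stamp = datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %H:%M")
        day_of_year = stamp.timetuple().tm_yday      # 1..365 for 2015
        slot = stamp.hour * 2 + stamp.minute // 30   # two slots per hour
        counts[(day_of_year, slot)] += 1
    return counts


# Hypothetical sample records, not taken from the real data set.
sample = [
    ("01/01/2015", "08:10"),
    ("01/01/2015", "08:29"),
    ("01/01/2015", "08:30"),
    ("31/12/2015", "23:59"),
]
heat = half_hour_counts(sample)
```

Plotting day of year on one axis and slot on the other, coloured by count, gives a 365 x 48 grid.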
At first glance it does not seem to show much. But look closer and some details become clear.
- The commuting/school run times are clear: 08:00 – 09:00 and 16:00 – 19:00
- Green spikes in the morning data (and slightly less clear yellow ones in the afternoon) indicate weekends, when there is less commuting and school run traffic
- Morning incidents lessen in the period Jun-Aug, the school holiday season
- Clear lessening of incidents in the Christmas holidays
- Further investigation is required, but the reddening in Nov – Dec may reflect increased incidents due to weather/lighting
- Most revealingly, there seems to be horizontal banding, where the first 30 minutes of each hour have more incidents than the last 30. Is this because people are running late? Or an issue with the original data collection?
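One way to start probing the last point is to look at incident counts by minute of the hour. If recorded times tend to be rounded to the hour, minute 00 would dominate and inflate the first half of each hour. The sketch below assumes times are stored as HH:MM strings; the function names and sample values are hypothetical.

```python
from collections import Counter


def minute_profile(times):
    """Count incidents by minute-of-hour from 'HH:MM' strings (assumed format)."""
    return Counter(int(t.split(":")[1]) for t in times)


def first_half_share(profile):
    """Fraction of incidents recorded in minutes 00-29 of the hour."""
    total = sum(profile.values())
    first = sum(n for minute, n in profile.items() if minute < 30)
    return first / total if total else 0.0


# Hypothetical sample times, not taken from the real data set.
sample_times = ["08:00", "08:00", "08:05", "08:30", "08:45", "09:00"]
profile = minute_profile(sample_times)
share = first_half_share(profile)
```

A profile with sharp peaks at 00 and 30 would point towards rounding in the recorded times, while a smooth decline across the hour would be harder to explain that way.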
All in all, quite revealing. And easily the launch platform for further investigation.