Exploring Public Transport Trends in London Using Snowflake

Yash Chauhan
4 min readSep 23, 2024

--

London is one of the world’s most iconic and diverse cities, with a population exceeding 8.5 million people, representing over 300 languages. The city’s historic roads, originally designed for horse-drawn carts, have faced significant challenges in accommodating the growing population and evolving transportation needs. This rapid growth necessitated the development of an efficient public transport system, which is currently managed by Transport for London (TfL).

TfL oversees the operation of the London Underground, buses, trams, Docklands Light Railway (DLR), river services, taxis, and even the Emirates Airline cable car. They’ve also made much of their transport data publicly available, making it an excellent resource for data analysis.

In this blog, I’ll walk you through a project I recently worked on, where I used Snowflake to analyze TfL’s transport data, uncovering some fascinating insights about London’s public transportation trends.

Understanding the Dataset

The dataset I worked with was a modified version of TfL’s public transport data, loaded into a Snowflake database called TFL. This dataset contained records of public transport journeys across various transport modes, with the following columns:

  • MONTH: Representing the month of the year.
  • YEAR: The year the data was recorded.
  • DAYS: The number of days in a given month.
  • REPORT_DATE: The date on which the data was reported.
  • JOURNEY_TYPE: The type of transport (e.g., bus, Underground, DLR).
  • JOURNEYS_MILLIONS: The number of journeys (in millions) made using each transport type.

Project Objective

The main objective of this project was to use SQL queries to answer specific questions regarding the transport trends in London. This involved calculating the total number of journeys made by transport type, identifying the least and most popular transport years, and analyzing the Emirates Airline’s usage patterns.

1. Most Popular Transport Types in London

The first step was to identify which transport types were the most popular among Londoners. I executed a SQL query to group the data by JOURNEY_TYPE and sum the total number of journeys in millions for each type. Here’s the query I used:

SELECT JOURNEY_TYPE, SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;

Findings:

  • Buses were the most popular mode of transport, with a staggering 24,905 million journeys recorded over the years.
  • The Underground & DLR came in second, with 15,020 million journeys.
  • The Overground services recorded significantly fewer journeys, around 1,666 million.
  • Trams and TfL Rail also showed a relatively small portion of the transport market.

This information highlights how essential buses and the Underground are to London’s daily transport needs, serving as the backbone of the city’s public transportation system.

2. Exploring the Popularity of the Emirates Airline

The Emirates Airline cable car, launched in 2012 just before the London Olympics, is one of the more unique modes of transport in London. But how often do people actually use it?

To answer this, I wrote a query that identified the top five months where the highest number of journeys were made on the Emirates Airline:

SELECT 
MONTH,
YEAR,
ROUND(JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Emirates Airline'
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5;

Findings:

  • The highest number of journeys (0.53 million) was recorded in May 2012, around the time of the cable car’s launch and the buildup to the Olympics.
  • June 2012 followed with 0.38 million journeys, while April 2012 saw 0.24 million journeys.
  • After the initial boom, the usage of the cable car seemed to drop, with only 0.19 million journeys in May 2013 and May 2015.

The trend suggests that the Emirates Airline was initially popular due to the excitement of its launch and the Olympics but has since settled into a niche role within the transport network.

3. Least Popular Years for Underground Travel

One of the most significant disruptions to London’s public transport was the COVID-19 pandemic. I wanted to explore how the pandemic affected Underground journeys, especially compared to previous years.

To find the least popular years for Underground travel, I grouped the data by YEAR and filtered the results to only show Underground & DLR journeys. Here's the query I used:

SELECT YEAR, JOURNEY_TYPE, SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS 
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Underground & DLR'
GROUP BY YEAR, JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS
LIMIT 5;

Findings:

  • Unsurprisingly, 2020 saw the lowest recorded journeys, with only 310 million journeys made, due to the pandemic lockdowns and restrictions.
  • 2021 saw a slight recovery, with 748 million journeys, but still significantly lower than pre-pandemic levels.
  • 2022 showed further improvement, with over 1,064 million journeys, signaling a return to normalcy.
  • By comparison, 2010 and 2011 recorded significantly higher volumes, with over 1,096 and 1,156 million journeys, respectively.

The pandemic had a clear and dramatic effect on public transport usage, with 2020 being the hardest hit. However, it’s promising to see the gradual recovery of transport services in the following years.

Conclusion

Working on this project using Snowflake provided an excellent opportunity to explore London’s complex and diverse public transport system. By leveraging Snowflake’s powerful SQL querying capabilities, I was able to efficiently analyze large datasets and derive meaningful insights.

Key Takeaways:

  • Buses and the Underground are the most heavily relied-upon forms of transport in London, accounting for the vast majority of journeys.
  • The Emirates Airline saw an initial surge in popularity during its launch, but usage has since declined.
  • The COVID-19 pandemic caused a dramatic drop in Underground travel, but the recovery trend is visible in the 2021 and 2022 data.

This project also solidified my understanding of Snowflake’s data warehousing capabilities, particularly its ability to handle complex queries and large datasets with ease. As I continue to explore more real-world datasets, I’m excited to apply these insights to further deepen my expertise in data analysis and SQL.

--

--

No responses yet