ROCKBUSTER MOVIE RENTALS - WALKTHROUGH

  • Rockbuster Stealth LLC is a movie rental company facing stiff competition from streaming services such as Netflix and Amazon Prime. In order to stay competitive, they plan to launch an online video rental service.

  • I’m hired as an analyst by Rockbuster Stealth’s business intelligence (BI) department to help with the launch strategy for a new online video service.

  • SQL – Query and analyze data

    Excel – SQL queries are saved as CSV files

    Tableau – Create visualizations based on the queried data

Key Questions from Executives

1. Which movies contributed the most/least to revenue?

2. What was the average rental duration for all videos?

3. Which countries are Rockbuster customers based in?

4. Where are customers with a high lifetime value based?

5. Do sales figures vary between geographic regions?

The below image is an entity relationship diagram (ERD) of Rockbuster’s databases; it consists of various tables each holding data about different entities. The arrows indicate a relationship between them, and will always point to a column within a table that has a gold key icon. This means that both tables share a similar column, and can be joined to develop further insights.

The ERD provides an overview of what relationships exist, acting as a guide and reference on how to navigate through the analysis.

EXPLORATORY ANALYSIS

The key questions revolve around customers and sales, specifically in terms of films. Looking at the ERD, there’s a “film” table, so I query the descriptive statistics (minimum, maximum, and average) on the columns that contain quantitative data. This is to check for any outliers that could skew the data.

Data Quality Check - Film Table

There are no unusual values found within the four queried results.

Film Table Results

Customers are the ones driving the profits for the company, so it’s equally as important to check this table too.

Data Quality Check - Customer Table

There are no unusual values found within the three queried results.

Customer Table Results

DATA CLEANING

There were no unusual values found within the "film” and “customer” table. But what about potential duplicates? Those could also skew our results by giving false information.

I selected the key columns and the count of each value, grouping them together and checking if any values have a quantity greater than one. Luckily, no results appeared in the below code, indicating there are no duplicates!

SQL QUERIES

With two of the main tables checked and cleaned, I feel confident in diving into the more complex queries and start answering the key questions from stakeholders. Many times, the question contains keywords that help guide me in my analysis.

The keywords here are movies and revenue, which tell me I need the name of the film and the amount of money it generated. Looking at my ERD, there’s a “payment” table with a column called amount. This is a good start, but I’m still missing the name of the movie. As I scan the ERD again, there’s a “film” table with a column called title. In order to go from “payment” to “film”, I must conduct multiple joins where tables share similar columns.

Just showing the title of the film and how much revenue it made doesn’t tell the stakeholders much though. I decide to then add various columns to add more context to the film and its statistics

The left code orders revenue by descending order showing the top 5 movies and the right code orders revenue by ascending order automatically, showing the worst 5 movies.

Q1: What movies contributed the most/least to revenue?

Q2: What was the average rental duration for all videos?

Films can be categorized by genre, rating, or both! In this case, I decide to group them by genre because there are only five ratings, whereas there are 17 different genres. This could provide a snapshot of what genre of film customers enjoy watching more.

Q3: Which countries are Rockbuster customers based in? + Q4: Where are customers with a high lifetime value based?

The keywords here are customers and countries, so I know which tables to look for in the ERD and to join them developing further insights. The below code can be used to answer both key questions, which is why they were combined here. For Q3, the customer_id column plays a big role when creating visualizations as I can count the number of IDs that exist in each country.
For Q4, I’ve ordered the total amount customers have paid in descending order, and the results below are a small snippet showing the top 5 customers.

Hypothetically speaking, the more customers a country has, the more revenue that will be generated. To test this, I first queried which countries have the highest number of customers.

Q5: Do sales figures vary between geographic regions?

From here, I queried the countries and their total revenue, ordered by descending order of revenue. I wanted to compare the results and see if the hypothesis was correct — and it was!

ADDITIONAL ANALYSIS

Although the key questions were answered, it’s not enough to produce a full story. Compared to a meal, it’s like the main course. But without an appetizer or dessert, it’s not a full meal; therefore, adding in more information to provide context will encapsulate the analysis. Using the same logic to approach the above queries, I think of what else could be important and aid in decision-making. I can then dig into the dataset and develop further insights with other columns.

VISUALIZATIONS

After the analysis is complete, it’s time to transform them into visualizations that provide clarity through pictures. Results don’t mean anything if they cannot be understood. Visuals can help by displaying comparisons and are much more eye-appealing too!

MINOR ISSUES AND TAKEAWAYS

The project itself consisted of mainly two tools, PostgreSQL and Tableau, with Microsoft Excel playing a small part. There were no issues with querying from the database and obtaining the information I intended to. However, one minor issue was when I created a visualization of the top five countries by revenue using Tableau.

In the left picture below, the red shape outlines the country of Japan, however the text is far from it. When zooming in and out, the text will jump from left to right. But in the right picture, that is what I see when creating it. I have not yet clicked “save”, and it looks normal, but once I save and display it like the link above, it distorts again.

Glitched Version

Correct Version

WRAPPING IT ALL UP

With the abundance of information, I construct a story around the key questions. As mentioned earlier, the additional analysis acts like an appetizer, and can help build up to the main points. To end it all off, recommendations are presented and act as dessert; it’s the final touch which completes everything and leaves stakeholders satisfied!