Data Analysis with SQL

In this post, we will examine the Lead Merchants Sales Database using SQL. We will answer the below questions to pull insights from this 2 million line database and make a flat data file to use in Excel.

Please follow the link to download the database and related files.

Questions

  1. What are monthly sales volumes (total units sold, revenue, profit)
  2. Regional sales figures (total units sold, revenue, profit)
  3. Sales by product (total units sold, revenue, profit)
  4. List top 10 countries by revenue
  5. Top 100 customers by profit
  6. Report top 5 employees who made the highest profits

SUMMARY OF RESULTS

1- What are monthly sales volumes (total units sold, revenue, profit)

2019 Monthly sales volumes of Lead Merchants

2- Regional sales figures (total units sold, revenue, profit)

2019 regional sales figures of Lead Merchants

3- Sales by products (total units sold, revenue, profit)

An extract of 2019 figures by products. The first 14 products can be seen from the extract, a full list is downloaded as a csv file.

4- List top 10 countries by revenue

Top 10 countries sorted by revenue.

Swedish customers bought 25.978 products and generated £760k in revenue for LM.

5- Top 100 customers by profit

An extract of top 100 customers, the full list is downloaded as a csv file.

The first customer bought £81.623 of products and created £14.178 of profits.

6- Report top 5 employees who made the highest profits

High performing employees: highest performer generated £6,3m of revenue £1,1m in profits for LM.

DETAILED ANALYSIS

TABLES

The Lead Merchants sales database consists of 5 tables. The main table in the database shows the realised orders in 2019. 2 of the remaining tables store details of products sold and customers who made these orders. The other 2 tables store the country information of the customers and employees.

Products Table Orders Table Customers Table Countries Table Employees Table 
IDPIDPIDPCountry_CodePIDP
Product Order Date Customer Name Country Employee Name 
Product Type Product IDFCountry IDFRegion Region ID 
Cost (£) Customer IDFRegion ID  
Price Units Sold Employee IDF 
Stock 
Lead Merchants Database Tables

Analysis

We will use PostgreSQL, which is a free software, for analysis with SQL. In case you use other software, the code mentioned in the file will be, to a great extent the same.

You can download PostgreSQL and find instructions for installation on their website. Make sure to download the version for your operating system.

Below there is a step by step explanation of the analysis of sales data. To follow this analysis, basic SQL knowledge is required.

A. Create the Database and the Tables

1. Create the Lead_Merchants database (right-click on the databases and create database)

2. Open the query tool and create 5 tables (coding is in a separate file). Refresh tables with right-click, refresh.

3. Import data to tables, right-click on each table and import CSV files, 5 csv table data will be loaded to 5 table

Below is the table import dialogue box. The default option is export, which if not switched to import will write an empty file on your data. Keep a backup of all files.

B. Now start examining the data in the tables:

  1. Analyse the orders table:

Code –

Select * FROM orders;

2 million rows loaded, below is an extract of orders table.

              The orders table has 5 columns: id, order_date, product_id, customer_id, units_sold

              In the messages tab, you can see the total number of rows.

              Code –

SELECT * FROM orders
     WHERE product_id = 'prd05'
     GROUP BY product_id; 

                             Total 96230 units of Product 5 is sold.

             

  • Do this examination for all other tables

C. Establish the relationships between the tables

To set these tables as a database, we have to establish the relationships (connections) between tables. Each table is created with its primary keys. We will now set foreign keys and use these primary keys to define relationships.

For the products table, the primary key is ‘id’. (P: Primary key, F: Foreign key)

The arrows show the relationships between the tables. P sets a column as Primary Key and F as Foreign Key. Relationships are defined between primary keys and foreign keys. All primary keys have unique values in each row, and foreign keys can have the same values in more than one row which make the relationships one to many.

  • Set foreign key for orders table on customer_id column and link this column with customers table id column:

ALTER TABLE public.orders
ADD CONSTRAINT “customers_FK”
FOREIGN KEY(customer_id) REFERENCES public.customers(id);

  • Set foreign key for orders table on product_id column and link this column with products table id column:
ALTER TABLE public.orders
ADD CONSTRAINT "products_FK"
FOREIGN KEY(product_id) REFERENCES public.products(id);

  • Set foreign key for customers table on country_code column and link this column with countries table country_code column:
ALTER TABLE public.customers
ADD CONSTRAINT "country_FK"
FOREIGN KEY (country_id) REFERENCES public.countries (country_code);
  • Set foreign key for countries table on employee_id column and link this column with employees table id column:
ALTER TABLE public.countries
ADD CONSTRAINT "employee_FK"
FOREIGN KEY (employee_id) REFERENCES public.employees (id);

D. The sales database is ready for analysis

Code –

SELECT * FROM orders, products WHERE products.id = orders.product_id;

Combines orders and products tables, connecting from products_id & id columns, shows all columns from both tables and 2 million rows

Code –

SELECT * FROM orders, products, customers
     WHERE products.id = orders.product_id 
     AND customers.id = orders.customer_id 
     LIMIT 5;  

Combines orders, products and customer tables, shows all columns from all tables and displays only 5 rows

Code –

SELECT product_id, product_category, product, SUM(units_sold) AS "total sales"
     FROM orders, products 
     WHERE orders.product_id = products.id 
     GROUP BY product_id, product_category, product 
     ORDER BY "total sales" DESC; 

Combines orders, products and customer tables, shows selected 3 columns and total of units_sold columns from all tables, summarises by product_category and product and displays total units sold for each product in 25 rows

Code –

SELECT customer_name, country, product_category, product,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Price 
     FROM orders, products, customers, countries 
     WHERE products.id = orders.product_id 
     AND customers.id = orders.customer_id 
     AND customers.country_id = countries.country_code 
     GROUP BY customer_name, country, product_category, product; 

Total 19133 rows.

Now let’s do our questions

Q1- Monthly Figures

Code –

SELECT EXTRACT(MONTH FROM order_date) AS MONTH,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Revenue,
     SUM(products.p_cost * orders.units_sold) AS _Cost,
     (SUM(products.price * orders.units_sold)  
          - SUM(products.p_cost * orders.units_sold)) AS Profit
     FROM orders, products
     WHERE products.id = orders.product_id
     GROUP BY EXTRACT(MONTH FROM order_date);

Q2- Regional Figures

Code –

SELECT region, 
      SUM(Units_sold) AS Total_Sales, 
      SUM(products.price * orders.units_sold) AS Revenue, 
      SUM(products.p_cost * orders.units_sold) AS _Cost, 
      (SUM(products.price * orders.units_sold)  
           - SUM(products.p_cost * orders.units_sold)) AS Profit
      FROM orders, products, countries, CUSTOMERS
      WHERE products.id = orders.product_id
      AND customers.id = orders.customer_id
      AND countries.country_code = customers.country_id
      GROUP BY region;

Q3 – Figures by Products

Code –

SELECT product,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Revenue,
     SUM(products.p_cost * orders.units_sold) AS _Cost,
     (SUM(products.price * orders.units_sold) 
          - SUM(products.p_cost * orders.units_sold)) AS Profit
     FROM orders, products
     WHERE products.id = orders.product_id
     GROUP BY product;

Q4 – Top 100 Countries by Revenue

Code-

(if you copy and paste the formula below some browsers alter the “-” sign. Replace/re-enter this sign in query editor if you receive an error)

SELECT country,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Revenue,
     SUM(products.p_cost * orders.units_sold) AS _Cost,
     (SUM(products.price * orders.units_sold) 
          - SUM(products.p_cost * orders.units_sold)) AS Profit
     FROM orders, products, countries, CUSTOMERS
     WHERE products.id = orders.product_id
     AND customers.id = orders.customer_id
     AND countries.country_code = customers.country_id
     GROUP BY country
     ORDER BY Revenue DESC
     LIMIT 100;

Q5 – Top 25 Customers by Profit (with countries)

Code –

(if you copy and paste the formula below some browsers alter the “-” sign. Replace/re-enter this sign in query editor if you receive an error)

SELECT customer_name, country,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Revenue,
     SUM(products.p_cost * orders.units_sold) AS _Cost,
     (SUM(products.price * orders.units_sold)
         - SUM(products.p_cost * orders.units_sold)) AS Profit
     FROM orders, products, countries, CUSTOMERS
     WHERE products.id = orders.product_id
     AND customers.id = orders.customer_id
     AND countries.country_code = customers.country_id
     GROUP BY customer_name, country
     ORDER BY Profit DESC
     LIMIT 25;

Q6- Top 5 Employees by Profit

Code –

(if you copy and paste the formula below some browsers alter the “-” sign. Replace/re-enter this sign in query editor if you receive an error)

SELECT employee_name,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Revenue,
     SUM(products.p_cost * orders.units_sold) AS _Cost,
     (SUM(products.price * orders.units_sold)
           - SUM(products.p_cost * orders.units_sold)) AS Profit
     FROM orders, products, countries, customers, employees
     WHERE products.id = orders.product_id
     AND customers.id = orders.customer_id
     AND countries.country_code = customers.country_id
     AND countries.employee_id =employees.id
     GROUP BY employee_name
     ORDER BY Profit DESC;

Last Task – Create a flat file

Code –

SELECT EXTRACT(MONTH FROM order_date) AS MONTH, customer_id, customer_name,
     product_category, product_id, product, country, country_id, region, 
     countries.region_id, employee_id, employee_name,
     SUM(Units_sold) AS Total_Sales, 
     SUM(products.price * orders.units_sold) AS Price,
     SUM(products.p_cost * orders.units_sold) AS _Cost, 
     SUM(products.stock) AS Stock
     FROM orders, products, customers, countries, employees
     WHERE products.id = orders.product_id
     AND customers.id = orders.customer_id
     AND countries.country_code = customers.country_id
     AND employees.id = countries.employee_id
     GROUP BY EXTRACT(MONTH FROM order_date), customer_id, customer_name,
     product_category, product_id, product, country, country_id,
     region, countries.region_id, employee_id, employee_name;
Flat table

This is a flat table prepared to work in Excel. Daily data is summarised by months and row count der-creased from 2.000.000 to 229.560.

Download this table from PostgreSQL as a .csv file.

This is the end of the Analysis with SQL, next, we will do analysis with EXCEL first using the flat file prepared in SQL and then with Power Pivot with the full dataset.

In the final section, we will work with the new business intelligence tool, Power BI.

Back to Data Analysis, Reporting, Visualisation.