Skip to content

pokwir/Transforming-and-Analyzing-Data-with-SQL

Repository files navigation

Final-Project-Transforming-and-Analyzing-Data-with-SQL

Project/Goals

The goal of the project was to extract, clean, transform and gain business insight from raw data collected from an e-commerce site.

  1. Extraction. The five unstructured data pools were downloaded and migrated it into a temporary data pool. An initial preview was made on a GUI based apple numbers. Relationships, and mapping of data varriables for loading into postgresql was made.
  2. Cleaning. Data cleaning was made in postgresql.
    Using appropriate QA queries in postgresql, unwanted or redundant columns that didn't have observations were dropped, table modifications were done to ensure accurate data types, and duplicate observations were removed.
  3. Transformation. Data was converted to match destination formatting. Using appropriate SQL queries, unit conversions were made to common human comprehensible format eg time stamps and dates that are in epoch. This ensured the quality of the data preceding analysis.
  4. Quality Assurance. A quality assurance check was conducted to ensure that data was ready for analysis. This involved making sure that there was no duplicates in unique field values, ensuring appropriate data types, and invalid data. Risks were identified and documented that would guide analysis.
  5. Analysis. Business insights were conducted in postgresql. A report of findings were documented.

Results

  1. Revenue generated by the site by traffic sources. Queries on the data was done to determine total revenue generated by the site from visitors that made a purchase. Further analysis was done to determine how those users got to the site.
  2. Revenue generated by events. Data was analyzed for revenue and how users interacted with the site - page views and events like filling and completing a form or solving a puzzle.
  3. Regions (Cities and countries) where site visitors are from and top selling products in those regions, and patterns in product categories associated with those cities and countries.

Challenges

  • Duplicates — The data set especially the All_sessions table had quite a number of duplicates.
  • Withheld observations. Some country names and cities were Withheld from the dataset.

Future Goals

Investigate/ followup on missing/ inaccurate observations in the all_sessions table.

About

The first week data science LHL project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors