A BIG DATA ANALYSIS CASE STUDY

Advertising Channel Analysis

Client

The Client is a leading market research entity.

The Challenge

Although the Client had a solid analytical system, they believed it would not be able to meet the company's future requirements and were looking for an innovative, future-focused solution. The new System was designed to handle the constantly growing volume of data and to allow for comprehensive analysis of advertising channels.

The Client had decided on the System's architecture and was looking for an experienced team to help implement it. Satisfied with its long-term cooperation with RPAiX, the Client contacted our consultants to complete the migration from the existing analytical system to the new one.

Solution

The Client’s Business Intelligence team worked closely with RPAiX’s Big Data team throughout the project. The latter was responsible for implementing the Client’s design.

The Client’s architects chose the following frameworks for the new analytical System:

  • Apache Hadoop – data storage
  • Apache Hive – data aggregation and querying
  • Apache Spark – data processing

Amazon Web Services was selected as a cloud computing platform over Microsoft Azure.

The Client requested that the old and new systems be used in parallel during the migration.
The solution consisted of five major modules:

  • Data preparation
  • Staging
  • Data warehouse 1
  • Data warehouse 2
  • Desktop application

Data Preparation

Raw data from multiple sources was fed into the System, including TV views, mobile device browsing history, website visit data, and surveys. The System could process over 1,000 types of raw data (archives and TXT files). Data preparation was coded in Python and included the following steps:

  • Data transformation
  • Data parsing
  • Data merging
  • Data loading into the System
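
The steps above can be illustrated with a minimal Python sketch. It is not the project's actual code: the tab-separated input layout, the field names (`id`, `channel`, `seconds`), the function names, and the order in which the steps run are all assumptions made for illustration, and a plain list stands in for the System's storage.

```python
import csv
import io


def parse(raw_text):
    """Parse tab-separated raw text (hypothetical layout) into dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter="\t"))


def transform(rows, source):
    """Normalize field names and types and tag each row with its source."""
    return [
        {
            "source": source,
            "respondent_id": row["id"].strip(),
            "channel": row["channel"].strip().lower(),
            "seconds": int(row["seconds"]),
        }
        for row in rows
    ]


def merge(*batches):
    """Concatenate the batches from several sources into one list."""
    merged = []
    for batch in batches:
        merged.extend(batch)
    return merged


def load(rows, target):
    """Append rows to the target (a list standing in for a staging table)."""
    target.extend(rows)
    return len(rows)


# Hypothetical raw inputs from two sources.
TV_RAW = "id\tchannel\tseconds\n r1 \tNews24\t120\nr2\tSportsTV\t300\n"
WEB_RAW = "id\tchannel\tseconds\nw9\tExample.com\t45\n"

staging = []
tv = transform(parse(TV_RAW), "tv")
web = transform(parse(WEB_RAW), "web")
loaded = load(merge(tv, web), staging)
```

Keeping each step as a separate function makes it straightforward to add a new raw data type by writing only a new parser.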

Staging

Apache Hive was the core of this module. The data structure at this stage closely mirrored the raw data, and no connections were yet established between different sources such as TV and the internet.

Data warehouse 1

Like the previous block, this one used Apache Hive. Data mapping was performed here: for example, the System processed respondents’ data from radio, television, and internet sources and linked users’ IDs from different data sources according to mapping rules. The ETL for this block was written in Python.
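
As a rough illustration of ID linking, the sketch below maps source-specific IDs to a unified respondent ID. The rule format (a lookup keyed by source and source ID), the field names, and the sample IDs are hypothetical; the actual mapping rules are not described in this case study.

```python
# Hypothetical mapping rules: (source, source-specific ID) -> unified ID.
MAPPING_RULES = {
    ("tv", "tv-001"): "resp-1",
    ("radio", "rd-17"): "resp-1",
    ("internet", "web-42"): "resp-1",
    ("tv", "tv-002"): "resp-2",
}


def link_ids(records, rules):
    """Attach a unified respondent ID to each record that has a rule.

    Returns (linked, unmapped) so records without a rule are not lost.
    """
    linked, unmapped = [], []
    for record in records:
        unified = rules.get((record["source"], record["source_id"]))
        if unified is None:
            unmapped.append(record)
        else:
            linked.append({**record, "respondent_id": unified})
    return linked, unmapped


def sources_per_respondent(linked):
    """Group the sources in which each unified respondent appears."""
    result = {}
    for record in linked:
        result.setdefault(record["respondent_id"], set()).add(record["source"])
    return result


RECORDS = [
    {"source": "tv", "source_id": "tv-001"},
    {"source": "radio", "source_id": "rd-17"},
    {"source": "internet", "source_id": "web-42"},
    {"source": "tv", "source_id": "tv-002"},
    {"source": "radio", "source_id": "rd-99"},  # no rule: stays unmapped
]

linked, unmapped = link_ids(RECORDS, MAPPING_RULES)
coverage = sources_per_respondent(linked)
```

Returning unmapped records separately, rather than dropping them, is a design choice of this sketch that makes gaps in the rules visible.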

Data warehouse 2

This block used Apache Hive and Spark to process data. It calculated sums, averages, and probabilities and applied other business logic. Spark DataFrames were used to process SQL queries issued from the desktop app. The ETL was coded in Scala. Spark also allowed query results to be filtered based on the access rights granted to system users.
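
The idea of aggregating data and filtering results by access rights can be sketched in a few lines. Plain Python is used here in place of Spark and Scala so the example stays self-contained; the row fields, the user names, and the per-market access model are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical viewing rows.
ROWS = [
    {"market": "DE", "channel": "tv", "minutes": 30},
    {"market": "DE", "channel": "tv", "minutes": 50},
    {"market": "DE", "channel": "web", "minutes": 20},
    {"market": "FR", "channel": "tv", "minutes": 40},
]

# Hypothetical access rights: user -> markets the user may see.
ACCESS_RIGHTS = {"analyst_de": {"DE"}, "admin": {"DE", "FR"}}


def aggregate(rows):
    """Compute sum, count, and average of minutes per (market, channel)."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = totals[(row["market"], row["channel"])]
        bucket[0] += row["minutes"]
        bucket[1] += 1
    return {
        key: {"sum": total, "count": count, "avg": total / count}
        for key, (total, count) in totals.items()
    }


def query(rows, user, rights):
    """Aggregate only the rows the given user is allowed to see."""
    allowed = rights.get(user, set())
    visible = [row for row in rows if row["market"] in allowed]
    return aggregate(visible)


de_view = query(ROWS, "analyst_de", ACCESS_RIGHTS)
admin_view = query(ROWS, "admin", ACCESS_RIGHTS)
```

Filtering before aggregating means that restricted rows never contribute to the totals a user sees.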

Desktop application

The System allowed cross-analysis of nearly 35,000 attributes. It also built intersection matrices that enabled multi-angle data analytics for different markets. In addition to standard reports such as Reach Pattern, Reach Ranking, Time Spent, and Share of Time, the Client could create custom reports. After the Client selected several parameters (e.g., TV channel, customer group, and time of day), the System responded quickly with easy-to-understand charts. Forecasting could also be of great benefit to the Client: for example, revenue could be forecast from advertising budget and expected reach.
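
To make the intersection matrix and reach ranking more concrete, here is a small sketch. It assumes each attribute is represented as a set of respondent IDs and defines reach as the share of a panel that an attribute covers; both the representation and this definition are assumptions, as are the sample audiences.

```python
# Hypothetical audiences: attribute -> set of respondent IDs.
AUDIENCES = {
    "tv_news": {"r1", "r2", "r3"},
    "radio_drive": {"r2", "r3", "r4"},
    "web_portal": {"r3", "r5"},
}


def intersection_matrix(audiences):
    """Count the respondents shared by every pair of attributes."""
    names = sorted(audiences)
    return {
        (a, b): len(audiences[a] & audiences[b])
        for a in names
        for b in names
    }


def reach_ranking(audiences, panel_size):
    """Rank attributes by reach (share of the panel), highest first.

    Ties are broken alphabetically so the order is deterministic.
    """
    reach = {
        name: len(members) / panel_size
        for name, members in audiences.items()
    }
    return sorted(reach.items(), key=lambda item: (-item[1], item[0]))


matrix = intersection_matrix(AUDIENCES)
ranking = reach_ranking(AUDIENCES, panel_size=5)
```

The diagonal of the matrix holds each attribute's own audience size, while the off-diagonal cells show the overlap between two attributes.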

Results

By the project's conclusion, the new System was capable of processing many queries 100 times faster than the old solution. As a result, thanks to the invaluable insights gained from the analysis of nearly 30,000 attributes, the Client was able to conduct a comprehensive analysis of advertising channels for different markets.

Technologies & Tools

Apache Hadoop, Apache Hive, Apache Spark, Python (ETL), Scala (Spark, ETL), SQL (ETL), Amazon Web Services (Cloud storage), Microsoft Azure (Cloud storage), .NET (desktop application).

*All case studies are for illustration purposes only. Due to NDA agreements between the client and the development team, project details cannot be disclosed.

Industry

Marketing and Advertising

Technologies

Hadoop, Python, Scala, Spark, AWS, Big data, Cloud, Azure