
Advertising Channel Analysis
Client
The Challenge
The Client had already decided on the System's architecture and was looking for an experienced team to implement it. Satisfied with a long-term cooperation with RPAiX, the Client contacted our consultants to complete the migration from the existing analytical System to the new one.
Solution
The Client's Business Intelligence team worked closely with RPAiX's Big Data team throughout the project, with the latter responsible for implementing the Client's original design.
The Client's architects chose the following frameworks for the new analytic System:
- Apache Hadoop – data storage
- Apache Hive – data aggregation and querying
- Apache Spark – data processing
Amazon Web Services was selected as the cloud computing platform over Microsoft Azure.
The Client requested that the old and new systems be run in parallel during the migration.
The solution consisted of five major modules.
- Data preparation
- Staging
- Data warehouse 1
- Data warehouse 2
- Desktop application
Data Preparation
Raw data from multiple sources was fed into the System, including TV viewing records, mobile device browsing history, website visit data, and surveys. The System can process over 1,000 types of raw data (archives and TXT). Data preparation was coded in Python and included the following steps:
- Data parsing
- Data transformation
- Data merging
- Data loading into the System
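The preparation steps above can be sketched in Python. This is a minimal, illustrative pipeline for one tab-separated TXT source; the field names, delimiter, and record layout are assumptions for the example, not the Client's actual schema.

```python
import csv
import io

# Hypothetical raw TXT export from a single source (tab-separated).
# Field names here are illustrative only.
RAW_TV = "user_id\tchannel\tminutes\nu1\tNews24\t30\nu2\tSports1\t45\n"

def parse(raw: str) -> list:
    """Parse a tab-separated TXT export into dict records."""
    return list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

def transform(records: list) -> list:
    """Normalize field names and convert numeric fields to int."""
    return [
        {"user_id": r["user_id"], "channel": r["channel"],
         "minutes": int(r["minutes"])}
        for r in records
    ]

def merge(*batches: list) -> list:
    """Combine prepared batches from multiple sources into one load set."""
    merged = []
    for batch in batches:
        merged.extend(batch)
    return merged

# In the real System this set would then be loaded into Hadoop storage.
prepared = merge(transform(parse(RAW_TV)))
```

In production each source type would have its own parser and transformation rules; the merge step simply concatenates prepared batches before loading.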
Staging
Apache Hive was the core of this module. At this stage, the data structure closely mirrored the raw data and did not yet establish connections between different sources, such as TV and internet data.
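The key property of staging can be shown with a small Python sketch: each source keeps its own structure and its own IDs, and no cross-source key exists yet. The table and field names below are invented for illustration.

```python
# Staging keeps each source close to its raw layout, in separate
# structures. Names and fields are hypothetical examples.
staging = {
    "tv_views":   [{"tv_id": "t1", "channel": "News24", "minutes": 30}],
    "web_visits": [{"web_id": "w9", "site": "example.com", "hits": 4}],
}

# There is deliberately no shared identifier between tv_views and
# web_visits here; linking respondents across sources happens later,
# in Data warehouse 1.
tv_fields = set(staging["tv_views"][0])
web_fields = set(staging["web_visits"][0])
shared = tv_fields & web_fields
```

In the actual System these would be Hive tables rather than in-memory structures, but the separation between sources is the same.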
Data warehouse 1
The next block was similar to the previous one and likewise used Apache Hive. Data mapping was performed here. The System, for example, processed respondents' data from radio, television, and internet sources, linking users' IDs across data sources according to mapping rules. This block's ETL was written in Python.
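The ID-linking step can be sketched as follows. The mapping rules, source names, and IDs below are hypothetical; the real System derived its mapping from the Client's rules rather than a hard-coded table.

```python
# Illustrative mapping rules: (source, source-specific ID) -> unified
# respondent ID. In practice these rules came from the Client.
MAPPING = {
    ("tv",  "t1"): "resp-001",
    ("web", "w9"): "resp-001",
}

def link_ids(records: list, source: str) -> list:
    """Attach the unified respondent ID to each record of one source."""
    linked = []
    for r in records:
        unified = MAPPING.get((source, r["id"]))
        if unified is not None:  # drop records with no mapping rule
            linked.append({**r, "respondent_id": unified})
    return linked

tv = link_ids([{"id": "t1", "minutes": 30}], "tv")
web = link_ids([{"id": "w9", "hits": 4}], "web")
```

Once both sources carry the same `respondent_id`, TV and internet behavior can be analyzed for the same person, which is exactly what staging could not do.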
Data warehouse 2
This block used Apache Hive and Spark to process data. It calculated sums, averages, probabilities, and other business metrics. Spark DataFrames were used to process SQL queries issued from the desktop application, and the ETL for this block was written in Scala. Spark also filtered query results based on the access rights granted to system users.
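The combination of aggregation and access-rights filtering can be sketched in plain Python (the real System did this with Spark DataFrames; the facts, user names, and access lists below are invented for the example).

```python
from statistics import mean

# Hypothetical viewing facts, already linked to respondents upstream.
FACTS = [
    {"channel": "News24",  "minutes": 30},
    {"channel": "News24",  "minutes": 50},
    {"channel": "Sports1", "minutes": 45},
]

# Hypothetical access rights: which channels each user may query.
ACCESS = {"analyst1": {"News24"}}

def channel_stats(user: str) -> dict:
    """Sum and average minutes per channel, restricted to the
    channels the querying user is allowed to see."""
    allowed = [f for f in FACTS if f["channel"] in ACCESS.get(user, set())]
    grouped = {}
    for f in allowed:
        grouped.setdefault(f["channel"], []).append(f["minutes"])
    return {ch: {"total": sum(v), "avg": mean(v)}
            for ch, v in grouped.items()}

stats = channel_stats("analyst1")
```

Filtering before aggregation means a user's results never include rows outside their access rights, mirroring how Spark applied the restriction at query time.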
Desktop application
The System allowed cross-analysis of nearly 35,000 attributes. It also built intersection matrices that enabled multi-angle data analytics for different markets. In addition to standard reports such as Reach Pattern, Reach Ranking, Time Spent, and Share of Time, the Client could create custom reports. After the Client selected several parameters (e.g., TV channel, customer group, and time of day), the System quickly returned results in easy-to-understand charts. Forecasting was another benefit: for example, the Client could project revenue based on advertising budget and expected reach.
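An intersection matrix of the kind described can be sketched as pairwise respondent counts. The respondents and attribute tags below are hypothetical; the real System computed this over nearly 35,000 attributes on the cluster.

```python
from itertools import combinations

# Hypothetical respondents, each tagged with the attributes they match.
RESPONDENTS = {
    "resp-001": {"watches_News24", "age_18_34"},
    "resp-002": {"watches_News24", "age_35_54"},
    "resp-003": {"age_18_34"},
}

def intersection_matrix(attributes: set) -> dict:
    """For each pair of attributes, count respondents matching both."""
    matrix = {}
    for a, b in combinations(sorted(attributes), 2):
        matrix[(a, b)] = sum(
            1 for tags in RESPONDENTS.values() if a in tags and b in tags
        )
    return matrix

m = intersection_matrix({"watches_News24", "age_18_34", "age_35_54"})
```

Each cell answers a cross-analysis question such as "how many 18-34 respondents watch News24?", and the full matrix is what the desktop application rendered as charts.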
Results
Technologies & Tools
Apache Hadoop, Apache Hive, Apache Spark, Python (ETL), Scala (Spark, ETL), SQL (ETL), Amazon Web Services (Cloud storage), Microsoft Azure (Cloud storage), .NET (desktop application).
*All case studies are for illustration purposes only. Due to NDA agreements between the client and the development team, project details cannot be disclosed.