
FMCG company elevates data quality through the implementation of advanced data engineering best practices within its established data pipeline

An FMCG multinational was looking for support for its existing Azure Data Factory (ADF) ETL process, covering both maintenance of the current pipelines and updates or additions to the system. Data quality and data availability were its major focus areas. Data was stored in Azure Data Lake and transformed in Azure Databricks following the medallion architecture.


Current System and Challenges
Data from all countries was being consolidated into the Data Lake using Azure Databricks, with storage organized according to the medallion architecture. The data came from logs, flat files, and business applications, and the existing pipeline had individual transformations for each data source. New markets and locations had started sending data, and these needed to be streamlined into the current system.
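A metadata-driven ingestion pattern keeps such a pipeline reusable: onboarding a new market becomes a configuration entry rather than a new set of transformations. The PySpark sketch below illustrates the idea; the source names, formats, and paths are hypothetical, not the client's actual setup.

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, lit

spark = SparkSession.builder.getOrCreate()

# Hypothetical per-source configuration; one entry per market/source.
source_configs = [
    {"name": "pos_logs", "format": "json", "path": "/mnt/raw/pos_logs/"},
    {"name": "erp_extract", "format": "csv", "path": "/mnt/raw/erp_extract/"},
]

def ingest_to_bronze(cfg):
    # Land raw files in the bronze layer unchanged, adding audit columns.
    df = (spark.read.format(cfg["format"])
          .option("header", "true")
          .load(cfg["path"]))
    (df.withColumn("_source", lit(cfg["name"]))
       .withColumn("_ingested_at", current_timestamp())
       .write.format("delta").mode("append")
       .save(f"/mnt/bronze/{cfg['name']}"))

# Onboarding a new market is a new config entry, not a new pipeline.
for cfg in source_configs:
    ingest_to_bronze(cfg)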
Below are the challenges faced:
1. Performing data quality checks on the new data.
2. Ensuring the new data is available the next day for reporting.
3. Purging/masking sensitive data.
Solution Provided and Its Impact
We spent some time understanding the existing system; once that was done, we implemented a scalable and reusable solution to bring the new source files into the existing data pipeline. This allowed us to work through all of the challenges.
1. Enhanced Data Quality: Created an Azure Databricks notebook to identify aberrations in the data files, and added logic to handle them while the data is flowing in (see the first sketch after this list).
2. Next-Day Reporting: Troubleshot any issues in the daily pipeline run on the same day, so the data was always available for next-day reporting.
3. Data Security Measures: Since the data was available in multiple layers, we implemented a process to verify that the purge had completed in every layer. Masking was applied to sensitive personal data (see the second sketch after this list).
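To illustrate the data quality step, the first sketch below shows the kind of in-flight check such a notebook can run. The column names (order_id, quantity) and paths are hypothetical; rows that fail a check are quarantined instead of blocking the daily load.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

bronze = spark.read.format("delta").load("/mnt/bronze/pos_logs")  # hypothetical path

# Flag aberrations as the data flows in: missing keys, impossible values.
checked = bronze.withColumn(
    "_dq_failed",
    F.col("order_id").isNull() | (F.col("quantity") < 0)
)

clean = checked.filter(~F.col("_dq_failed")).drop("_dq_failed")
rejected = checked.filter(F.col("_dq_failed"))

# Clean rows move on to silver; rejects are quarantined for review,
# so one bad file never blocks the next-day load.
clean.write.format("delta").mode("append").save("/mnt/silver/pos_logs")
rejected.write.format("delta").mode("append").save("/mnt/quarantine/pos_logs")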
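The second sketch shows one way masking and purge verification can be combined. The table paths, column names, and ids are again hypothetical: a personal field is hashed before it leaves the silver layer, and an assertion confirms that purged ids no longer appear in any layer.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Mask personal data: hash the email column before publishing to gold.
customers = spark.read.format("delta").load("/mnt/silver/customers")  # hypothetical
masked = customers.withColumn("email", F.sha2(F.col("email"), 256))
masked.write.format("delta").mode("overwrite").save("/mnt/gold/customers")

# Verify the purge across layers: ids scheduled for deletion must not
# survive in bronze, silver, or gold (ids below are hypothetical).
purged_ids = ["C-1001", "C-2042"]
for layer in ["bronze", "silver", "gold"]:
    remaining = (spark.read.format("delta")
                 .load(f"/mnt/{layer}/customers")
                 .filter(F.col("customer_id").isin(purged_ids))
                 .count())
    assert remaining == 0, f"Purge incomplete in the {layer} layer"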