One of our clients recently faced an issue with loading and summarizing ever-increasing volumes of data in a legacy database. In addition to a sizable bulk of historic data, large amounts of new data were being generated regularly. By this point the data had grown voluminous enough to push the limits of the existing system’s processing capacity, and report generation was consuming unreasonable amounts of processing time as well as the end users’ time.
Provision of services, billing, transaction processing, and almost every facet of customer interaction is becoming increasingly data-driven. Organizations are generating ever-increasing amounts of more versatile and complex data, and they are starting to generate it at multiple locations simultaneously. All of this data needs to be collated and processed quickly to enable informed business decisions. Many of our clients have used popular and effective legacy data processing systems successfully for years, but these systems are now proving incapable of meeting the new requirements at scale. It is time to think in a radically new direction.
After some analysis and discussion of this case, we listed the following key considerations:
In this case we recommended a solution using Azure Data Factory (ADF). There were multiple reasons that made Azure the natural choice for this situation.
We started with a PoC (proof of concept) that combined ADF with Snowflake, a cloud data warehouse that runs on Azure.
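To give a feel for the warehouse side of that combination, here is a minimal sketch of querying data that ADF has loaded into Snowflake, assuming the snowflake-connector-python package; the account, credentials, and table name are placeholders rather than the client's actual configuration.

```python
# Sketch: querying data loaded into Snowflake from Python.
# Assumes snowflake-connector-python; all names below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

try:
    cur = conn.cursor()
    # Example summary query over a hypothetical transactions table.
    cur.execute(
        "SELECT billing_month, SUM(amount) AS total "
        "FROM transactions GROUP BY billing_month ORDER BY billing_month"
    )
    for billing_month, total in cur.fetchall():
        print(billing_month, total)
finally:
    conn.close()
```

Any BI tool with a Snowflake connector can run the same kind of query directly, which is what the client ultimately used.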
The PoC showed that it was now easy for the client to run analyses on structured data using their preferred BI tools. It also proved to be surprisingly cost-efficient compared to any conventional solution. Below is a high-level architecture diagram showing the entire workflow.
Once the source and destination datasets are created, a pipeline is required to transfer the data from the source to the destination.
No code is needed: we can set up the pipeline in the Azure portal, point it at the source and destination, and the solution is ready to go. For readers who prefer scripting, a rough equivalent using the Python SDK is sketched below and continued after the step-by-step walkthrough.
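The sketch below defines the source and destination programmatically using the azure-identity and azure-mgmt-datafactory packages. The subscription, resource group, factory, and storage names are placeholders, and exact model names can vary slightly between SDK versions.

```python
# Sketch: defining an ADF linked service plus source/destination datasets in code.
# Assumes azure-identity and azure-mgmt-datafactory; all names are placeholders.
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureBlobStorageLinkedService, DatasetResource,
    LinkedServiceReference, LinkedServiceResource, SecureString,
)

credential = ClientSecretCredential(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>"
)
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

rg_name, df_name = "<resource-group>", "<data-factory>"

# Linked service pointing at the storage account that holds the raw data.
ls_name = "StorageLinkedService"
adf_client.linked_services.create_or_update(
    rg_name, df_name, ls_name,
    LinkedServiceResource(properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>"))),
)
ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name=ls_name)

# Source dataset (incoming raw files) and destination dataset (staged output).
adf_client.datasets.create_or_update(
    rg_name, df_name, "SourceDataset",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref, folder_path="<container>/incoming")),
)
adf_client.datasets.create_or_update(
    rg_name, df_name, "DestinationDataset",
    DatasetResource(properties=AzureBlobDataset(
        linked_service_name=ls_ref, folder_path="<container>/staged")),
)
```

In the portal, the same outcome is reached through the steps below.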
Step 1: Set the pipeline properties
Step 2: Set the source
Step 3: Set the destination
Step 4: Set the output settings
Step 5: Validate the summary
Step 6: Deploy the pipeline
Step 7: Watch the deployment in the monitoring window
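As promised above, the pipeline wiring and deployment from steps 1 to 6 can also be scripted. Continuing the earlier sketch (same client object and placeholder names), a copy activity moves data from the source dataset to the destination dataset, and a single create_or_update call deploys the pipeline.

```python
# Sketch (continues the earlier one): a pipeline with a single copy activity,
# deployed with create_or_update. Dataset names match the earlier sketch.
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

copy_activity = CopyActivity(
    name="CopySourceToDestination",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="DestinationDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Deploying the pipeline is a single call once the activity is defined.
adf_client.pipelines.create_or_update(
    rg_name, df_name, "CopyPipeline",
    PipelineResource(activities=[copy_activity]),
)
```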
ADF does a masterful job of ETL, combining multiple sources and types of data into a usable form. In addition to processing, it provides monitoring: ADF sends alerts and makes it easy to take corrective action where needed. Combining the best of SaaS, PaaS, and IaaS, ADF is an excellent example of leveraging the massive power and scale of the cloud, making possible (and affordable) things that are simply unimaginable with traditional infrastructure.
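As a rough illustration of that monitoring side, the same SDK used in the sketches above can trigger a run and poll its status; names again carry over from the earlier placeholders.

```python
# Sketch (continues the earlier ones): trigger a pipeline run and poll its status.
import time

run = adf_client.pipelines.create_run(rg_name, df_name, "CopyPipeline", parameters={})

# Poll until ADF reports a terminal state.
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    print("Pipeline run status:", pipeline_run.status)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
```

In practice, the Monitor view and alert rules in the portal give the same visibility without writing any code at all.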