How to Load Data Incrementally Using Azure Data Factory
Incremental loading runs periodically after the initial load of the data, and it loads only the updated data (delta data).
Note: There should be a column in the source that identifies the last processed data (for example, created date or last modified date).
Workflow:
1. Load the full data and identify the column used to check for the last updated data.
2. Run the pipeline to load the delta data.
3. Copy the delta data to the destination.
The pipeline has two activities and one stored procedure. The first activity copies data from the source to staging, and the second copies it from staging to the destination. The stored procedure empties the staging table after the second activity completes.
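The control flow above can be sketched as plain functions; the names below are assumptions for illustration, not ADF APIs:

```python
# A minimal sketch of the pipeline's ordering: activity 1 (source -> staging),
# activity 2 (staging -> destination), then the stored procedure that empties
# staging. Function names are illustrative only.

def copy_source_to_staging(log):       # activity 1
    log.append("source->staging")

def copy_staging_to_destination(log):  # activity 2
    log.append("staging->destination")

def truncate_staging(log):             # stored procedure
    log.append("truncate staging")

def run_pipeline():
    log = []
    copy_source_to_staging(log)
    copy_staging_to_destination(log)  # starts only after activity 1 completes
    truncate_staging(log)             # runs only after activity 2 completes
    return log

print(run_pipeline())
```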
Consider an example where the source is MySQL and the destination is SQL Server.
In activity 1 we copy data from MySQL to staging; the staging table can be in SQL Server or in Blob storage. While loading the data in activity 1, constrain the query to load only the delta data, for example a query that loads rows where created_date is greater than yesterday.
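The delta filter can be sketched as follows; sqlite3 stands in for MySQL so the example runs anywhere, and the table and column names (`orders`, `created_date`) are assumptions for illustration:

```python
# A minimal sketch of activity 1's delta query: only rows newer than the
# watermark (here, "yesterday") are selected for copying to staging.
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_date TEXT)")

yesterday = (date.today() - timedelta(days=1)).isoformat()
older = (date.today() - timedelta(days=5)).isoformat()
today = date.today().isoformat()

conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, older), (2, today), (3, today)])

# Constrain the copy to delta rows only (created_date greater than yesterday).
delta = conn.execute(
    "SELECT id FROM orders WHERE created_date > ?", (yesterday,)
).fetchall()
print(delta)  # only the rows created today
```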
The output dataset of activity 1 is the input of activity 2, which ensures that activity 2 starts only after activity 1 has completed.
Activity 2 copies data from staging to the destination table. It has two input datasets: the output of activity 1 and the data from staging. After it completes, the stored procedure runs.
In SQL Server, create a stored procedure that truncates the staging table and removes duplicates from the destination table. The stored procedure activity takes the output dataset of activity 2 as its input. A stored procedure produces no output, so create a dummy output dataset, used only to specify the schedule for running the stored procedure activity; when creating the dummy dataset, leave the table name blank.
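The logic of that stored procedure can be sketched like this, with sqlite3 emulating SQL Server; the table and column names are assumptions, and `DELETE FROM staging` stands in for `TRUNCATE TABLE`, which SQLite lacks:

```python
# A minimal sketch of the post-copy stored procedure: remove duplicate rows
# from the destination table, then empty the staging table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (id INTEGER, amount INTEGER);
CREATE TABLE destination (id INTEGER, amount INTEGER);
-- State after activity 2: staging still holds the delta, and the destination
-- contains a duplicate because row (1, 100) was loaded twice.
INSERT INTO staging VALUES (1, 100), (2, 200);
INSERT INTO destination VALUES (1, 100), (1, 100), (2, 200);
""")

def stored_procedure(conn):
    # Keep one copy of each duplicated row in the destination.
    conn.execute("""
        DELETE FROM destination WHERE rowid NOT IN
        (SELECT MIN(rowid) FROM destination GROUP BY id, amount)
    """)
    # Empty the staging table for the next pipeline run.
    conn.execute("DELETE FROM staging")

stored_procedure(conn)
dest = conn.execute("SELECT * FROM destination ORDER BY id").fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(dest, staging_count)
```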
Calling the stored procedure in the pipeline.
Data Architect @Urvi Verma