The T-SQL MERGE statement was introduced in SQL Server 2008, and in many ways simplified common tasks such as dimension loading. However, like most things in life, there are a few gotchas to consider. For example, the merge statement will ”…perform insert, update, or delete operations in a single statement…”, MSDN. The keyword here is OR. In other words, each source record (based on USING clause) will result in one operation being performed. This works great for dimension loads which include only SCD1 type attributes where we will either insert a new record or update existing records that have changed (upsert).
Things get a little hairier when you need to update a dimension with SCD2 attributes. In this case we need to perform two operations when a source record is matched with an existing dimension record. The existing record needs to be expired, and a new record needs to be inserted (current). No problem. We can use the MERGE OUTPUT and wrap the merge statement with an INSERT statement. By doing so we end up with a statement that will perform the two needed operations (UPDATE & INSERT).
After the above into consideration and adding a few other needed features, a dimension load statement can quickly become very complex. LeapFrogBI boils all of this complexity down to a very simple selection. When deciding which dimension template to use consider the following definitions.
D1000 – Single Batch | Source must be limited to 1 dimension key version per load. Only use if this is always the case. |
D1001 – Multi Batch Load | Source can include multiple dimension key versions (each must have unique recordtime). |
D1002 – Multi Batch Load w/ Constraints | Same as D1001 but adds handling for SCD2 attributes with constraints in place. |
D2000 – Existing Dimension | Use an existing dimension which is maintained outside of LFBI. Result in a view being created which points to the existing dimension. |
Below is an example of a D1002 dimension load merge statement with the the following functionality included:
- Keep track of record counts for logging purposes
- Handles hybrid dimensions (combination of SCD0, SCD1, and SCD2 attributes)
- Accepts multiple versions of records with same dimension key (unique effective time required)
- Overcomes insert with nested merge & constraint limitation described above
Dimension Load Using MERGE
--Create load count variables DECLARE @Insert AS int = 0 DECLARE @Upsert AS int = 0 DECLARE @SCD2 AS int = 0 --Create current load table variable --Holds dimension key and recorddate for current batch DECLARE @Current AS table ( [ProductID] int, [SYS_Start] datetime2(7) ); --Load source local temp table --Group by key and recorddate to ensure that duplicates are not included SELECT p227.[SYS_RecordTime] AS [SYS_Start], p227.[ModifiedDate] AS [ModifiedDate], p227.[ProductID] AS [ProductID], p227.[Name] AS [Name], p227.[ProductNumber] AS [ProductNumber], p227.[MakeFlag] AS [MakeFlag], p227.[FinishedGoodsFlag] AS [FinishedGoodsFlag], p227.[Color] AS [Color], p227.[SafetyStockLevel] AS [SafetyStockLevel], p227.[ReorderPoint] AS [ReorderPoint], p227.[StandardCost] AS [StandardCost], p227.[ListPrice] AS [ListPrice], p227.[Size] AS [Size], p227.[SizeUnitMeasureCode] AS [SizeUnitMeasureCode], p227.[WeightUnitMeasureCode] AS [WeightUnitMeasureCode], p227.[Weight] AS [Weight], p227.[DaysToManufacture] AS [DaysToManufacture], p227.[ProductLine] AS [ProductLine], p227.[Class] AS [Class], p227.[Style] AS [Style], p227.[ProductSubcategoryID] AS [ProductSubcategoryID], p227.[ProductModelID] AS [ProductModelID], p227.[SellStartDate] AS [SellStartDate], p227.[SellEndDate] AS [SellEndDate], p227.[DiscontinuedDate] AS [DiscontinuedDate], p227.[rowguid] AS [rowguid] INTO #Source FROM [austx01\ss2008r2].[ProtectedFields].[psa].[p_Product] AS p227 --Create temp table which will hold SCD2 current version records --Temp table used to overcome SQL limitation related to insert with nested merge where constraints exist SELECT TOP 0 * INTO #SCD2Insert FROM [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] --Load each batch. Manual break when all records are loaded. WHILE 1=1 BEGIN --Delete last load from source table variable (no deletes on first batch) DELETE FROM #Source FROM #Source s Inner Join @Current c ON s.[ProductID] = c.[ProductID] and s.[SYS_Start] = c.[SYS_Start] --Clear current table variable DELETE FROM @Current --Load current table variable --Identify current batch (group by dimension key and return minimum recorddate). INSERT INTO @Current SELECT [ProductID] , min([SYS_Start]) AS [SYS_Start] FROM #Source GROUP BY [ProductID] --Load dimension if there are records to process IF @@ROWCOUNT > 0 BEGIN --Insert and SCD1 (Upsert) MERGE [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] AS tgt USING (Select s.[SYS_Start], s.[ModifiedDate], s.[ProductID], s.[Name], s.[ProductNumber], s.[MakeFlag], s.[FinishedGoodsFlag], s.[Color], s.[SafetyStockLevel], s.[ReorderPoint], s.[StandardCost], s.[ListPrice], s.[Size], s.[SizeUnitMeasureCode], s.[WeightUnitMeasureCode], s.[Weight], s.[DaysToManufacture], s.[ProductLine], s.[Class], s.[Style], s.[ProductSubcategoryID], s.[ProductModelID], s.[SellStartDate], s.[SellEndDate], s.[DiscontinuedDate], s.[rowguid] FROM #Source s Inner Join @Current c ON s.[ProductID] = c.[ProductID] and s.[SYS_Start] = c.[SYS_Start]) AS src ON (tgt.[ProductID] = src.[ProductID] or (tgt.[ProductID] is null and src.[ProductID] is null)) WHEN NOT MATCHED THEN /*Insert New Records*/ INSERT ([SYS_Start], [SYS_End], [SYS_Current], [ModifiedDate], [ProductID], [Name], [ProductNumber], [MakeFlag], [FinishedGoodsFlag], [Color], [SafetyStockLevel], [ReorderPoint], [StandardCost], [ListPrice], [Size], [SizeUnitMeasureCode], [WeightUnitMeasureCode], [Weight], [DaysToManufacture], [ProductLine], [Class], [Style], [ProductSubcategoryID], [ProductModelID], [SellStartDate], [SellEndDate], [DiscontinuedDate], [rowguid]) VALUES (src.[SYS_Start], '9999-12-31 23:59:59.9999999', 'TRUE', src.[ModifiedDate], src.[ProductID], src.[Name], src.[ProductNumber], src.[MakeFlag], src.[FinishedGoodsFlag], src.[Color], src.[SafetyStockLevel], src.[ReorderPoint], src.[StandardCost], src.[ListPrice], src.[Size], src.[SizeUnitMeasureCode], src.[WeightUnitMeasureCode], src.[Weight], src.[DaysToManufacture], src.[ProductLine], src.[Class], src.[Style], src.[ProductSubcategoryID], src.[ProductModelID], src.[SellStartDate], src.[SellEndDate], src.[DiscontinuedDate], src.[rowguid]) /*SCD1 Updates*/ WHEN MATCHED AND tgt.[ModifiedDate] <> src.[ModifiedDate] or (tgt.[ModifiedDate] is null and src.[ModifiedDate] is not null) or (tgt.[ModifiedDate] is not null and src.[ModifiedDate] is null) or tgt.[Name] <> src.[Name] or (tgt.[Name] is null and src.[Name] is not null) or (tgt.[Name] is not null and src.[Name] is null) or tgt.[ProductNumber] <> src.[ProductNumber] or (tgt.[ProductNumber] is null and src.[ProductNumber] is not null) or (tgt.[ProductNumber] is not null and src.[ProductNumber] is null) or tgt.[MakeFlag] <> src.[MakeFlag] or (tgt.[MakeFlag] is null and src.[MakeFlag] is not null) or (tgt.[MakeFlag] is not null and src.[MakeFlag] is null) or tgt.[FinishedGoodsFlag] <> src.[FinishedGoodsFlag] or (tgt.[FinishedGoodsFlag] is null and src.[FinishedGoodsFlag] is not null) or (tgt.[FinishedGoodsFlag] is not null and src.[FinishedGoodsFlag] is null) or tgt.[Color] <> src.[Color] or (tgt.[Color] is null and src.[Color] is not null) or (tgt.[Color] is not null and src.[Color] is null) or tgt.[SafetyStockLevel] <> src.[SafetyStockLevel] or (tgt.[SafetyStockLevel] is null and src.[SafetyStockLevel] is not null) or (tgt.[SafetyStockLevel] is not null and src.[SafetyStockLevel] is null) or tgt.[Style] <> src.[Style] or (tgt.[Style] is null and src.[Style] is not null) or (tgt.[Style] is not null and src.[Style] is null) or tgt.[ProductSubcategoryID] <> src.[ProductSubcategoryID] or (tgt.[ProductSubcategoryID] is null and src.[ProductSubcategoryID] is not null) or (tgt.[ProductSubcategoryID] is not null and src.[ProductSubcategoryID] is null) or tgt.[ProductModelID] <> src.[ProductModelID] or (tgt.[ProductModelID] is null and src.[ProductModelID] is not null) or (tgt.[ProductModelID] is not null and src.[ProductModelID] is null) or tgt.[SellStartDate] <> src.[SellStartDate] or (tgt.[SellStartDate] is null and src.[SellStartDate] is not null) or (tgt.[SellStartDate] is not null and src.[SellStartDate] is null) or tgt.[SellEndDate] <> src.[SellEndDate] or (tgt.[SellEndDate] is null and src.[SellEndDate] is not null) or (tgt.[SellEndDate] is not null and src.[SellEndDate] is null) or tgt.[DiscontinuedDate] <> src.[DiscontinuedDate] or (tgt.[DiscontinuedDate] is null and src.[DiscontinuedDate] is not null) or (tgt.[DiscontinuedDate] is not null and src.[DiscontinuedDate] is null) or tgt.[rowguid] <> src.[rowguid] or (tgt.[rowguid] is null and src.[rowguid] is not null) or (tgt.[rowguid] is not null and src.[rowguid] is null) THEN UPDATE SET [SYS_LoadTime] = sysdatetime(), [ModifiedDate] = src.[ModifiedDate], [Name] = src.[Name], [ProductNumber] = src.[ProductNumber], [MakeFlag] = src.[MakeFlag], [FinishedGoodsFlag] = src.[FinishedGoodsFlag], [Color] = src.[Color], [SafetyStockLevel] = src.[SafetyStockLevel], [Style] = src.[Style], [ProductSubcategoryID] = src.[ProductSubcategoryID], [ProductModelID] = src.[ProductModelID], [SellStartDate] = src.[SellStartDate], [SellEndDate] = src.[SellEndDate], [DiscontinuedDate] = src.[DiscontinuedDate], [rowguid] = src.[rowguid] ; SELECT @Upsert += @@ROWCOUNT -- /*SCD2 Updates*/ --Load into temp table to overcome nested merge with constraint sql limitation INSERT INTO #SCD2Insert ([SYS_Start], [SYS_End], [SYS_Current], [ModifiedDate], [ProductID], [Name], [ProductNumber], [MakeFlag], [FinishedGoodsFlag], [Color], [SafetyStockLevel], [ReorderPoint], [StandardCost], [ListPrice], [Size], [SizeUnitMeasureCode], [WeightUnitMeasureCode], [Weight], [DaysToManufacture], [ProductLine], [Class], [Style], [ProductSubcategoryID], [ProductModelID], [SellStartDate], [SellEndDate], [DiscontinuedDate], [rowguid]) SELECT [SYS_Start], [SYS_End], [SYS_Current], [ModifiedDate], [ProductID], [Name], [ProductNumber], [MakeFlag], [FinishedGoodsFlag], [Color], [SafetyStockLevel], [ReorderPoint], [StandardCost], [ListPrice], [Size], [SizeUnitMeasureCode], [WeightUnitMeasureCode], [Weight], [DaysToManufacture], [ProductLine], [Class], [Style], [ProductSubcategoryID], [ProductModelID], [SellStartDate], [SellEndDate], [DiscontinuedDate], [rowguid] FROM ( MERGE [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] AS tgt USING (Select s.[SYS_Start], s.[ModifiedDate], s.[ProductID], s.[Name], s.[ProductNumber], s.[MakeFlag], s.[FinishedGoodsFlag], s.[Color], s.[SafetyStockLevel], s.[ReorderPoint], s.[StandardCost], s.[ListPrice], s.[Size], s.[SizeUnitMeasureCode], s.[WeightUnitMeasureCode], s.[Weight], s.[DaysToManufacture], s.[ProductLine], s.[Class], s.[Style], s.[ProductSubcategoryID], s.[ProductModelID], s.[SellStartDate], s.[SellEndDate], s.[DiscontinuedDate], s.[rowguid] FROM #Source s Inner Join @Current c ON s.[ProductID] = c.[ProductID] and s.[SYS_Start] = c.[SYS_Start]) AS src ON (tgt.[ProductID] = src.[ProductID] or (tgt.[ProductID] is null and src.[ProductID] is null)) WHEN MATCHED AND tgt.[SYS_Current] = 'TRUE' and tgt.[SYS_Start] < src.[SYS_Start] and (tgt.[ReorderPoint] <> src.[ReorderPoint] or (tgt.[ReorderPoint] is null and src.[ReorderPoint] is not null) or (tgt.[ReorderPoint] is not null and src.[ReorderPoint] is null) or tgt.[StandardCost] <> src.[StandardCost] or (tgt.[StandardCost] is null and src.[StandardCost] is not null) or (tgt.[StandardCost] is not null and src.[StandardCost] is null) or tgt.[ListPrice] <> src.[ListPrice] or (tgt.[ListPrice] is null and src.[ListPrice] is not null) or (tgt.[ListPrice] is not null and src.[ListPrice] is null) or tgt.[Size] <> src.[Size] or (tgt.[Size] is null and src.[Size] is not null) or (tgt.[Size] is not null and src.[Size] is null) or tgt.[SizeUnitMeasureCode] <> src.[SizeUnitMeasureCode] or (tgt.[SizeUnitMeasureCode] is null and src.[SizeUnitMeasureCode] is not null) or (tgt.[SizeUnitMeasureCode] is not null and src.[SizeUnitMeasureCode] is null) or tgt.[WeightUnitMeasureCode] <> src.[WeightUnitMeasureCode] or (tgt.[WeightUnitMeasureCode] is null and src.[WeightUnitMeasureCode] is not null) or (tgt.[WeightUnitMeasureCode] is not null and src.[WeightUnitMeasureCode] is null) or tgt.[Weight] <> src.[Weight] or (tgt.[Weight] is null and src.[Weight] is not null) or (tgt.[Weight] is not null and src.[Weight] is null) or tgt.[DaysToManufacture] <> src.[DaysToManufacture] or (tgt.[DaysToManufacture] is null and src.[DaysToManufacture] is not null) or (tgt.[DaysToManufacture] is not null and src.[DaysToManufacture] is null) or tgt.[ProductLine] <> src.[ProductLine] or (tgt.[ProductLine] is null and src.[ProductLine] is not null) or (tgt.[ProductLine] is not null and src.[ProductLine] is null) or tgt.[Class] <> src.[Class] or (tgt.[Class] is null and src.[Class] is not null) or (tgt.[Class] is not null and src.[Class] is null)) THEN UPDATE SET [SYS_End] = Dateadd(nanosecond, -100, src.[SYS_Start]), [SYS_Current] = 'FALSE' OUTPUT $Action AS [MergeAction], src.[SYS_Start], '9999-12-31 23:59:59.9999999' as [SYS_End], 'TRUE' as [SYS_Current], src.[ModifiedDate], src.[ProductID], src.[Name], src.[ProductNumber], src.[MakeFlag], src.[FinishedGoodsFlag], src.[Color], src.[SafetyStockLevel], src.[ReorderPoint], src.[StandardCost], src.[ListPrice], src.[Size], src.[SizeUnitMeasureCode], src.[WeightUnitMeasureCode], src.[Weight], src.[DaysToManufacture], src.[ProductLine], src.[Class], src.[Style], src.[ProductSubcategoryID], src.[ProductModelID], src.[SellStartDate], src.[SellEndDate], src.[DiscontinuedDate], src.[rowguid] ) AS MergeOutput WHERE [MergeAction] = 'Update' ; --Load from temp table to target (overcome nested merge with constraint SQL limitation) INSERT INTO [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] ([SYS_Start], [SYS_End], [SYS_Current], [ModifiedDate], [ProductID], [Name], [ProductNumber], [MakeFlag], [FinishedGoodsFlag], [Color], [SafetyStockLevel], [ReorderPoint], [StandardCost], [ListPrice], [Size], [SizeUnitMeasureCode], [WeightUnitMeasureCode], [Weight], [DaysToManufacture], [ProductLine], [Class], [Style], [ProductSubcategoryID], [ProductModelID], [SellStartDate], [SellEndDate], [DiscontinuedDate], [rowguid]) SELECT [SYS_Start], [SYS_End], [SYS_Current], [ModifiedDate], [ProductID], [Name], [ProductNumber], [MakeFlag], [FinishedGoodsFlag], [Color], [SafetyStockLevel], [ReorderPoint], [StandardCost], [ListPrice], [Size], [SizeUnitMeasureCode], [WeightUnitMeasureCode], [Weight], [DaysToManufacture], [ProductLine], [Class], [Style], [ProductSubcategoryID], [ProductModelID], [SellStartDate], [SellEndDate], [DiscontinuedDate], [rowguid] FROM #SCD2Insert SELECT @SCD2 += @@ROWCOUNT --Clear temp table prior to next batch load TRUNCATE TABLE #SCD2Insert END ELSE BEGIN --All source records are processed BREAK END END -- SELECT @Insert AS [Insert], @Upsert AS [Upsert], @SCD2 AS [SCD2]
The good news for LeapFrogBI platform users is that you really don’t need to worry about this level of detail, but it is always good to know what is going on in case you ever need to troubleshoot issues. In summary, MERGE is a great tool, but it also warrants special attention when used to perform dimension loads.