SQL Server Integration Services offers developers a powerful toolset capable of performing a wide range of integration tasks. Data mart developers will typically design numerous SSIS packages which will create and maintain the target data mart. There is an array of options available when determining how to control the execution order of these packages. This post describes a precedence control option which provides the following benefits.
- Centrally define execution precedence
- Limit execution to specified day and time window
- Retry failed executions specified number of times after specified interval has lapsed
- Flag package as allowed to fail enabling execution process to continue
- Control maximum number of packages that can execute in parallel
- Monitor load process & job status centrally
- Quickly adapt to package additions, deletions, and modifications
Most of these goals can be achieved using a master package which uses SSIS native control flow features to define execution order precedence. This works well for a relatively small number of packages, but quickly becomes difficult to maintain and modify when a large number of packages needs to be managed.
As an alternative to the master package approach, this post describes a method for controlling package execution using a [Precedence] table to define each packages precedence definition. Generic SQL Agent jobs are used to initiate the execution of packages based on the specified precedence definition. Let’s start by taking a closer look at the [Precedence] table.
Step 1 – Create & Load [Precedence] Table
Use the below script to create the [Precedence] table in a central location. Each field is defined below. Populate the [Precedence] table with one record per package to be executed.
Create Precedence Table Script
CREATE TABLE [dbo].[Precedence]( [TaskID] [int] IDENTITY(1,1) NOT NULL, [DataMart] [nvarchar](150) NULL, [ComponentName] [nvarchar](100) NULL, [ComponentDirectory] [nvarchar](250) NULL, [SourceSystem] [nvarchar](max) NULL, [RunDays] [nvarchar](10) NULL, [Disable] [int] NULL, [MinStartTime] [time](7) NULL, [MaxStartTime] [time](7) NULL, [PhaseOrder] [int] NULL, [GroupOrder] [int] NULL, [SeriesOrder] [int] NULL, [SequenceOrder] [int] NULL, [MaxRetry] [int] NULL, [RetryInterval] [int] NULL, [AllowFailure] [int] NULL, [JobAssignedTo] [nvarchar](150) NULL, [JobStatus] [int] NULL, [JobTryCount] [int] NULL, [JobStatusModified] [datetime2](7) NULL ) ON [PRIMARY]
TaskID
Unique identity field. No input required.
DataMart
Target data mart name. Any text value. Not required.
ComponentName
Name of the SSIS package being executed (exclude extension). Required.
Component Directory
Local or UNC directory where the package is located (no trailing slash). Required.
Example; \ServerNamemyPackages or C:myPackages
SourceSystem
Source system name. Any text value. Not required.
RunDays
Numeric representation of weekdays which the package should execute.
Example; 1234567 will execute on all days of the week
Disable
Package will not be executed & will not affect other package execution.
1 = Disabled and 0 = Enabled
MinStartTime
Package will be executed only when the time of day is at or beyond this time.
Example; 01:00:00.0000000
MaxStartTime
Package will be executed at or prior to this time.
Example; 05:00:00.0000000
PhaseOrder
Highest level of precedence. Functions in a serial manner with lower values being executed first. All packages with a phase value of 4 must be executed prior to any package with a phase value greater than 4 being executed. Required.
Example; 0, 1, 2, 3, …
GroupOrder
Second level of precedence after phase. Functions in a serial manner with lower values being executed first. All packages with a group value of 2 must be executed prior to any package with a group value greater than 2 being executed. Group precedence is evaluated within the current phase. Required.
Example; 0, 1, 2, 3, …
SeriesOrder
Third level of precedence after Group. SeriesOrder is evaluated within the current group. Series can be executed in parallel. A value of 0 or greater can be assigned. When the series value is 0 sequence order is ignored. When series is greater than 0 sequence is evaluated. Note that while series will be assigned in ascending order they are executed in parallel when worker jobs (see below) are available. Required.
SequenceOrder
Fourth level of precedence after series. SequenceOrder is evaluated within the current series. A value of 0 or greater can be assigned. Packages will be executed in series based on the assigned sequence order value. Required when SeriesOrder is not equal to 0.
MaxRetry
The number of times a package will be retried in the event of a failure. Required
Example; 5
RetryInterval
The amount of time in minutes to wait between retries. Required.
Example; 1
AllowFailure
Designate whether packages which fail for the specified number of reties will halt subsequent dependent executions. When set to 0 failure is not allowed. When set to 1 failure is allowed. Required.
JobAssignedTo
Name of the worker job that has been assigned the task of executing the package. System Field.
Example; BIWorker_01
JobStatus
The status of the package execution. System Field.
0 = pending execution
1 = assigned to a worker job
2 = package is currently executing
3 = execution completed successfully
-1 = execution failed
JobTryCount
Number of times the current package has been executed. Used to track retry counts. System Field
Example; 1, 2, 3, etc…
JobStatusModified
Datetime of the most recent JobStatus value change. Used to track lapsed time between retries. System Field
Example; 04:23:21.1235322
Step 2 – Create [AssignTasks] Stored Procedure
Use the below script to create the [AssignTasks] stored procedure. This SP will be executed each time the poll job (see step 4) is run and packages will be assigned to a worker when the precedence criteria is met.
Create AssignTasks SP Script
CREATE PROCEDURE [dbo].[AssignTasks] -- Add the parameters for the stored procedure here @WorkerID as nvarchar(150) --Tasks will be assigned AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; /* JobStatus Codes Error = -1 Pending = 0 Assigned = 1 Running = 2 Complete = 3 */ --Table variable will hold tasks & be updated as rules are applied. Declare @tvPrecedence Table ( [TaskID] [int] NOT NULL, [ComponentName] [nvarchar](100) NULL, [ComponentDirectory] [nvarchar](250) NULL, [SourceSystem] [nvarchar](max) NULL, [RunDays] [nvarchar](10) NULL, [Disable] [int] NULL, [MinStartTime] [time](7) NULL, [MaxStartTime] [time](7) NULL, [PhaseOrder] [int] NULL, [GroupOrder] [int] NULL, [SeriesOrder] [int] NULL, [SequenceOrder] [int] NULL, [MaxRetry] [int] NULL, [RetryInterval] [int] NULL, [AllowFailure] [int] NULL, [JobAssignedTo] [nvarchar](150) NULL, [JobStatus] [int] NULL, [JobTryCount] [int] NULL, [JobStatusModified] [datetime2](7) NULL ) /* Do not consider any tasks that are not set to run today (based on system time) Sunday is day 1. Saturday is day 7 */ Insert Into @tvPrecedence SELECT [TaskID] ,[ComponentName] ,[ComponentDirectory] ,[SourceSystem] ,[RunDays] ,[Disable] ,[MinStartTime] ,[MaxStartTime] ,[PhaseOrder] ,[GroupOrder] ,[SeriesOrder] ,[SequenceOrder] ,[MaxRetry] ,[RetryInterval] ,[AllowFailure] ,[JobAssignedTo] ,[JobStatus] ,[JobTryCount] ,[JobStatusModified] FROM [Precedence] Where charindex(cast(datepart(dw, SYSDATETIME()) as nvarchar(5)), [RunDays]) <> 0 /* Do not consider tasks that have met the maxRetry value and have AllowFailure set to True (1) except those that are currently running */ and ( AllowFailure = 0 or --Inluclude all where failure is not allowed (AllowFailure = 1 and JobTryCount < MaxRetry) or --Include if failure is allowed & max retry has not been met (AllowFailure = 1 and JobTryCount = MaxRetry and JobStatus in(1, 2))--Include if failure is allowed, maxRetry is met, but status is assigned or running ) /* Do not consider tasks that are set to disabled */ and [Disable] = 0 /* Order tasks by Phase, Group, Series, Sequence */ Order by [PhaseOrder], [GroupOrder], [SeriesOrder], [SequenceOrder] /* Idendify the current phase by collecting the minimum phase value that has 1 or more tasks that are not complete Only tasks in this phase can be started */ Declare @CurrentPhase as int Select @CurrentPhase = isnull(min([PhaseOrder]), -1) from @tvPrecedence Where [JobStatus] <> 3 --Select @CurrentPhase as CurrentPhase /* Idendify the currnt group by collecting the minimum group value that has 1 or more tasks that are not complete within the CurrentPhase' Only tasks in this group can be started */ Declare @CurrentGroup as int Select @CurrentGroup = isnull(min([GroupOrder]), -1) from @tvPrecedence Where [JobStatus] <> 3 and [PhaseOrder] = @CurrentPhase --Select @CurrentGroup as CurrentGroup /* Restict all future steps to only evaluate within the current phase and current group */ Delete from @tvPrecedence where [PhaseOrder] <> @CurrentPhase or [GroupOrder] <> @CurrentGroup --Select * from @tvPrecedence /* Idendify the current series. Must be in the set time window and have tasks that are pending or need to be retried (error). */ Declare @CurrentSeries as int Select @CurrentSeries = isnull(MIN([SeriesOrder]), -1) --When no series are available return -1 from @tvPrecedence where MinStartTime <= cast(SYSDATETIME() as time) and MaxStartTime >= CAST(SYSDATETIME() as time) --Within set time window and JobStatus in(0, -1) --Pending or has errored (not assigned, running, or complete) and(JobTryCount = 0 or(JobTryCount < MaxRetry and SYSDATETIME() >= dateadd(mi,[RetryInterval],JobStatusModified))) --First try or Retry interval has been met --Select @CurrentSeries as CurrentSeries /* Restrict assigned tasks to the current series */ Delete from @tvPrecedence where [SeriesOrder] <> @CurrentSeries /* If the current Series is 0, then assign one task to current worker. These will run in parallel (one per worker). */ If @CurrentSeries = 0 Begin Update [Precedence] Set JobAssignedTo = @WorkerID, JobStatus = 1, JobStatusModified = SYSDATETIME() Where [TaskID] = ( Select top 1 [TaskID] from @tvPrecedence where MinStartTime <= cast(SYSDATETIME() as time) and MaxStartTime >= CAST(SYSDATETIME() as time) --Within set time window and JobStatus in(0, -1) --Pending or has errored (not assigned, running, or complete) and(JobTryCount = 0 or(JobTryCount < MaxRetry and SYSDATETIME() >= dateadd(mi,[RetryInterval],JobStatusModified))) --First try or Retry interval has been met ) End /* If the current Series is <> 0, then assign all tasks in current series to current worker. These will run serially. Include tasks that are in the assigned state. These need to be reassigned to the new worker. This happens when a series does not complete. */ else Begin Update [Precedence] Set JobAssignedTo = @WorkerID, JobStatus = 1, JobStatusModified = SYSDATETIME() Where [TaskID] in ( Select [TaskID] from @tvPrecedence where MinStartTime <= cast(SYSDATETIME() as time) and MaxStartTime >= CAST(SYSDATETIME() as time) --Within set time window and JobStatus in(0, -1, 1) --Pending, assigned, or has errored (not assigned, running, or complete) and(JobTryCount = 0 or(JobTryCount < MaxRetry and SYSDATETIME() >= dateadd(mi,[RetryInterval],JobStatusModified))) --First try or Retry interval has been met ) End /* Return 4 if the worker job should be executed & -4 if it should not. Execute if a series was available to assign. msdb sp_help_job returns 4 when job is idle. Using inverse to designate that job is available, but no task is assigned. When 4 is returned the job will be started. When -4 is returned the job will not be started. */ Select case @CurrentSeries when -1 then -4 else 4 end as WorkerStatus END GO
Step 3 – Create Poll Package & SQL Agent Job
Download the attached BI_Poll.dtsx (119.70 kb) package. Save to a location accessible by SQL Agent. Edit the ”Console” connection to point to the database where the [Precedence] table is located.
The SQL agent poll job will run on a short interval (1 minute). On each job run the Poll.dtsx package will be executed. Replace ”DomainUser” with the a valid owner login name & replace “C:ETL_PackagesBI_Poll.dtsx” with the location of the BI_Poll package.
BI Poll SQL Agent Job Script
USE [msdb] GO /****** Object: Job [BI_Poll] Script Date: 01/30/2012 22:04:23 ******/ BEGIN TRANSACTION DECLARE @ReturnCode INT SELECT @ReturnCode = 0 /****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 01/30/2012 22:04:23 ******/ IF NOT EXISTS(SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1) BEGIN EXEC @ReturnCode = msdb.dbo.sp_add_category@class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback END DECLARE @jobId BINARY(16) EXEC @ReturnCode =msdb.dbo.sp_add_job@job_name=N'BI_Poll', @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'No description available.', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'DomainUser', @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback /****** Object: Step [BI_Poll] Script Date: 01/30/2012 22:04:23 ******/ EXEC @ReturnCode = msdb.dbo.sp_add_jobstep@job_id=@jobId, @step_name=N'BI_Poll', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'SSIS', @command=N'/FILE "C:ETL_PackagesBI_Poll.dtsx" /CHECKPOINTING OFF /REPORTING E', @database_name=N'master', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_update_job@job_id = @jobId, @start_step_id = 1 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule@job_id=@jobId, @name=N'BI_Poll_Interval', @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=2, @freq_subday_interval=30, @freq_relative_interval=0, @freq_recurrence_factor=0, @active_start_date=20120601, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959, @schedule_uid=N'22d76b4a-6e07-483d-a36b-7e94fceb254a' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobserver@job_id = @jobId, @server_name = N'(local)' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback COMMIT TRANSACTION GOTO EndSave QuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION EndSave: GO
Step 4 – Create Worker Package & SQL Agent Jobs
Download the attached BI_Worker.dtsx (65.31 kb) package. Save to a location accessible by SQL Agent. Edit the ”Console” connection to point to the database where the [Precedence] table is located.
The SQL agent worker job will run when the BI_Poll package executes them. Replace ”DomainUser” with the a valid owner login name & replace “D:ETL_PackagesBI_Worker.dtsx” with the location of the BI_Worker package. Multiple worker jobs can be created by changing each instance of ”BI_Worker_01” (job name & BI_Worker_ID variable) to a new value such as BI_Worker_02, BI_Worker_03, etc… The number of worker jobs created controls the max number of packages that will be executed in parallel.
BI Worker SQL Agent Job Script
USE [msdb] GO /****** Object: Job [BI_Worker_01] Script Date: 01/30/2012 22:13:20 ******/ BEGIN TRANSACTION DECLARE @ReturnCode INT SELECT @ReturnCode = 0 /****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 01/30/2012 22:13:20 ******/ IF NOT EXISTS(SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1) BEGIN EXEC @ReturnCode = msdb.dbo.sp_add_category@class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback END DECLARE @jobId BINARY(16) EXEC @ReturnCode =msdb.dbo.sp_add_job@job_name=N'BI_Worker_01', @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'BI_Worker_Job ************************************************************* First line of desription is used in search. Do not change.', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'DomainUser', @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback /****** Object: Step [Run Worker] Script Date: 01/30/2012 22:13:20 ******/ EXEC @ReturnCode = msdb.dbo.sp_add_jobstep@job_id=@jobId, @step_name=N'Run Worker', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'SSIS', @command=N'/FILE "D:ETL_PackagesBI_Worker.dtsx" /X86 /CHECKPOINTING OFF /SET "Package.Variables[User::BI_Worker_ID].Properties[Value]";"BI_Worker_01" /REPORTING E', @database_name=N'master', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_update_job@job_id = @jobId, @start_step_id = 1 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobserver@job_id = @jobId, @server_name = N'(local)' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback COMMIT TRANSACTION GOTO EndSave QuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION EndSave: GO
Conclusion
With the [Precedence] table, [AssignTasks] stored procedure, BI_Poll package, BI_Poll job, BI_Worker package, and BI_Worker job in place you are ready to start the execution process. To start the process simply enable the BI_Poll job and adjust the schedule to the desired execution window and interval.
After all packages have been executed, the precedence table must be reset when the entire process should be restarted. To do so simply update the JobStatus & JobTryCount columns setting their value to 0 for all records. This can be automated by creating a SQL Agent job to execute a sql script. In some cases it is important to not restart the execution process in certain circumstances. If this is the case, then logic must be added to only reset the [Precedence] table when appropriate.
To monitor the load process the [Precedence] table can be queried directly. It is often useful to select all records that are not disabled and have a JobStatus that is not equal to 3 (execution complete). Ordering this result set by PhaseOrder, GroupOrder, SeriesOrder, SequenceOrder will provide an ordered view of what is currently executing as well as the order up upcoming executions that are pending.
To make changes to the execution precedence, simply update the [Precedence] table as appropriate. For LeapFrogBI users the [Precedence] table can be seeded with all component packages automatically by executing the below script. This script will add default values for any component package that is not already loaded into the [Precedence] table. All other records will not be modified. Update the value of @MartID to reflect the name mart being loaded (10 character or less text value).
Seed [Precedence] LFBI Console
Declare @MartID varchar(10) = 'myMart'; --Add all components to [Precedence] that do not already exists INSERT INTO [LFBIConsole].[dbo].[Precedence] ([DataMart] ,[ComponentName] ,[ComponentDirectory] ,[SourceSystem] ,[RunDays] ,[Disable] ,[MinStartTime] ,[MaxStartTime] ,[PhaseOrder] ,[GroupOrder] ,[SeriesOrder] ,[SequenceOrder] ,[MaxRetry] ,[RetryInterval] ,[AllowFailure] ,[JobAssignedTo] ,[JobStatus] ,[JobTryCount] ,[JobStatusModified]) SELECT @MartID as DataMart, SUBSTRING(config.[configurationfilter], 1, PATINDEX('%-Precedence%', config.[ConfigurationFilter])-1) as PackageName, 'C:myPath' as ComponentDirectory, 'mySource' as SourceSystem, '1234567' as RunDays, 0 as [Disable], '00:00' as MinStartTime, '12:59' as MaxStartTime, NULL as PhaseOrder, NULL as GroupOrder, NULL as SeriesOrder, NULL as SequenceOrder, 4 as MaxRetry, 5 as RetryInterval, 0 as AllowFailure, 0 as JobAssignedTo, 0 as JobStatus, 0 as JobTryCount, SYSDATETIME() as JobStatusModified FROM [LFBIConsole].[dbo].[Configuration] as config left outer join [LFBIConsole].[dbo].[Precedence] prec on SUBSTRING(config.[configurationfilter], 1, PATINDEX('%-Precedence%', config.[ConfigurationFilter])-1) = prec.ComponentName and @MartID = prec.DataMart where RIGHT([configurationfilter], 10) = 'Precedence' and prec.ComponentName is null --Not yet in precedence table
After deploying this solution, package execution can be managed with ease. The developer focus can now be placed on tuning the execution process to take full advantage of available resources.