Why all the Fuss about Metadata Driven ETL?

Building anything, from houses to software applications, requires that we make decisions about how to efficiently convert raw materials into the desired product. Building a home in 1492 started with cutting down trees. Today builders use dimensional lumber or prebuilt partitions. Efficiency is gained by streamlining many of the tasks that come with building from raw trees. Many other advancements contribute to today’s highly efficient home building process, which delivers faster construction, higher quality, and lower costs.

Software development is no different from home building in this way. We are looking for the most efficient way to create our desired application. A developer could choose to create an entire ETL process using machine language. A better decision would be to use a programming language that provides pre-built functions and methods to carry out common tasks. This can be improved further by not writing code at all. Instead, an ETL tool with a graphical interface can be used to quickly build code based on GUI driven selections.

In these two examples, the building or development process was brought to a higher level by using tools and materials that were built specifically for the task at hand. Innovation often adds value in exactly this way. There is a point, however, at which taking a building process to a higher level starts to have undesired consequences. Compare a general store with a specialty store: the general store may carry all of the same items, but its service often does not match that of the specialty store.

Metadata driven processes take a slightly different approach. Instead of building the desired application directly, we define the application’s specifications. These specifications are fed to an engine, which builds the application for us by applying predefined rules. Metadata driven ETL development works the same way. Instead of building a package to create a dimension, for example, we provide the dimension description (metadata) to a package generating engine. This engine is then responsible for creating the defined package. Once the package is executed, it will create and maintain the prescribed dimension.
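To make the idea concrete, here is a minimal sketch in Python. It is not how LeapFrogBI or any particular ETL tool works; the names `DimensionSpec` and `build_package`, the `dim_` table prefix, and the SQL rules are all assumptions made for illustration. The point is only that the developer writes a short description of the dimension, and a small rules engine turns that description into the statements that create and maintain it.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionSpec:
    """Hypothetical metadata describing one dimension."""
    name: str                      # e.g. "customer" becomes table dim_customer
    source_table: str              # staging table that feeds the dimension
    business_key: str              # column that identifies a member
    attributes: List[str] = field(default_factory=list)


def build_package(spec: DimensionSpec) -> List[str]:
    """Toy engine: apply fixed rules to a spec and return SQL statements."""
    table = f"dim_{spec.name}"
    columns = [spec.business_key, *spec.attributes]
    column_defs = ",\n  ".join(f"{col} TEXT" for col in columns)
    col_list = ", ".join(columns)

    # Rule 1: every dimension gets a surrogate key plus its described columns.
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"  {spec.name}_key INTEGER PRIMARY KEY,\n"
        f"  {column_defs}\n)"
    )
    # Rule 2: load only members whose business key is not already present.
    load = (
        f"INSERT INTO {table} ({col_list})\n"
        f"SELECT DISTINCT {col_list} FROM {spec.source_table} AS s\n"
        f"WHERE NOT EXISTS (\n"
        f"  SELECT 1 FROM {table} AS d\n"
        f"  WHERE d.{spec.business_key} = s.{spec.business_key}\n)"
    )
    return [create, load]


def run_package(conn: sqlite3.Connection, statements: List[str]) -> None:
    """Execute the generated package against a database connection."""
    for statement in statements:
        conn.execute(statement)


# The "development" step is just this definition.
spec = DimensionSpec(
    name="customer",
    source_table="stg_customer",
    business_key="customer_id",
    attributes=["name", "city"],
)
```

Adding a column to the dimension in this sketch means adding one entry to `attributes` and asking the engine for a new build, rather than editing the load logic by hand.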

Why is metadata driven so much more efficient than traditional methods?

Creating a definition of a process is much faster than creating the process itself. A metadata driven approach can build the same asset in 10% or less of the time required by traditional methods.
Quality standards are enforced. The rules engine becomes the gatekeeper by enforcing best practices.
The rules engine becomes a growing knowledge base from which all processes benefit.
It adapts easily to change and extension. Simply edit the definition and submit it to the engine for a build. Need to inject a custom process? No problem, create a package the old fashioned way.
It enables agile data warehousing. Agile becomes possible because development is much faster and change requires far less rework.

What about high level development & metadata driven ETL?

Exactly! High level development here simply means providing a GUI that collects and validates all ETL definitions (metadata), making the development process even more efficient. This is what the LeapFrogBI SaaS platform does, while also eliminating the traditional thick client hassles.

Welcome to Agile Data Warehousing!

Paul B. Felix

Managing Partner - Paul has spent his career helping organizations convert raw data into valuable decision support systems. He built a deep set of analytical and technical skills while working as a business intelligence and data warehousing consultant before founding LeapFrogBI.