What is dimensional modeling in a data warehouse?
Full transcript:
Welcome to the LeapFrogBI video training series on dimensional modeling. In this video we’re going to simply introduce the concept of dimensional modeling. Throughout this video series we’ll be touching on many topics, all related to dimensional modeling, with the goal being at the end of the video series you should be able to go out and create your own dimensional model.
So to get started we should probably define what a dimensional model is. But, because dimensional modeling really has everything to do with data warehousing, we really need to first define what a data warehouse is. And a data warehouse is simply a database. It’s an area where data is stored. That information has been collected from other systems where the data actually originated, such as your CRM, your ERP, your marketing repository or even external data like weather information or financial market information. And lastly, a data warehouse is designed to support reporting and data analytics. So if this is the case, we’re copying data from our source systems and placing it into our data warehouse, then why are we going through all of this trouble? Why not just use the source systems?
To answer that question let’s take a little closer look at the data warehouse topology. Here we have six different boxes, each representing a different business function. Each of those functions more than likely has multiple source systems. This happens through natural growth of an organization or through mergers and acquisitions. But even if there was only one two or three source systems, there’s still a very good case for creating a data warehouse. Like we already mentioned, this information is actually collected and stored in a database which forms the data warehouse, so why are we doing this?
- Well, number one, to persist data. We need to create reports that span five years of time and that report is going to include data, let’s say, from our ERP system and from our operations system. Well, maybe our operation system thatwe’re concerned with only stores data for six months. That report would not be possible unless we actually extract the data from the operation systems and persist it for the period that we need it to be persisted.
- It’s going to be a central view. We’re not going to be trying to cobble together reports from a bunch of different source systems to figure out what’s going on in our organization.
- Data quality issues aregoing to be resolved. Through the process of extracting the data and integrating it into the data warehouse we’re also going to be taking care of a lot of data quality issues.
- We’re creating a single version of the truth, meaning that our marketing system may have a list of products, our CRM system has a different list of products and our ERP system has yet another list of products. That creates a real problem when you’re trying to create a report. In our data warehouse, when we try to look at sales by product we’re going to have a single product list providing us a single version of the truth.
- The data warehouse is a common model. It’s going to be extremely difficult to go out to all of our source systems try to understand each of those models, especially for business users, where our data warehouse will provide an easy to navigate model with a business focus.
- Our data warehouse is also designed to provide fast query response. If we try to query all of our source systems – integrate that data with all of our business rules all at the same time – we would likely have some performance The data warehouse gives us a way of overcoming that problem.
- Augmenting the source systems is another role for the data warehouse. Our CRM system for example may need data that’s actually derived in the operational systems. We can easily accomplish this by going through the data warehouse to move that data to the CRM system.
Okay, so that’s what a data warehouse is and why we’re creating it. So back to the focus of this video series, and that is, what is a dimensional model? Well a dimensional model is a data warehouse design technique. It was made popular by Ralph Kimball and it’s known as the bottom-up approach. Ralph Kimball created many of the concepts that we’re going to be discussing, such as the idea of using a collection of data marts joined together with a bus architecture. And by using conformed dimensions and facts, they all will create collectively the actual data warehouse. We’re going to focus on two roles of the data warehouse and that is easy navigation and fast query performance. So let’s take a look at that same information here on our data warehouse topology. When we talk about dimensional modeling we can think of our data warehouse again as being a collection of data marts, each data mart serving a particular function. It doesn’t have to be a single source system, or even two or three source systems. It could be an unlimited number of source systems. But typically we’re looking to create a data mart that solves a very focused data need. And again, we’re going to create each of these data marts in a conformed way so we can plug them all into the bus architecture and collectively form the data warehouse.
Now, in dimensional modeling we’re going to be providing all of these benefits of a data warehouse, but we’re going to focus on easy navigation as well as fast query performance. Okay, so what are we going to be talking about in this video series? Well you probably guessed, we’ll be talking about all things dimensional modeling. Here we have a little nomenclature of a lot of the parts that are included in a dimensional model. We’re going to be covering each one of them. We’ll talk about mini dimensions, we’ll talk about junk dimensions, factless facts, inferred members, slowly changing dimensions and so on. Each of these concepts we’re going to cover in bite-sized chunks so that at the end of this video series you’ll have all of the information you need to go out, look at your source systems, and create your own target data model. So thank you for listening, we’ll see you in the next video.