For this example, let's expand upon the company parts and imagine we're dealing with a piece of data that looks like the following and the snapshot schedule is daily:
Figure 1. Source System Inventory Schema |
This schema represents the parts inventory quantities, grouped by departments, for a company. Whenever the data warehouse snapshots the data from this schema a date and time of the snapshot is captured so we can deduce what the inventory state was at that time.
In order to fulfill the requirement to report on these snapshots on different time groups, we need a way to categorize this data as such. Now with every data warehouse there will be a date dimension at the level of granularity of the day, in which we could place min/max day/month/quarter/year flags that can help us with this problem. But as you may soon discover, these flags would only pertain to inventory exclusively and if you wanted to use this functionality with other snapshots you would have to add flags specific to that piece of data as well. In addition to this, we may not run this table ETL every day, or if we have an ETL job failure we may miss a day's snapshot. Thus a failure on Sept. 30 means that the Sept 29th snapshot is the maximum one captured for that month. With all of these scenarios, this solution would become untenable very fast. The best way to attack this problem, and provide a re-usable solution, would be to separate these flags out into their own junk dimension.
First we need to create this dimension, then determine how we're going to integrate it with the ETL of the snapshot fact table. Our goal is a table that looks like this:
Figure 2. Junk Dimension with Time Flags |
-->Create dimension to hold our date categories
CREATE
TABLE
DIM_TIME_FLAG(
FLAG_KEY int NOT NULL,
MIN_MONTH_FLG char(1) NULL,
MAX_MONTH_FLG char(1) NULL,
MIN_QUARTER_FLG
char(1) NULL,
MAX_QUARTER_FLG
char(1) NULL,
MIN_YEAR_FLG char(1) NULL,
MAX_YEAR_FLG char(1) NULL,
CONSTRAINT
PK_DIM_TIME_FLAG PRIMARY
KEY CLUSTERED (FLAG_KEY)
)
-->Hold our Y/N
enum
DECLARE @enum
TABLE
(
field char(1)
)
INSERT INTO
@enum(field)values('Y'),('N')
-->Temp table to contain cartesian of Y/N for all fields
DECLARE @final
TABLE
(
flag_key int identity(1,1),
field1 char(1),
field2 char(1),
field3 char(1),
field4 char(1),
field5 char(1),
field6 char(1)
)
-->Generate cartesian
INSERT INTO
@final
SELECT
a.field,
B.field,
c.field,
d.field,
e.field,
f.field
FROM
@enum a,
@enum b,
@enum c,
@enum d,
@enum e,
@enum f
-->Populate DIM_TIME_FLAG
INSERT INTO
DIM_TIME_FLAG
SELECT
0,
null,
null,
null,
null,
null,
null
UNION
SELECT
flag_key,
field1,
field2,
field3,
field4,
field5,
field6
FROM
@final
-->View Table Rows
SELECT
FLAG_KEY,
MIN_MONTH_FLG,
MAX_MONTH_FLG,
MIN_QUARTER_FLG,
MAX_QUARTER_FLG,
MIN_YEAR_FLG,
MAX_YEAR_FLG
FROM
DIM_TIME_FLAG
Figure 3. Parts Inventory Data Warehouse Schema |
Now, we need to figure out how to get DIM_TIME_FLAG integrated into this schema in our ETL plan. Key thing to keep in mind is that these flags will change as more and more snapshots are taken from the source system. Today's records that have MAX_MONTH_FLG=Y may need to be set to N the following day. After populating the other columns in the table you would have to execute this SQL statement to update the FLAG_KEY:
UPDATE A
SET
A.FLAG_KEY=Z.FLAG_KEY
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
LEFT JOIN
(
SELECT
MIN(B.FULL_DATE) AS MIN_MONTH
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
B.YEARMO
)MIN_MONTH
ON B.FULL_DATE=MIN_MONTH.MIN_MONTH
LEFT JOIN
(
SELECT
MAX(B.FULL_DATE) AS MAX_MONTH
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
B.YEARMO
)MAX_MONTH
ON B.FULL_DATE=MAX_MONTH.MAX_MONTH
LEFT JOIN
(
SELECT
MIN(B.FULL_DATE) AS MIN_QUARTER
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
CONVERT(VARCHAR,B.QUARTER)+CONVERT(VARCHAR,B.YEAR)
)MIN_QUARTER
ON B.FULL_DATE=MIN_QUARTER.MIN_QUARTER
LEFT JOIN
(
SELECT
MAX(B.FULL_DATE) AS MAX_QUARTER
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
CONVERT(VARCHAR,B.QUARTER)+CONVERT(VARCHAR,B.YEAR)
)MAX_QUARTER
ON B.FULL_DATE=MAX_QUARTER.MAX_QUARTER
LEFT JOIN
(
SELECT
MIN(B.FULL_DATE) AS MIN_YEAR
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
B.YEAR
)MIN_YEAR
ON B.FULL_DATE=MIN_YEAR.MIN_YEAR
LEFT JOIN
(
SELECT
MAX(B.FULL_DATE) AS MAX_YEAR
FROM
FCT_PART_INVENTORY
A
INNER JOIN DIM_DAY B ON A.DAY_KEY=B.DAY_KEY
GROUP BY
B.YEAR
)MAX_YEAR
ON B.FULL_DATE=MAX_YEAR.MAX_YEAR
LEFT JOIN DIM_TIME_FLAG
Z
ON CASE WHEN MIN_MONTH.MIN_MONTH IS NULL THEN 'N' ELSE 'Y' END =Z.MIN_MONTH_FLG
AND CASE WHEN MAX_MONTH.MAX_MONTH IS NULL THEN 'N' ELSE 'Y' END =Z.MAX_MONTH_FLG
AND CASE WHEN MIN_QUARTER.MIN_QUARTER IS NULL THEN 'N' ELSE 'Y' END =Z.MIN_QUARTER_FLG
AND CASE WHEN MAX_QUARTER.MAX_QUARTER IS NULL THEN 'N' ELSE 'Y' END =Z.MAX_QUARTER_FLG
AND CASE WHEN MIN_YEAR.MIN_YEAR IS NULL THEN 'N' ELSE 'Y' END =Z.MIN_YEAR_FLG
AND CASE WHEN MAX_YEAR.MAX_YEAR IS NULL THEN 'N' ELSE 'Y' END =Z.MAX_YEAR_FLG
With this table now integrated into this schema, we can not only use it at the level of day, but now month, quarter and year all using the same fact. For example, if we wanted to view data across days we can ignore the DIM_TIME_FLAG and simply filter our query across DIM_DAY and report on something that looks like this:
Figure 4. Daily Parts Inventory |
If we need to report on trending across months we could query the exact same snapshot fact, but this time join to DIM_TIME_FLAG where MIN_MONTH_FLG or MAX_MONTH_FLG=Y (depending of if you want to trend over months based on what the inventory looked like at the beginning of the month or end of the month). This would produce a report that would look something like this:
Figure 5. Monthly Parts Inventory |
The same could be done by quarter, as well, using the quarter flags:
Figure 6. Quarterly Parts Inventory |
And finally by year:
Figure 7. Yearly Parts Inventory |
This design can come in very handy, especially when building dashboards and executives want to be able to trend across the different time frames. It also negates the need to create separate snapshot tables that capture this data for each of these time frames(i.e. a snapshot for months, a snapshot for quarters, etc.).
No comments:
Post a Comment