Thursday, August 1, 2013

Implementing an Aging Bucket Dimension in Your Schema

     For any organization that has any kind of service ticketing system, for either internal or external customers, similar questions are usually raised regardless of the service provided. A major concern for these organizations is issue aging. How many tickets are open now, how long have they been open, on average how long do certain issues stay open. If using a dimensional design pattern, modeling this scenario is easy in that we can create a measure in a fact that will show the amount of days between the open and close date of an issue. However; management likes to "bucket" aging in order to categorize and prioritize. In order to accommodate a business requirement such as this, and make it easy for report developers, we can create an aging bucket dimension that will categorize these issues for us. This bucket would contain the ranges specified by the business and may look something like this:


(1, '0 DAYS', -1, 0),
(2, '1-3 DAYS',1,3),
(3, '4-5 DAYS', 4,5),
(4, '6-15 DAYS', 6,15),
(5, '16-30 DAYS', 16,30),
(6, '31-60 DAYS', 31,60),
(7, '61-90 DAYS', 61,90),
(8, '91-120 DAYS', 91,120),
(9, '121-180 DAYS', 121,180),
(10,'181-360 DAYS', 181,360),
(11,'OVER 360 DAYS',361,50000)


     For this example were going to use data from an imaginary table that has staged issue ticket data. This data contains the issue identifier (a degenerate dimension), the natural key for the transaction type, the natural key for the employee assigned to the issue, the natural key of the customer associated to the issue, and the open and close dates of the issue. We can represent this with this sample data:


(1, 2, 5, 7, '1/1/2012','1/5/2012'),

(2, 3, 6, 9, '5/1/2012','6/20/2012'),

(3, 8, 22, 61, '12/22/2012','4/1/2013'),

(4, 7, 11, 13, '7/9/2013','7/22/2013'),

(5, 9, 22, 47, '7/23/2013',NULL)


We want to be able to load this data into a star schema that will utilize our bucket dimension. This schema may look something like this:

Figure 1. Star Schema with Bucket Dimension

     Now in order to populate the fact table we would need to join the staging table to the dimensions on the natural keys and retrieve the surrogate keys from these dimensions. The measures (DAYS_OPEN) would simply be a datediff between the open and close dates and (NUMBER_TRANSACTION) will always be a 1 so we can sum on this across the dimensions. This leaves us with one problem, how do we join to DIM_AGING_BUCKET if we have no natural key in either the dimension or the staging table? We in essence are forced to join on a range. This SQL statement will gather up our surrogate keys, calculate our measure and handle the "range join" we need for our bucket dimension:



     As long as your bucket ranges do not overlap, a join based on a range should be fine. Now you can group by the buckets in DIM_AGING_BUCKET while summing on the NUMBER_TRANSACTION field to display the number of issues in the bucket.