Ravi: Data Warehouses and Multidimensional Modelling

1. Introduction – Reasons for Emergence of Data Warehouses

The Database and the Database Management System (DBMS) are an important part of a DSS.

In the 1950’s and 1960’s application programs developed their own application orientated file structures.

This approach created physical data dependence so that if there was a change in the physical storage of the data then the application program needed to be changed as well. There was also redundancy amongst the data as many applications might use the same data but in a different format. Even consolidated files did not overcome this problem as some applications had to take precedence over others when conflict arose. Where a new application needed a different logical view of the data from those currently in use, then changes might have to be made to several programs to accommodate this new view. Thus the data was neither physically nor logically independent of the application programs.

The development of Relational Database Management System (RDBMS) overcame these problems. It ensured data independence and allowed controlled redundancy. It meant that an organisation could more closely control data as an organisational resource and ensure its adequacy and integrity.

However, there are many reasons why transactional or operational RDBMS’s are not suitable for DSS. These are:

They Lack of Historical Data
They Lack Data Stability
Data from different Databases may be Inconsistent with Each Other
The different Database may be Operating on different Platforms and Written in different Languages
They are not Optimised for DSS but for Transaction Processing Efficiency and Throughput (thus, they are highly normalised)
Though RDBMS’s are very efficient in Querying a Single Record, they are not so Efficient when it comes to Accessing Several Million Records in Order to Answer a User Query

DSS’s are more concerned with answering managerial queries such as “What was the effect of our sales promotion on the sale of televisions in the North-East of England in the month of December?”

To partially overcome these problems, DSS’s have tended to have their own specific relational databases. Thus, special software would extract data from the transactional database at specific points of time, “scrub and clean” it if required and place it in the DSS database for further analysis. Although this procedure is preferable to using the transactional database itself, the data is still too sparse and too specific for a complete analysis or to service flexible ad hoc queries.

It was because of this inadequacy that data warehouses began to appear on the scene.

2. Data Cubes

Figure 43 illustrates an example Sales Data Cube:

Clearly the data model can only be visualised as a cube if there are only three dimensions. However, even when there are more than three dimensions it is often referred to as a “cube.” Another way of taking a multidimensional view of business data is in the form of a STAR and in this guise it is known as a Star Data Model. This is shown in Figure 44 below:

The advantage of a multidimensional model over a relational one can be shown in the example in Table 20 below.

Table 20 shows the sales volumes for cars according to model, colour, dealership and sales volume. It is presented as a relational table.

The same data is shown below in Figure 45 but in a Cross-tab view or Data matrix:

The matrix is an example of a two-dimensional array.

Figure 45 is clearly easier to comprehend than Table 20.

The user can see immediately that there are two dimensions with three positions for each. It is more difficult to see this from the relational table. In addition, Table 20 groups similar information into rows and columns. For example: all the information on sports cars is lined up and can easily be added up to get a total figure for sales. It is also easy to compare the sales of different colour of sports car. Thus the above cross-tab structure represents a higher level of “intelligence” in the way the data is structured so that it aids comprehension. More importantly the multidimensional view gives performance advantages over the relational view when it comes to answering the type of query that a DSS is likely to generate.

In the simple example shown above it can take a maximum search of nine records to find the information you want from the relational table. However, in the cross-tab table it would only take a maximum of six positions. The performance advantage becomes greater the larger the database.

A 1000-row relational table may mean 1000 records to search and certainly 500 on average. As a data cube with three dimensions with 10 positions along each dimension, the maximum search is 30 positions with an overall average of only 15.

3. Rotation

It is clear that the cross-tab view is similar to a spreadsheet and it has the additional advantage that the view may be “rotated” to give a different perspective should a particular user query require this. For example: in Figure 43 above the view is MARKET by PRODUCT by TIME. However, the data cube may be rotated through 90 degrees, twice, to give PRODUCT by TIME by MARKET. Other perspectives are also possible, for example: there are six different possible views:

MARKET by PRODUCT (with TIME in the background)
PRODUCT by MARKET (with TIME in the background)
MARKET by TIME (with PRODUCT in the background)
TIME by MARKET (with PRODUCT in the background)
PRODUCT by TIME (with MARKET in the background)
TIME by PRODUCT (with MARKET in the background)

Each rotation is also referred to as data slicing, as each rotation presents a different two dimensional “slice” of the data.

A four-dimensional array would have 24 different views. Multidimensional technology would make all views easily accessible which would be far easier than having to sort a large, relational table with all the attendant inefficiencies.

4. Ranging

A multidimensional database also allows the user to focus on, or range into the exact view of the data required.

For example, Figure 43 above, we can range in on the middle product being sold, in the second market up from the bottom and view the sales volume over the entire time dimension.

This sub-set of the data cube is sometimes known as dicing the data cube.

The performance of ranging in a multidimensional database to answer to a user query is much better than in a relational database. Thus, the expression slicing and dicing is often heard in relation to multidimensional databases.

5. Implementation

Multidimensional data models can be implemented by conventional RDBMS or by multidimensional database technology. However, RDBMS’s are not so efficient as the newer technology for summarising a large number of records. For example: a typical DSS query may require summarising 30,000 records on the fly.

The typical retrieval rates for commercial SQL databases are 200-300 records per second. This query would take about 2 minutes to process on an IBM mainframe using DB2. To summarise over 200,000 records would take over 10 minutes. This response time is usually unacceptable to senior management. Very often these problems are met by running batch jobs overnight but there is a limit to the number of hours available for such activity. In addition these predetermined batch runs are anathema to a DSS which is meant to be flexible enough to deal with such ad hoc queries.

Fortunately, specially designed Multidimensional Database Management Systems (MDMS) are now available. Some of the companies in this field are Pilot Software, Comshare and Arbor Software. These database systems can summarise data much faster than RDBMS’s. In a recent test by David Friend of Pilot Software, a Pilot Lightship Server (MDDS) ran a large summarisation batch job in 3 minutes. The same batch job run using Sybase (RDBMS) took 13.4 hours. A 50Mhz, 486 machine was used in the test.

However, the RDBMS is still more efficient for transaction processing.

Databases which are essentially lists and where queries involve finding a single record are still best organised under a RDBMS.

Reference(s)
	Book	Campbell, D. J. & Craig, T. (2005) Organisations and the Business Environment. 2nd Edition. Elsevier: Netherlands, North Holland, Amsterdam. [ISBN: 9780750658294]. [Available on: Amazon: https://amzn.to/3VHJupz].
	Journal	Geoffrion, A. M. (1987) An Introduction to Structured Modeling. Management Science, Volume: 33, Issue: 5, Page(s): 547-588. [doi: 10.1287/mnsc.33.5.547]. [Available on: INFORMS: http://pubsonline.informs.org/doi/abs/10.1287/mnsc.33.5.547].
	Conference	Jelassi, T., Klein, M. R. & Mayon-White, W. M. (1992) Proceedings of the IFIP TC8/WG 8.3 Working Conference on Decision Support Systems: Experiences and Expectations. In: Jelassi, T., Klein, M. R. and Mayon-White, W. M., of editor(s) of the: IFIP TC8/WG 8.3 Working Conference on Decision Support Systems: Experiences and Expectations, 30th June - 3rd July 1992. Elsevier: France, Île-de-France, Seine-et-Marne, Fontainebleau, Volume, Page(s): 333. [Available on: ScienceDirect: https://www.elsevier.com/books/decision-support-systems-experiences-and-expectations/jelassi/978-0-444-89673-5].
	Book	Kimball, R. (1996) Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons: United States of America (USA), New York (NY). [ISBN: 9780471153375]. [Available on: Amazon: https://amzn.to/3TUghpw].
	Book	Pfaffenberger, B. (2002) Computers in Your Future 2003. 5th Edition. Prentice Hall: United States of America (USA), New Jersey (NJ), Bergen, Upper Saddle River. [ISBN: 9780139227820]. [Available on: Amazon: https://amzn.to/3gv8n7D].
	Book	Sauter, V. L. (1997) Decision Support Systems: An Applied Managerial Approach. John Wiley & Sons: United States of America (USA), New York (NY). [ISBN: 9780471173359]. [Available on: Amazon: https://amzn.to/3gDvD3h].
	Book	Turban, E. & Aronson, J. E. (2000) Decision support systems and intelligent systems. 6th Edition. Prentice Hall: United States of America (USA), New Jersey (NJ), Bergen, Upper Saddle River. [ISBN: 9780130327239]. [Available on: Amazon: https://amzn.to/3SALNI5].

Reference (or cite) Article
	Kahlon, R. S. (2013) Data Warehouses & Multidimensional Modelling [Online]. dkode: United Kingdom, England, London. [Published on: 2013-02-12]. [Article ID: RSK666-0000107]. [Available on: dkode \| Ravi - https://ravi.dkode.co/2013/02/data-warehouses-multidimensional.html].

Ravi

Pages

12 February 2013

Data Warehouses and Multidimensional Modelling

No comments:

Post a Comment