Organizations rarely start out with a complicated data infrastructure. Small organizations often start out with just a very small-scale system, perhaps only a few key applications.
These systems must exchange data, but since only a few systems are involved, the data exchange processes between these applications are easier to implement and manage.
Figure 1: Phase 1 of the Corporate Information Factory
However, as the organization grows and matures, new systems are introduced. For example, the corporate website grows from just a few pages to a true online presence with online ordering, account checking, and order checking. New data extract, transform and load (ETL) processes must be added to support this online presence, moving data between the accounting system and other systems. Perhaps next, a CRM system is implemented to better track customers. More ETL processes are needed to push data in and out of the other systems to support this. Beyond that, a more robust order fulfillment is implemented with even more ETL necessary to support it. And so on and so on.
What was once a small network with only a few applications exchanging information has grown into a complex “spaghetti” of ETL processes.
The “Spaghetti ETL” Stage
A spaghetti ETL solution occurs when applications must share data with several other applications. In this type of environment, these same applications must also receive information from several other applications as well. Spaghetti type ETL environments did not start out with this idea in mind; they evolved that way, often over the course of years.
Figure 2: Phase 2 of the Corporate Information Factory – Spaghetti ETL
There are many problems with a spaghetti ETL solution. The difficulty is primarily manifested in managing the overall system:
- Timing of the daily data movements and the daily synchronization process can become intricate and convoluted. Data movements between specific systems must be carefully planned and executed in the right order.
- It’s difficult to add new applications. To add a new application to the environment depicted in Figure 2, multiple ETL processes to different systems must be added.
- Errors have a cascading effect on other systems. Often, a data error must be fixed in multiple systems, not just one. Keeping all the systems in sync can be a nightmare.
- It’s difficult to understand “the truth,” because each system may have its own version of the truth. If there is an error in customer data, it is hard to track down where the data is wrong, and where the data is right.
Introducing the Operational Data Store (ODS)
In order to address the spaghetti ETL paradigm, many organizations create an operational data store (ODS). An ODS is operational in nature. An ODS is a central repository of organizational data that acts as a centralized data hub for other systems. The applications still act as the authoritative source for a specific set of data.
Figure 3: Stage 3 of the Corporate Information Factory – Implementing an ODS
With the introduction of a typical ODS, applications stop sharing information directly with each other, and instead only move data into and out of the ODS.
A further benefit of an ODS is that parts of the ODS ETL can be standardized and shared across systems. For example, the ODS can expose a standardized interface through which all systems may submit a “new customer” transaction. Likewise, all applications requiring new customer data can do so through a standardized interface to the ODS.
Below are some additional characteristics of an ODS:
- Because it is operational in nature, an ODS provides reports on Operations as of today.
- The data model is typically 3rd Normal Form.
- The source data sources still “own” the data.
- An ODS is easier to troubleshoot. Each system can focus on properly moving only its necessary data into and out of the ODS.
Adding the Data Warehouse
As discussed previously, an ODS can provide a significant benefit to operations. However, it is important to keep in mind that an ODS is operational in nature. Typically, an ODS does not keep an infinite amount of history. Additionally, an ODS often does not typically provide the ability to perform analytic/trend reporting.
A data warehouse, on the other hand, is managerial in nature. A data warehouse will hold more history than an ODS, and thus will provide the ability to measure operations over time.
A data warehouse often consists of smaller “data marts.” Under some methodologies, a data warehouse is a collection of smaller, subject-oriented data marts. For example, Marketing/Sales is often the first department to adopt a data mart for the purpose of analyzing sales data by a variety of different metrics such as sales by region, sales by customer type, customer analysis, sales trends, etc. Later, data marts for operations are added to analyze metrics such as product defects over time, product costs over time, etc. As more data marts are added, the more robust the overall data warehouse becomes.
In summary, a data warehouse has the following characteristics:
- Multidimensional data model
- Managerial in nature, not operational
- Oriented to statistical/trend analysis
- More reliant on historical data
- Most often sourced from the ODS
Figure 4: Stage 4 of the Corporate Information Factory
Does your organization need an operational data store? Remember, an ODS is operational in nature. The key benefit of an ODS will be more efficient and simplified ETL processes across the enterprise. Instead of having to share data with all of the various systems throughout the enterprise, each application need only concern itself with moving data into and out of the ODS.
Or, does your organization need a data warehouse? Where an ODS is operational in nature, a data warehouse will assist in analytical reporting. A typical evolution is that the ODS is created first, and then the data warehouse is created, sourcing its data from the ODS.
For more information on business intelligence and data solutions, visit our BI thought leadership page to browse RDA’s on-demand webcasts and complimentary white papers.