Data integration can be accomplished at an enterprise level, departmental level, or lower level.
The most common method of “data integration” is the Data Warehouse (DW). A DW can exist at an Enterprise level (EDW) or at lower hierarchical levels. At these lower levels, DW is really a misnomer as these data stores provide data for specific purposes whereas a DW, especially at the enterprise level, should provide a non-application-specific “single source of truth
Data stores at less than enterprise level are more appropriately called Data Marts (DM). DM are designed to answer specific business questions or serve a specific departmental need. These DM may be populated from the operational databases directly or, more common today, from the EDW, enforcing the “single source of truth” concept.
I have done “data integration” my entire data career but it has only recently become a named activity with the book mentioned above being the oldest publication I have found using the term (2002)
The process of getting data items from the source systems into the DW is complicated and requires significant research, analysis, and understanding of the source system's data stores. This has recently become known as “Data Profiling” but has been done “as a matter of fact” since data stores were first created. The actual movement of data is done by ETL procedures.
ETL means “Extract, Transform, and Load
The term “data integration” has recently been expanded to include sharing data directly from several operational source systems into one single presentation layer to provide real-time information. Customer information such as addresses and phone numbers may be kept in one data store, while customer sales may be kept in a different data store. Customer credit card information may be in a third data store. Gathering pieces of data from these three data stores into a single presentation layer such as a real-time Web-based GUI may provide customer service personnel the ability to provide accurate and timely help in resolving disputes while enabling instant updates and corrections to data in all three data stores.
Getting real-time data from operational data stores may be detrimental to the source system's performance. Therefore, “near real time” has become the norm. Data can be exported from the operational data stores into an Operational Data Store
No comments:
Post a Comment