Friday, October 29, 2010

Database Management Systems Best Uses

As a data architect, I am database agnostic until reaching the physical implementation. During the conceptual and logical modeling phases, I will have gathered metrics that help predict the scalability the physical implementation will need.
In the ideal scenario, these metrics are used to determine the best database management system (DBMS) for the job. Often, however, the target DBMS has already been chosen; in that case, the metrics provide the basis for scalability and risk assessment. These metrics include (a query sketch for gathering some of them follows the list):
  • Number of entities (tables)
  • Number of attributes (columns)
  • Volume metrics
    • Initial
      • Table size (row count and bytes)
      • Connection Numbers
        • Total
        • Average concurrent
        • Maximum concurrent
        • Application / system users
        • Power users
    • Growth rates for above: 1 year, 3 years
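
Once a candidate physical environment exists, some of the initial counts above can be pulled straight from the system catalog. Below is a minimal sketch, assuming a DBMS that exposes the standard INFORMATION_SCHEMA views; products such as Teradata use their own dictionary views (the DBC views) instead, and the byte sizes, row counts, and connection figures come from product-specific views and monitoring tools, so they are noted only as comments.

  -- Entity (table) and attribute (column) counts per schema,
  -- assuming standard INFORMATION_SCHEMA support.
  SELECT t.table_schema,
         COUNT(DISTINCT t.table_name) AS table_count,
         COUNT(c.column_name)         AS column_count
  FROM   information_schema.tables  t
  JOIN   information_schema.columns c
         ON  c.table_schema = t.table_schema
         AND c.table_name   = t.table_name
  WHERE  t.table_type = 'BASE TABLE'
  GROUP  BY t.table_schema
  ORDER  BY t.table_schema;

  -- Row counts and byte sizes come from product-specific views
  -- (for example, DBC.TableSizeV on Teradata); connection counts
  -- come from the DBMS monitoring or query-log facilities.

Growth rates are then a matter of re-running the same counts at regular intervals and comparing.
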
The data architect may or may not be involved in the physical implementation process, depending on the company's organizational hierarchy. Assuming that I will be involved, I don't need to be an expert in every available DBMS, but I should have some knowledge and understanding of them based on past experience. I won't be able to keep up with every DBMS out there, especially the niche systems, and open source systems are also available and widely used. A data architect needs to work closely with the DBMS subject matter experts (SMEs). The most important SMEs to me are those who have actual experience using and supporting a given DBMS and the type of data store being run on it.

So that we can all gain a better understanding of what is available and where each DBMS fits best, what is your experience? Consider the following points and share which DBMS was used and how it met, or failed to meet, expectations.
  • Purpose of database (EDW, DW, Data mart, ODS, reporting, OLAP, OLTP, etc.):
  • Database structure (Normalized, Dimensional, Snowflake, etc.):
  • Database size in TB:
  • Number of tables in database:
  • Number of columns in database:
  • Total number of users:
  • Total number of power users:
  • Maximum number of connections:
  • Average number of connections:
  • DBMS or RDBMS used:
  • Expectations met or not for:
    • Database growth in TB?
    • Query performance?
    • Performance tuning?
    • Connectivity?
    • Database maintenance (DBA)?
    • Other comments:

My experiences:

  • Purpose of database: Enterprise Data Warehouse (EDW), single source of truth; it was the source for other data marts, reporting data stores, and OLAP.
  • Database structure: Normalized, 3NF
  • Database size in TB: Greater than 100 TB
  • Number of tables in database: More than 1000 tables; more than 3000 views.
  • Number of columns in database: More than 50,000
  • Total number of users: More than 1000
  • Total number of power users: More than 100
  • Maximum number of connections: More than 500
  • Average number of connections: More than 100
  • DBMS or RDBMS used: Teradata
  • Expectations met or not for:
    • Database growth in TB? Yes. Additional disk space seemed fairly easy to add. Adding more servers (AMPs) increased disk space, provided more CPU processing power, and supported more user connections.
    • Query performance? Very good.
    • Performance tuning? Performance depended on primary indexes (data distribution), partitioning, secondary indexes, SQL tuning, and collected statistics; see the DDL sketch after this list.
    • Connectivity? Batch programs, direct user connections, and web-based user access.
    • Database maintenance (DBA)? It was faster to add or drop columns by creating a new table and copying the data from the old table into it than by using ALTER TABLE commands; obviously, not every step of that process is itemized here (a sketch of the pattern follows this list).
    • Other comments: By far the best RDBMS I have experienced for VLDB. Nearly everything ran as massively parallel processing. Data marts, reporting data stores, and the like were derived from the physical database via views, creating logical data stores. Response times were excellent; in the few cases where a multi-table join could not be improved, a physical table was created and loaded via batch processing.
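
To make those tuning levers concrete, here is a minimal Teradata-style DDL sketch. The table, columns, and date range are hypothetical, and exact options vary by release, so treat it as an illustration of a primary index (distribution), partitioning, a secondary index, and statistics collection rather than production DDL.

  -- Hypothetical fact table; the PRIMARY INDEX drives row distribution across AMPs.
  CREATE TABLE sales_fact
  ( sale_id      BIGINT        NOT NULL
  , customer_id  INTEGER       NOT NULL
  , sale_date    DATE          NOT NULL
  , sale_amount  DECIMAL(18,2)
  )
  PRIMARY INDEX (customer_id)
  PARTITION BY RANGE_N (sale_date BETWEEN DATE '2009-01-01'
                                      AND DATE '2010-12-31'
                                      EACH INTERVAL '1' MONTH);

  -- Secondary index for access paths that do not use the primary index.
  CREATE INDEX (sale_date) ON sales_fact;

  -- Collected statistics feed the optimizer's distribution and join estimates.
  COLLECT STATISTICS ON sales_fact COLUMN customer_id;
  COLLECT STATISTICS ON sales_fact COLUMN sale_date;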
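
And a similarly hedged sketch of the maintenance and view points above, using the same hypothetical names: rebuilding a table to add a column instead of running ALTER TABLE, and exposing a logical data mart as a view over the physical table. A real procedure would also recreate indexes and statistics, re-grant access rights, and fit inside a load window.

  -- Add a column by rebuilding: create the new structure, copy the data, swap names.
  CREATE TABLE sales_fact_new
  ( sale_id      BIGINT        NOT NULL
  , customer_id  INTEGER       NOT NULL
  , sale_date    DATE          NOT NULL
  , sale_amount  DECIMAL(18,2)
  , sale_channel CHAR(10)                -- the newly added column
  )
  PRIMARY INDEX (customer_id);

  INSERT INTO sales_fact_new (sale_id, customer_id, sale_date, sale_amount)
  SELECT sale_id, customer_id, sale_date, sale_amount
  FROM   sales_fact;

  RENAME TABLE sales_fact     TO sales_fact_old;
  RENAME TABLE sales_fact_new TO sales_fact;

  -- Logical data mart: a view over the physical table, no data copied.
  CREATE VIEW v_monthly_sales AS
  SELECT customer_id,
         EXTRACT(YEAR  FROM sale_date) AS sale_year,
         EXTRACT(MONTH FROM sale_date) AS sale_month,
         SUM(sale_amount)              AS total_sales
  FROM   sales_fact
  GROUP  BY 1, 2, 3;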