Ideally, these metrics would be used to determine the best database management system (DBMS) for the job. Often, however, the target DBMS has already been chosen; in that case, the metrics provide the basis for scalability and risk assessment. These metrics include:
- Number of entities (tables)
- Number of attributes (columns)
- Volume metrics
  - Initial
  - Table size (row count and bytes)
- Connection numbers
  - Total
  - Average concurrent
  - Maximum concurrent
  - Application / system users
  - Power users
- Growth rates for all of the above: 1 year and 3 years (a quick projection sketch follows this list)
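As a small illustration of the growth-rate item above, here is a minimal Python sketch of a compound-growth projection over 1- and 3-year horizons. The starting size and annual growth rate are hypothetical placeholders, not figures from any real system.

```python
# Hypothetical capacity projection: compound an initial volume by an
# assumed annual growth rate over 1- and 3-year horizons.

def projected_size_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Return the projected size in TB after `years` of compound growth."""
    return initial_tb * (1.0 + annual_growth) ** years

initial_tb = 10.0      # placeholder starting volume
annual_growth = 0.40   # placeholder: 40% growth per year

for years in (1, 3):
    size = projected_size_tb(initial_tb, annual_growth, years)
    print(f"{years}-year projection: {size:.1f} TB")
```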
So that we can all gain a better understanding of what is available and where each DBMS is best used, what is your experience? Consider the following points and share which DBMS was used and how it met, or failed to meet, expectations.
- Purpose of database (EDW, DW, Data mart, ODS, reporting, OLAP, OLTP, etc.):
- Database structure (Normalized, Dimensional, Snowflake, etc.):
- Database size in TB:
- Number of tables in database:
- Number of columns in database:
- Total number of users:
- Total number of power users:
- Maximum number of connections:
- Average number of connections:
- DBMS or RDBMS used:
- Expectations met or not for:
  - Database growth in TB?
  - Query performance?
  - Performance tuning?
  - Connectivity?
  - Database maintenance (DBA)?
- Other comments:
My experiences:
- Purpose of database: Enterprise Data Warehouse (EDW), single source of truth; served as the source for other data marts, reporting data stores, and OLAP.
- Database structure: Normalized, 3NF
- Database size in TB: Greater than 100 TB
- Number of tables in database: More than 1000 tables; more than 3000 views.
- Number of columns in database: More than 50,000
- Total number of users: More than 1000
- Total number of power users: More than 100
- Maximum number of connections: More than 500
- Average number of connections: More than 100
- DBMS or RDBMS used: Teradata
- Expectations met or not for:
  - Database growth in TB? Yes. Additional disk space seemed fairly easy to add; adding more servers (AMPs) increased disk space, CPU processing power, and user connectivity.
  - Query performance? Very good.
  - Performance tuning? Performance depended on primary indexes (row distribution), partitioning, secondary indexes, SQL tuning, and the statistics collected (see the DDL sketch at the end of this post).
  - Connectivity? Batch programs, direct user connections, and web-based user access.
  - Database maintenance (DBA)? It was faster to add or drop table columns by creating a new table and copying the data from the old table into it than by using ALTER TABLE commands (see the sketch at the end of this post). Obviously, not all steps to do this are itemized here.
- Other comments: By far the best RDBMS I have experienced for VLDB work. Nearly everything was done with massively parallel processing. Data marts, reporting data stores, etc. were derived from the physical database via views to create logical data stores (see the view sketch at the end of this post). Response times were excellent; in the few cases where multi-table joins could not be improved, a physical table was created and loaded via batch processing.
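To make the tuning levers under "Performance tuning?" concrete, here is a minimal sketch of the kind of Teradata DDL involved. The object names (sales_fact, sale_id, store_id, sale_date) and the partition range are hypothetical placeholders, not the actual EDW objects; the statement forms (PRIMARY INDEX, PARTITION BY RANGE_N, secondary index, COLLECT STATISTICS) are standard Teradata syntax. The statements are shown as strings that would be submitted through any Teradata client.

```python
# Hypothetical Teradata tuning DDL: primary index for row distribution,
# range partitioning for partition elimination, a secondary index as an
# alternate access path, and statistics collection for the optimizer.

CREATE_TABLE = """
CREATE TABLE sales_fact (
    sale_id   BIGINT        NOT NULL,
    store_id  INTEGER       NOT NULL,
    sale_date DATE          NOT NULL,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)  -- distributes rows across AMPs
PARTITION BY RANGE_N(    -- enables partition elimination on date predicates
    sale_date BETWEEN DATE '2020-01-01' AND DATE '2025-12-31'
    EACH INTERVAL '1' MONTH);
"""

SECONDARY_INDEX = "CREATE INDEX (store_id) ON sales_fact;"

COLLECT_STATS = "COLLECT STATISTICS ON sales_fact COLUMN (sale_date);"

for stmt in (CREATE_TABLE, SECONDARY_INDEX, COLLECT_STATS):
    print(stmt.strip(), end="\n\n")
```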
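Likewise, a sketch of the create-and-copy approach mentioned under "Database maintenance (DBA)?". The names and added column are hypothetical, and index re-creation, statistics, grants, and error handling are deliberately omitted, just as the original comment notes that not all steps are itemized.

```python
# Hypothetical create-and-copy column addition: build a new table with the
# extra column and copy the data in one pass, then swap the tables.

STEPS = [
    # 1. Create the new table with the extra column, copying data in one pass.
    """
    CREATE TABLE sales_fact_new AS (
        SELECT s.*, CAST(NULL AS VARCHAR(20)) AS promo_code
        FROM sales_fact s
    ) WITH DATA
    PRIMARY INDEX (sale_id);
    """,
    # 2. Swap the tables once the copy is verified.
    "DROP TABLE sales_fact;",
    "RENAME TABLE sales_fact_new TO sales_fact;",
]

for stmt in STEPS:
    print(stmt.strip(), end="\n\n")
```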
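Finally, a sketch of how views can layer logical data stores over the physical EDW, per the comments above. The database and view names are hypothetical; LOCKING ROW FOR ACCESS is a common Teradata idiom that lets reporting queries read without blocking concurrent loads.

```python
# Hypothetical "logical data mart": a view over the physical EDW table
# that aggregates sales by store and month for reporting users.

MART_VIEW = """
REPLACE VIEW sales_mart.monthly_store_sales AS
LOCKING ROW FOR ACCESS
SELECT store_id,
       EXTRACT(YEAR  FROM sale_date) AS sale_year,
       EXTRACT(MONTH FROM sale_date) AS sale_month,
       SUM(amount)                   AS total_amount
FROM edw.sales_fact
GROUP BY 1, 2, 3;
"""

print(MART_VIEW.strip())
```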