Sunday, March 28, 2010

Data Modeling


Data Modeling is a way of describing the tables (entities) and relationships in a database schemata (see other blogs with "schema" tags). (By the way, both "modeling" and "modelling" are correct spellings. The first is more American English and the second is British English; the British claim they are right.) Because a "picture speaks a thousand words", data models help database users to more easily understand the schemata they are interested in.
There are many kinds of models, but the three most common are: Conceptual, Logical, and Physical.

Data Model Hierarchy

CONCEPTUAL

A conceptual data model is the highest level data model. It is usually created early in a database project as, or shortly after, data requirements have been gathered in what is known as the Plan and Analyze phase of the SDLC. It shows the major data entities the project needs and how they relate to each other. Entities include objects like Customers, Employees, Products, Sales, Orders, etc. Sometimes, some entities are made to be a child entity of a broader entity like Customers could be a child entity of People. The names used to describe an entity are the ones a business would use, such as Customer, not the names the table in the physical data schemata would be, such as TB_CUST.
A good definition of Conceptual Data Modeling can be found in "Conceptual Modeling - ER 2002" by Stefano Spaccapietra and Salvatore March (Springer)
"The task of conceptual modelling is to build a high-quality representation of selected phenomena in some domain. The conceptual models that result facilitate the implementation, operation, and maintenance of information systems."
 LOGICAL

The next level in data modeling is the Logical data model. It is created in the Design phase of the SDLC.

At this level, data items that define what makes up an entity are added. For example, a Customer entity will have data items such as Name, Address, City, State, ZIP, etc. These data items are called "attributes" of the entity.

The logical model will show how the entities relate to each other through common attributes. These relationships will be in business terms such as "An order has a customer" and "A customer may have one or more orders."

The relationships will be connected by lines that depict what is called the "cardinality" of the relationship, or how many elements there can be in the set. For example, every Order must have one and only one customer; however, a customer can have zero or many current orders. The Customer-to-Order relationship is an optional one-to-many cardinality, while the Order-to-Customer relationship is a mandatory one-to-one cardinality.

PHYSICAL

The physical data model is the graphic depiction of the actual database schemata that contains the entities (tables) and attributes (columns) specified in the Logical data model. Sadly, many database schemata are built directly without the planning and design models being created. The physical data model should be the result of collaboration between the Data Architect (or Modeler) and the Database Administrator (DBA). The Data Architect knows what data entities and attributes are needed by the application and how they relate to each other. The DBA knows the database management system (DBMS or RDBMS) in which the data will be stored. Together, these two can design and build the physical database schemata that will provide the fastest I/O performance, be maintainable by the DBA, and will be scalable for future growth. (Scalability is a term used to indicate how well the system will be able to handle increased data volumes and more users of the database.)

Data Modeling Tools

In the beginning, these models were created by hand drawings using stencils for the limited number of shapes and lines used at the time. As models became more complicated, software to create data models was developed. At first, these software applications were no more than diagramming tools.

These simple diagramming tools have become sophisticated data modeling applications that included data dictionaries, metadata, scripting languages, requirements tracking, database definition language (DDL) generators and much more. These mega-modeling tools can be very expensive. The two most common ones that I know of and have used are Sybase's PowerDesigner and CA's Erwin.

These are great tools for large companies, but are there data modeling software tools available cheaply or freely for those who want to learn modeling skills or do data modeling for small to medium sized businesses?
Some database management systems (DBMS) come with there own data modeling tools. A couple DBMS can be downloaded freely for personal education and training. Both Oracle and MS SQL Server allow this and I believe they both have built-in data modeling tools, but it has been a long time since I used them myself so this may not be true any more.

Some people use Visio for data modeling, but others claim it is not a "real" data modeling tool. It certainly has limitations, but you can draw database schemata diagrams with it. I have found it to be useful for simple data models, but it has none of the extended features mentioned above.

Below are links to various web sites that list free and/or inexpensive data modeling tools:

No comments:

Post a Comment