Showing posts with label Data Architect. Show all posts
Showing posts with label Data Architect. Show all posts

Tuesday, September 14, 2010

Should a data architect be more adept on the logical or the physical model? What are important traits of a data architect?

I recently replied to a LinkedIn group discussion asking the question in this post's title. 

For the complete discussion, use the following link.
http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=2551716&gid=108904&trk=EML_anet_qa_ttle-0Nt79xs2RVr6JBpnsJt7dBpSBA

My Answer:

To use a house analogy, would you want to hire an architect to design your house who had no idea what was needed to actually build it at the "physical" level or what the zoning and building requirements were for the location you chose?  You wouldn't?  Then why would you expect a data architect NOT to understand what is needed at the physical level?  That is not saying that a data architect has to have experience as a DBA or database engineer, but, since the physical data model and implementation is the ultimate goal, a data architect should understand what is needed for the physical design.

Depending on the organization a data architect is working at, the actual tasks given to the data architect may vary.  My list below shows what I think are the duties of a data architect in their order of importance.  I will not go into in-depth explanations for each step.  I know of organizations that limit a data architect's role to items 1 and 2, while others include 1 through 4, and some include all of them.
  1. Understand the data requirements.  Get agreement from client that data requirements are complete and understood.
  2. Create Conceptual Data Model that documents the data entities needed and how they relate to each other.  Confirm with the client that the entities are complete and fulfill the high level data requirements
  3. Create Logical Data Model that documents the attributes assigned to fully describe each entity.  Confirm with the client that sufficient detail is captured for each entity.  Also, plan for future growth and needs by showing how new data types can be added without requiring a new design (using super-type, sub-type, lookup tables, etc).
  4. Provide traceability matrix to show how each data requirement has been met in conceptual and logical data models.
  5. Determine and document volume and user constraints.  Determine initial load volumes and future growth rates.  Determine initial user load: total users, maximum (peak) users, how connected, administrative users, power users, limited access users, ....  Determine future user growth rates.  These are important for physical database selection.
  6. Determine suitable database engine needed.  This task should not be done until steps 1 through 5 have been done, but quite often, the client has already decided what the database engine will be.  Results from Step 5 will then be useful to forecast the database engine's future scalability.  Clients should at least know what the risks are based on their choice of database engine.
  7. Create Physical Data Model.  Working with the DBA and/or other experts for the chosen database engine, transform the Logical Data Model into a physical data model.  It is unusual that a Logical Data Model can be converted directly into a "best fit" physical data model.
Other duties of a data architect may include:
  • Enterprise standards: naming, modeling, architecture, data warehouse, more.
  • Data profiling.
  • Gap analysis.
  • Data dictionary.
  • Data governance.
  • ETL design
  • Source-to-target data mapping.
If you look at recent postings for Data Architect positions, you will see many of them requiring expertise with the already chosen target database engine, reporting / BI tools, CASE modeling tools, etc.  But I do not agree that a data architect's duties and requirements include these tools, but rather knowing what needs to be done using these tools.  Tools can be learned and are an aid to doing what needs to be done.

I have elaborated on the duties of a data architect on my blog site.

Data Architect, Duties of a
http://itpro420.blogspot.com/2010/09/data-architect-duties-of.html
Data Architect Job Description
http://itpro420.blogspot.com/2010/02/data-architect-job-description.html
Data Architect versus Database Architect
http://itpro420.blogspot.com/2010/01/data-architect-versus-database.html

Thursday, September 2, 2010

Data Architect, Duties of a

I have read many job postings seeking a Data Architect.  Most had a few data architect duties included, but specified some duties of many data-related positions not normally considered as part of the data architect role; seems like they were looking for one person to do the work of multiple specialists.  It seems to me that will get a company a "Jack/Jill of all trades, master of none."

While most Data Architects will have had experience in multiple data-centric roles doing multiple data-related tasks, few (none?) of us will have kept up-to-date skills in many of those skills.  While understanding the roles of those other data specialists we will be working with is crucial, we have moved our focus to a much higher elevation.  We should be able to mentor and assist other data specialists in getting their assigned duties accomplished, but we do not do the actual work involved (except in a critical, unplanned, resource shortage).

I have gleaned the following list of duties from dozens of different position position postings for Data Architects from the Internet.  I have included only those I feel are germane to the postion.
  • Data Architect to assist with data mappings, data modeling, data profiling, query design, data flow design, etc between multiple databases across multiples platforms. Data Architect will be support and contribute to DI/ETL architecture and act as data SME (subject matter expert) with respect to data profiling assessments and participate in JAD session/interviews.
  • Perform data profiling and analysis to assess data quality/patterns, recommend data cleansing rules, conforming data standardardzing rules, and matching algorithm changes to [business] and data bus architecture (dimensions/facts)
  • Database query Performance and Tuning expertise (P&T) on SQL statements.
  • ETL Workflow and data mapping analysis, review, and tuning.
  • Strong design artifact experience in documentation of data flow and data mapping and data exchange specifications.
  • ETL development experience and technical adeptness at reviewing and initial prototyping tasks.
  • Has solid project lifecycle, database design experience, and complimentary Business Analyst skills.
    (Although, I limit the BA skills to those needed to define business data items needed for database structure.  I work with the BA to get these.)
  • Responsible for the creation of the following work deliverables:

    • Estimate for resources needed, effort needed, and target milestone dates.
    • Data requirements documents, including traceability.
    • Weekly status reports from my team
    • Compiled weekly status report to project manager for team's work.
    • Conceptual data model, including metadata at the entity level
    • Logical data model, including metadata at the entity and attribute level
    • Data Profiling/Gap Analysis
    • Data Dictionary
    • Data Flow Diagram
    • Physical data model, including metadata at the table, column, and view level
    • Volume metrics (expected initial load and 3-year and 5-year forecasts)
    • DDL Scripts (generated from modeling tool such as PowerDesigner or Erwin)
    • ETL Data Mapping (ultimate source to ultimate target, not every hop along the way)

  • Primary Responsibilities:

    • Building conceptual, logical and physical database designs
    • Participating in data development and analysis, and data dictionary management
    • Ensuring data consistency and accuracy across all systems
    • Collaborating with business analysts, ETL developers and other key internal personnel on a wide scope of data architecture functions

Tuesday, February 2, 2010

Data Architect Job Description

I found this definition of a Data Architect on CareerBuilder.
Defines, designs, and builds relational and/or multi-dimensional databases for warehousing of data. Confers with client, technical staff, and team members to plan, design, develop, implement, and enhance applications, scripts, procedures, and metadata for relational databases. Reviews current data structures and recommends optimizations and reconfigurations as warranted. Generates data acquisition and archive recovery strategies. Works with Database Administrators to migrate complex large scale applications into test and production environments. Integrates new data with existing warehouse structure and continuously improves system functionality. Evaluates and recommends new tools and methodologies. Consults with vendors to integrate selected packages. Deletes old data. Develops and maintains all standards in accordance with data structures and models. May work with organization departments to design and develop data marts. (Source: eDOT Job Description)
Do you agree with this definition? Does it include too much or not enough? I agree with most of it, but I question or disagree with the following points:
  • archive recovery strategies
    This is more of a DBA or Disaster Recovery role
  • continuously improves system functionality
    Not quite sure what is meant by this; seems to be DBA or development team role.
  • Evaluates and recommends new tools and methodologies
    What new tools and methodologies? I might do this for data modeling tools like Erwin or PowerDesigner if they are looking for this and can do this for data architecture-related practices such as data governance, data quality, naming conventions, modeling standards, and so forth.
  • Consults with vendors to integrate selected packages.
    This is Solution, System, or Application Architect role.
  • Deletes old data.
    This is done by the user, application, or DBA.

Wednesday, January 6, 2010

Data Architect versus Database Architect

Introduction

It seems that many people don't understand the difference between a Data Architect and a Database Architect and the two titles are used interchangeably.  I, however, view them as separate skill sets and positions.
These web sites are examples of how these terms are used.
http://www.databaseguides.com/database-architecture
This definition of database architecture seems to start out by describing the physical infrastructure of a database system, but then switches to the architecture of the physical data model or data schema (the structure of how the data will be stored).
http://www.wisegeek.com/what-is-a-data-architect.htm
This person's definition of a Data Architect is consistent with mine.

Confusion

Perhaps the confusion begins with the term database.  Is a database the structure (schema) of how the data will be stored (Hierarchical, Object, Relational, Dimensional, CODASYL, etc.) or what I call the "engine"? 

The database engine is the software application, or the Database Management System (DBMS), that allows us to store the data.  The most common database engines today are used for Relational Database Management Systems (RDBMS).  These include Teradata, Oracle, and MS SQL Server.  Within each DBMS, there may be multiple schemata, or collections of data. (Each individual collection of data within a DBMS is called a schemata; the plural is schema, but the plural form is often used in the singular.)

The term database seems to be freely used for both the schema and the DBMS (engine) and the distinction only becomes apparent through the context the word is used in.  Therefore, I feel there should be a clear delineation between what a Data Architect does and what a Database Architect does.

Data Architect

A Data Architect (DA) is someone who focuses first on the data itself and how it will be used and maybe later on how it will be stored (schema) on the chosen DBMS.  It is the DA who strives to ensure that the "Garbage" in "Garbage In, Garbage Out" (GIGO) won't exist.  The DA should start working at the very beginning of a project in the Plan and Analyze phases.

At this point, the DA will be working with the Business Analyst (BA) and the client stakeholders to get the meta data for the data items (entities and attributes) that will be needed for the solution.  It is very important to understand early in the project what the business means by Customer, Order Date, Sale Date, etc.  The DA should be considering the big picture, i.e. the enterprise (global) use of the data, so the same data item won't have different definitions (meta data) and data types used in other data stores.

Typical deliverables in the plan and analyze phases are the Conceptual and Logical Data Models.  These documents are not concerned with what database engine (Teradata, Oracle, MS SQL Server, etc.) the data will be stored on.  Once there is consensus on what data items are needed and how they are related, then the DA may pass the Logical Data Model off to a Data Modeler or Data Analyst to create the Physical Data Model or schema; the DA may do this model, too.

This is generally done in the Design phase of the project even though it is really in the DA's Build and Deploy phases.  This is because there are others who can't do their work until the data store's structure has been completed and published.  At this point, the DA enters a supporting role to those who will be using the data.

Once the Logical data model is completed and agreed to with the business stakeholders, the database engine (probably a RDBMS) should be chosen.  Based on data volume metrics for initial and future storage capacity needed, maximum expected number of concurrent users, batch processing needed, and other usage factors, a decision should be made as to which database engine is the best.  Quite often, however, this decision has already been made because the company has invested money, time, and staff to one database engine already and does not want to invest more into a second database engine even if the existing database engine will have scalability issues.

Database Architect

When the data-centric project gets to the point of choosing what DBMS will be used to store the data, this is the person who guides the selection of a DBMS and architects the infrastructure for it.  Things like how many servers will be needed, how many tiers there will be, how much disk space is needed initially and three years later, how many concurrent users must be supported, and so forth are what the Database Architect is concerned with and will be creating a solution for.

The Database Architect is generally someone who is expert in the chosen DBMS.  At first, there may be several of these people, each representing a competing DBMS provider.  Vendors usually provide these people to design, estimate, and (if chosen) be the company's DBMS solution provider.  Once the DBMS is chosen by the program / project managers and team members and the CIO/CFO approve the selection, then a Database Architect who specializes in that DBMS is needed.

The Database Architect could be the same person who will be the Database Adminstrator (DBA); it would be better, in my opinion, to have a consultant act as the Database Architect because once the database system (DBMS and hardware) is designed and installed, there won't be much need for a Database Architect until the system needs to be upgraded, modified, or expanded.  Generally, the Database Architect should be provided by the vendor of the DBMS since they will have the greatest and most current knowledge of their DBMS.  A DBA should work along side of the Database Architect so there can be knowledge transfer of the infrastructure that the DBMS will reside on and the DBA will be monitoring.

Database Administrator

I included this third role in this dissertation even though it was not my original intent.  It seems that most people understand the operational role of the Database Administrator (DBA), but may not understand the DBA's role in modeling the data store.

The Data Architect (or Modeler or Analyst) will work with the DBA when the Logical Data Model is transformed into the Physical Data Model (schema).  It is very rare that a Utopian Logical Data Model is transformed directly to a Physical Data Model.  What is great in theory (or from a logical business perspective) is not always best in reality.  The DBA should be the one guiding the person doing the Physical Data Model as to what will work best with the chosen DBMS.  The DBA needs to understand the relationships between the entities, constraints and data validation needed, and what attributes will need indexing for SQL performance.  Once the Data Architect has gone on to other projects, the DBA will generally be the first contact for developers and users who need to understand how to get data out of the schema contained within the DBMS.  The DBA may need to consult with the Data Architect, but the users will generally contact the DBA first.

Sometimes, the DBA is also the Physical Data Modeler, but I don't think this is a good approach.  The DBA focuses on the performance and ease of maintenance of the DBMS; the Data Architect / Modeler / Analyst focuses on the schema, data quality, data structure, data relationships, and data use (queries).  These are definitely interrelated aspects, but it is hard for one person to advocate them equally.  What the modeler sees as a best solution may not be what the DBA sees as providing the best performance and least disk space consumption.  It is best when two or more people can provide input for both sides of the issue and come up with a solution that is agreeable to both.