Tuesday, September 14, 2010

Should a data architect be more adept on the logical or the physical model? What are important traits of a data architect?

I recently replied to a LinkedIn group discussion asking the question in this post's title. 

For the complete discussion, use the following link.
http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=2551716&gid=108904&trk=EML_anet_qa_ttle-0Nt79xs2RVr6JBpnsJt7dBpSBA

My Answer:

To use a house analogy, would you want to hire an architect to design your house who had no idea what was needed to actually build it at the "physical" level or what the zoning and building requirements were for the location you chose?  You wouldn't?  Then why would you expect a data architect NOT to understand what is needed at the physical level?  That is not saying that a data architect has to have experience as a DBA or database engineer, but, since the physical data model and implementation is the ultimate goal, a data architect should understand what is needed for the physical design.

Depending on the organization a data architect is working at, the actual tasks given to the data architect may vary.  My list below shows what I think are the duties of a data architect in their order of importance.  I will not go into in-depth explanations for each step.  I know of organizations that limit a data architect's role to items 1 and 2, while others include 1 through 4, and some include all of them.
  1. Understand the data requirements.  Get agreement from client that data requirements are complete and understood.
  2. Create Conceptual Data Model that documents the data entities needed and how they relate to each other.  Confirm with the client that the entities are complete and fulfill the high level data requirements
  3. Create Logical Data Model that documents the attributes assigned to fully describe each entity.  Confirm with the client that sufficient detail is captured for each entity.  Also, plan for future growth and needs by showing how new data types can be added without requiring a new design (using super-type, sub-type, lookup tables, etc).
  4. Provide traceability matrix to show how each data requirement has been met in conceptual and logical data models.
  5. Determine and document volume and user constraints.  Determine initial load volumes and future growth rates.  Determine initial user load: total users, maximum (peak) users, how connected, administrative users, power users, limited access users, ....  Determine future user growth rates.  These are important for physical database selection.
  6. Determine suitable database engine needed.  This task should not be done until steps 1 through 5 have been done, but quite often, the client has already decided what the database engine will be.  Results from Step 5 will then be useful to forecast the database engine's future scalability.  Clients should at least know what the risks are based on their choice of database engine.
  7. Create Physical Data Model.  Working with the DBA and/or other experts for the chosen database engine, transform the Logical Data Model into a physical data model.  It is unusual that a Logical Data Model can be converted directly into a "best fit" physical data model.
Other duties of a data architect may include:
  • Enterprise standards: naming, modeling, architecture, data warehouse, more.
  • Data profiling.
  • Gap analysis.
  • Data dictionary.
  • Data governance.
  • ETL design
  • Source-to-target data mapping.
If you look at recent postings for Data Architect positions, you will see many of them requiring expertise with the already chosen target database engine, reporting / BI tools, CASE modeling tools, etc.  But I do not agree that a data architect's duties and requirements include these tools, but rather knowing what needs to be done using these tools.  Tools can be learned and are an aid to doing what needs to be done.

I have elaborated on the duties of a data architect on my blog site.

Data Architect, Duties of a
http://itpro420.blogspot.com/2010/09/data-architect-duties-of.html
Data Architect Job Description
http://itpro420.blogspot.com/2010/02/data-architect-job-description.html
Data Architect versus Database Architect
http://itpro420.blogspot.com/2010/01/data-architect-versus-database.html

Thursday, September 2, 2010

Teradata 13 Express Install

For those of you wanting to install Teradata on your own PC, Teradata supplies a free "Express" version.  Go to this site to request to download it: http://www.teradata.com/downloadcenter/

You will have to register with Teradata and then request the Teradata Express by clicking the "Order Now" link in the "Teradata Express — Teradata Software for Evaluation and Development" section. You'll have to look carefully for it.

I installed version 13 on my home PC which has Windows Vista Home Premium for the OS.  I had several problems which I'll detail here in case you run into these, too.

I used the "silent" install method at first.  Be sure to read ALL the text with each installation step; don't assume the "usual" steps of just accepting defaults and going on.  The silent install method is the default and it will merrily try to install everything, but won't give you any error messages if anything fails.  So at the end, you will be told that the installation has completed, but it may not have been successful.

Thinking I had a successful installation, I followed the instructions to verify that from the user guide.  From the "Teradata Service Control" I clicked the "Start Teradata" menu item.  When it completed, I was told "Teradata is Running" in the lower left corner.  When I clicked the "File" menu to execute "Session info", it was grayed out.

HOSTS file edited
Reading the installation documentation, I found that the HOSTS file is modified by Teradata.  I looked at my HOSTS file (which is a custom one that blocks Internet advertisements and other unwanted sites) and found that the Teradata install program had botched the addition of the following line:
127.0.0.1  localtdcop1

Make sure that line is on a line all by itself.  Also make sure that "localtdcop1" has not been added to the end of every line in the HOSTS file that begins with 127.0.0.1!

ODBC Driver
I also discovered that the Teradata ODBC driver had not been installed.  I tried to reinstall it, this time not using the "silent" install method.  This time I got an error message: "Error 2738.Could not access VBScript runtime for custom action."

The resolution for this error is to make sure VBScript is registered.  Do this by following these steps:

  1. Run Command Prompt as administrator:


    1. Start Menu -> All Programs -> Accessories
    2. Right click on Command Prompt and select Run as administrator


  2. Run the following command:


    • If you are using 32-bit version of Windows:


      • Type cd %windir%\system32 into the Command Prompt and hit Enter


    • If you are using 64-bit version of Windows:


      • Type cd %windir%\syswow64 into the Command Prompt and hit Enter


  3. Type regsvr32 vbscript.dll into the Command Prompt and hit Enter.
Now you will be able to install the Teradata ODBC driver.

Firewall
But I'm still having problems.  Even though I can now start Teradata and the "Session Info" menu option is available, clicking it starts a dialog box that tells me to  wait while sessions are retrieved.  After 12 minutes, I cancelled it.

I opened "Teradata Administrator" and tried to connect as user tdadmin (created by default).  I got the error message "ODBC HY000: WSA E HostUnreach: The Teradata server can't currently be reached over this network."

The fix for this was to turn off all firewalls.  I had both the Windows Firewall and Avast's firewall running.  Since my PC is behind a cable router that has its own firewall, turning both of these off poses no threat.

SQL Assistant
I found that the Teradata SQL Assistant did not get installed.  Using the "not silent" install mode, I discovered that this requires the .NET Framework version 2, SP 2.  I tried to install this again from the "Other Software" install option, choosing the "not silent" install mode, and discovered that this would not install on Vista.  I was unable to find a version of .NET Framework version 2 that would install on Vista.

There is a "work around" to this.  Teradata has a Java version of SQL Assistant that will work on Vista.  You need to go to the Teradata download site to get it.  In my case, for Vista on 32-bit platform, I went to http://downloads.teradata.com/download/tools/sql-assistant-java-edition-13-01-for-windows-32-bit

Find all the versions of the "SQL Assistant Java Edition 13.01" at http://downloads.teradata.com/download/tools

Data Architect, Duties of a

I have read many job postings seeking a Data Architect.  Most had a few data architect duties included, but specified some duties of many data-related positions not normally considered as part of the data architect role; seems like they were looking for one person to do the work of multiple specialists.  It seems to me that will get a company a "Jack/Jill of all trades, master of none."

While most Data Architects will have had experience in multiple data-centric roles doing multiple data-related tasks, few (none?) of us will have kept up-to-date skills in many of those skills.  While understanding the roles of those other data specialists we will be working with is crucial, we have moved our focus to a much higher elevation.  We should be able to mentor and assist other data specialists in getting their assigned duties accomplished, but we do not do the actual work involved (except in a critical, unplanned, resource shortage).

I have gleaned the following list of duties from dozens of different position position postings for Data Architects from the Internet.  I have included only those I feel are germane to the postion.
  • Data Architect to assist with data mappings, data modeling, data profiling, query design, data flow design, etc between multiple databases across multiples platforms. Data Architect will be support and contribute to DI/ETL architecture and act as data SME (subject matter expert) with respect to data profiling assessments and participate in JAD session/interviews.
  • Perform data profiling and analysis to assess data quality/patterns, recommend data cleansing rules, conforming data standardardzing rules, and matching algorithm changes to [business] and data bus architecture (dimensions/facts)
  • Database query Performance and Tuning expertise (P&T) on SQL statements.
  • ETL Workflow and data mapping analysis, review, and tuning.
  • Strong design artifact experience in documentation of data flow and data mapping and data exchange specifications.
  • ETL development experience and technical adeptness at reviewing and initial prototyping tasks.
  • Has solid project lifecycle, database design experience, and complimentary Business Analyst skills.
    (Although, I limit the BA skills to those needed to define business data items needed for database structure.  I work with the BA to get these.)
  • Responsible for the creation of the following work deliverables:

    • Estimate for resources needed, effort needed, and target milestone dates.
    • Data requirements documents, including traceability.
    • Weekly status reports from my team
    • Compiled weekly status report to project manager for team's work.
    • Conceptual data model, including metadata at the entity level
    • Logical data model, including metadata at the entity and attribute level
    • Data Profiling/Gap Analysis
    • Data Dictionary
    • Data Flow Diagram
    • Physical data model, including metadata at the table, column, and view level
    • Volume metrics (expected initial load and 3-year and 5-year forecasts)
    • DDL Scripts (generated from modeling tool such as PowerDesigner or Erwin)
    • ETL Data Mapping (ultimate source to ultimate target, not every hop along the way)

  • Primary Responsibilities:

    • Building conceptual, logical and physical database designs
    • Participating in data development and analysis, and data dictionary management
    • Ensuring data consistency and accuracy across all systems
    • Collaborating with business analysts, ETL developers and other key internal personnel on a wide scope of data architecture functions