Showing posts with label Teradata. Show all posts
Showing posts with label Teradata. Show all posts

Tuesday, November 30, 2010

LPAD and RPAD for Teradata

Teradata does not come with an LPAD() or RPAD() function. If you can create a UDF for them, that would be best, but if you can't, here are some workarounds.

RPAD a string with spaces

-- replace 10 in char(10) below with desired column length
-- single ticks put around result to show spaces were actually added to right

select
  'abc' as "SampleString",
  '''' || cast(SampleString as char(10)) || '''' "rpad_spaces";

RPAD a string with any character

-- make string of characters (in this example '0000') to match desired length
-- leading space in column is preserved
select 
  ' 1' AS "SampleString",
  SampleString || Substring('0000' From 1 For Chars(SampleString)) AS "rpad_zero";

-- same as above, but using ANSI SUBSTR
select 
  ' 1' AS "SampleString",
  SampleString || SUBSTR('0000',Chars(SampleString)) AS "rpad_zero";

LPAD a string with any character

-- replace 0 with desired character
-- make string of characters (in this example '0000') to match desired length
-- leading space in column is preserved
select 
  ' 1' AS "SampleString",
  Substring('0000' From 1 For Chars(SampleString)) || SampleString "lpad_zero";

-- same as above, but using ANSI SUBSTR
SELECT 
  '19' as "SampleText",
  SUBSTR('00000'||SampleText,CHARACTERS(SampleText)+1) "LPAD(SampleText,5,'0')";

-- Showing how trailing spaces from a CHAR column can be dropped 
select
  '0000123  ' as "SampleText",
  CAST(CAST(SampleText AS integer FORMAT '-9(10)') AS CHAR(10)) "LPAD(SampleText,10,'0')";


Wednesday, November 17, 2010

How to work with bad date formats in Teradata

A question was posted on ITtoolbox's Teradata forum concerning data loading from a file that contained character strings for dates that used single digit day and month formats like 4/1/2010. Teradata does not support dates in this format: M/Y/YYYY; it only supports two-digit month and day formats like MM/DD/YYYY. Therefore my reply was a sample SQL statement that would work for the date conversion, but the actual load from the data file would need to do this in the "wrapper' code.

The SQL in Teradata format is:

SELECT
    '4/1/2010' AS "BAD_DATE",
    CASE
        WHEN INDEX(BAD_DATE,'/') = 2
          THEN '0' || SUBSTRING(BAD_DATE FROM 1)
          ELSE BAD_DATE
    END AS "MONTH_OK",
    CASE
        WHEN INDEX(SUBSTRING(MONTH_OK FROM 4),'/') = 2
          THEN SUBSTRING(MONTH_OK FROM 1 FOR 3) || '0' || 
            SUBSTRING(MONTH_OK FROM 4)
          ELSE BAD_DATE
    END AS "GOOD_DATE"
-- don't need a FROM clause to do this
; 

Friday, October 29, 2010

Database Management Systems Best Uses

As a data architect, I am database agnostic until reaching the physical implementation. During the conceptual and logical modeling phases, I will have gathered metrics that will help predict the scalability needed for the physical implementation.
In the perfect scenario, these metrics will be used to determine the best database management system (DBMS) to use. Many times, however, the target DBMS has already been chosen; in this case, these metrics will provide the basis for scalability and risk assessment. These metrics include:
  • Number of entities (tables)
  • Number of attributes (columns)
  • Volume metrics
    • Initial
      • Table size (row count and bytes)
      • Connection Numbers
        • Total
        • Average concurrent
        • Maximum concurrent
        • Application / system users
        • Power users
    • Growth rates for above: 1 year, 3 years
The data architect may or may not be involved in the physical implementation process, depending on the company's organizational hierarchy. Assuming that I will be involved with this, I don't need to be expert in all the available DBMS, but I should have some knowledge and understanding of them based on past experience. But I won't be able to keep up with all the DBMS out there, especially the niche systems. Some open source systems are also available and used.  A data architect needs to work closely with the DBMS subject matter experts (SME).  The most important SME to me are those who have actual experience using and supporting a DBMS and the type of data store being used on that DBMS.

So we all might gain understanding of what is available and the best DBMS uses, what is your experience? Consider the following points and share what DBMS was used and how it met, or failed to meet, expectations.
  • Purpose of database (EDW, DW, Data mart, ODS, reporting, OLAP, OLTP, etc.):
  • Database structure (Normalized, Dimensional, Snow Flake, etc.):
  • Database size in TB:
  • Number of tables in database:
  • Number of columns in database:
  • Total number of users:
  • Total number of power users:
  • Maximum number of connections:
  • Average number of connections:
  • DBMS or RDBMS used:
  • Expectations met or not for:
    • Database growth in TB?
    • Query performance?
    • Performance tuning?
    • Connectivity?
    • Database maintenance (DBA)?
    • Other comments:

My experiences:

  • Purpose of database: Enterprise Data Warehouse (EDW), Single source of truth; was source for other data marts, reporting data stores, OLAP.
  • Database structure: Normalized, 3NF
  • Database size in TB: Greater than 100 TB
  • Number of tables in database: More than 1000 tables; more than 3000 views.
  • Number of columns in database: More than 50,000
  • Total number of users: More than 1000
  • Total number of power users: More than 100
  • Maximum number of connections: More than 500
  • Average number of connections: More than 100
  • DBMS or RDBMS used: Teradata
  • Expectations met or not for:
    • Database growth in TB? Yes. Additional disk space seemed fairly easy to add. Adding more servers (AMPS) increased disk space, provided more CPU processing, and more user connectivity.
    • Query performance? Very good.
    • Performance tuning? Performance dependent on Primary Indexes (distribution), partitioning, secondary indexes, SQL tuning, statistics collected.
    • Connectivity? Batch programs, user direct, and user web-based.
    • Database maintenance (DBA)? Faster to add or drop columns to/from tables by creating new table and copying data from old table to new table than using ALTER TABLE commands. Obviously, not all steps to do this are itemized here.
    • Other comments: By far, the best RDBMS I have experienced for VLDB. Most everything was done in massively parallel processing. Data marts, reporting data stores, etc. were derived from physical database via views to create logical data stores. Response times were excellent; in those few cases where multi-table joins could not be improved, a physical table was created and loaded via batch processing.

Thursday, September 2, 2010

Teradata 13 Express Install

For those of you wanting to install Teradata on your own PC, Teradata supplies a free "Express" version.  Go to this site to request to download it: http://www.teradata.com/downloadcenter/

You will have to register with Teradata and then request the Teradata Express by clicking the "Order Now" link in the "Teradata Express — Teradata Software for Evaluation and Development" section. You'll have to look carefully for it.

I installed version 13 on my home PC which has Windows Vista Home Premium for the OS.  I had several problems which I'll detail here in case you run into these, too.

I used the "silent" install method at first.  Be sure to read ALL the text with each installation step; don't assume the "usual" steps of just accepting defaults and going on.  The silent install method is the default and it will merrily try to install everything, but won't give you any error messages if anything fails.  So at the end, you will be told that the installation has completed, but it may not have been successful.

Thinking I had a successful installation, I followed the instructions to verify that from the user guide.  From the "Teradata Service Control" I clicked the "Start Teradata" menu item.  When it completed, I was told "Teradata is Running" in the lower left corner.  When I clicked the "File" menu to execute "Session info", it was grayed out.

HOSTS file edited
Reading the installation documentation, I found that the HOSTS file is modified by Teradata.  I looked at my HOSTS file (which is a custom one that blocks Internet advertisements and other unwanted sites) and found that the Teradata install program had botched the addition of the following line:
127.0.0.1  localtdcop1

Make sure that line is on a line all by itself.  Also make sure that "localtdcop1" has not been added to the end of every line in the HOSTS file that begins with 127.0.0.1!

ODBC Driver
I also discovered that the Teradata ODBC driver had not been installed.  I tried to reinstall it, this time not using the "silent" install method.  This time I got an error message: "Error 2738.Could not access VBScript runtime for custom action."

The resolution for this error is to make sure VBScript is registered.  Do this by following these steps:

  1. Run Command Prompt as administrator:


    1. Start Menu -> All Programs -> Accessories
    2. Right click on Command Prompt and select Run as administrator


  2. Run the following command:


    • If you are using 32-bit version of Windows:


      • Type cd %windir%\system32 into the Command Prompt and hit Enter


    • If you are using 64-bit version of Windows:


      • Type cd %windir%\syswow64 into the Command Prompt and hit Enter


  3. Type regsvr32 vbscript.dll into the Command Prompt and hit Enter.
Now you will be able to install the Teradata ODBC driver.

Firewall
But I'm still having problems.  Even though I can now start Teradata and the "Session Info" menu option is available, clicking it starts a dialog box that tells me to  wait while sessions are retrieved.  After 12 minutes, I cancelled it.

I opened "Teradata Administrator" and tried to connect as user tdadmin (created by default).  I got the error message "ODBC HY000: WSA E HostUnreach: The Teradata server can't currently be reached over this network."

The fix for this was to turn off all firewalls.  I had both the Windows Firewall and Avast's firewall running.  Since my PC is behind a cable router that has its own firewall, turning both of these off poses no threat.

SQL Assistant
I found that the Teradata SQL Assistant did not get installed.  Using the "not silent" install mode, I discovered that this requires the .NET Framework version 2, SP 2.  I tried to install this again from the "Other Software" install option, choosing the "not silent" install mode, and discovered that this would not install on Vista.  I was unable to find a version of .NET Framework version 2 that would install on Vista.

There is a "work around" to this.  Teradata has a Java version of SQL Assistant that will work on Vista.  You need to go to the Teradata download site to get it.  In my case, for Vista on 32-bit platform, I went to http://downloads.teradata.com/download/tools/sql-assistant-java-edition-13-01-for-windows-32-bit

Find all the versions of the "SQL Assistant Java Edition 13.01" at http://downloads.teradata.com/download/tools

Monday, July 12, 2010

Calculate beginning and ending day of month in Teradata

Teradata has many date and time functions, but sometimes they are not too intuitive. If you need to calculate the first and last date for the current month as part of a batch program, try this SQL:

SELECT 
  MIN(calendar_date) BOM,
  MAX(calendar_date) EOM 
FROM
  sys_calendar.calendar
WHERE (month_of_year , year_of_calendar) IN 
  (SELECT month_of_year, year_of_calendar
   FROM sys_calendar.calendar
   WHERE calendar_date = DATE)

Friday, July 9, 2010

Teradata optimizer anomaly to beware of

I’ve been working with Teradata for more than four years and I just “discovered” this anomaly which was a problem in my SQL scripts that Teradata told me would return 2 quadrillion rows and take 6500 hours to complete.

If you include a column in the SELECT statement that does not come from a database.table/view in the FROM statement, Teradata merrily does a Cartesian join on the database.table/view.

Try this:

SELECT COUNT( <database>.<table or view>.<column>)

Teradata doesn’t seem to care that there is a missing FROM clause and reads every row from <database>.<table or view>.

If the <database>.<table or view>.<column> is part of a larger SQL that has joins to one or more tables and this particular table is not in the FROM clause, Teradata makes a Cartesian join to every other table in the SQL for every row in the "missing" <database>.<table or view>.

My problem was the result of a copy and paste operation from a query that worked, but I didn't catch the change needed for the new query.