Tuesday, November 30, 2010

LPAD and RPAD for Teradata

Teradata does not come with an LPAD() or RPAD() function. If you can create a UDF for them, that would be best, but if you can't, here are some workarounds.

RPAD a string with spaces

-- replace 10 in char(10) below with desired column length
-- single ticks put around result to show spaces were actually added to right

select
  'abc' as "SampleString",
  '''' || cast(SampleString as char(10)) || '''' "rpad_spaces";

RPAD a string with any character

-- make string of characters (in this example '0000') to match desired length
-- leading space in column is preserved
select 
  ' 1' AS "SampleString",
  SampleString || Substring('0000' From 1 For Chars(SampleString)) AS "rpad_zero";

-- same as above, but using ANSI SUBSTR
select 
  ' 1' AS "SampleString",
  SampleString || SUBSTR('0000',Chars(SampleString)) AS "rpad_zero";

LPAD a string with any character

-- replace 0 with desired character
-- make string of characters (in this example '0000') to match desired length
-- leading space in column is preserved
select 
  ' 1' AS "SampleString",
  Substring('0000' From 1 For Chars(SampleString)) || SampleString "lpad_zero";

-- same as above, but using ANSI SUBSTR
SELECT 
  '19' as "SampleText",
  SUBSTR('00000'||SampleText,CHARACTERS(SampleText)+1) "LPAD(SampleText,5,'0')";

-- Showing how trailing spaces from a CHAR column can be dropped 
select
  '0000123  ' as "SampleText",
  CAST(CAST(SampleText AS integer FORMAT '-9(10)') AS CHAR(10)) "LPAD(SampleText,10,'0')";


Monday, November 29, 2010

STAR schema and transactions

I recently replied to a question in ITtoolbox's Data Warehouse forum regarding star schema data marts.
The original post had several questions that others answered, but one question had not been addressed, so that was the basis for my reply.

"Is it sound industry practice to use a star schema to hold individual records rather than totals?"

In order to create a fact table that aggregates transaction data, there must be a dimension table that contains the transaction detail. The granularity of that dimension table needs to be at the level needed to perform the drill down or slice and dice necessary. For some industries, such as cell phone carriers, the granularity might need to be down to the millisecond level. For others, daily granularity might be OK.
These transactional dimension tables also need to be updated, so you must determine what type of slowly changing dimension (SCD) table you need. (The "slowly" part of the name was a bad choice of words when transaction changes can occur by the second.) There are three types: SCD1, SCD2, and SCD3. Most common is SCD2, next is SCD1, and rarely used is SCD3.
A time dimension table is critical for any data warehouse / data mart. The granularity of the time dimension needs to be at a level at least as fine as the dimensional transaction table is at.
For reporting purposes, I usually suggest using a fact table or view to be the source for the reporting engine rather than have the reporting engine do all the aggregating. This way, the algorithms needed to do the aggregations will be consistent. If these are left to the reporting engine(s) and true ad-hoc queries, the results might be different and there will be lots of complaints and troubleshooting to determine why the results are not consistent.
Whether a table or view is used depends on how frequently the report needs to updated and the length of time needed to return results.

Wednesday, November 17, 2010

How to work with bad date formats in Teradata

A question was posted on ITtoolbox's Teradata forum concerning data loading from a file that contained character strings for dates that used single digit day and month formats like 4/1/2010. Teradata does not support dates in this format: M/Y/YYYY; it only supports two-digit month and day formats like MM/DD/YYYY. Therefore my reply was a sample SQL statement that would work for the date conversion, but the actual load from the data file would need to do this in the "wrapper' code.

The SQL in Teradata format is:

SELECT
    '4/1/2010' AS "BAD_DATE",
    CASE
        WHEN INDEX(BAD_DATE,'/') = 2
          THEN '0' || SUBSTRING(BAD_DATE FROM 1)
          ELSE BAD_DATE
    END AS "MONTH_OK",
    CASE
        WHEN INDEX(SUBSTRING(MONTH_OK FROM 4),'/') = 2
          THEN SUBSTRING(MONTH_OK FROM 1 FOR 3) || '0' || 
            SUBSTRING(MONTH_OK FROM 4)
          ELSE BAD_DATE
    END AS "GOOD_DATE"
-- don't need a FROM clause to do this
;