Saturday, 16 February 2013

The SQL Server DBA’s Guide to Teradata



Being a long-time devotee of all things Microsoft SQL Server, I had quite a bit of learning to do to get up to speed on our new Teradata appliance. This guide attempts to distill some of what I have learned. I will primarily focus on what I have found most lacking: examples of how to convert SQL Server T-SQL to Teradata SQL. This guide will use the vendor-supplied sample databases: SQL Server’s AdventureWorksDW2012, and the Teradata’s Express 14.0 Retail database.
Also, while this is probably self explanatory, the “mufford” object referenced in the Teradata examples is simply my own personal sandbox. You’ll need to replace any “mufford” reference to whatever your development sandbox may be named.
Release History:
  • 2012-08-17 – Added UPDATE examples; reformatted
  • 2012-08-12 – Added date examples
  • 2012-08-08 – Initial Release
SELECT Statements
Simple SELECT
SQL Server:
Best practice is to qualify the table schema; qualifying the database is optional unless you plan to perform cross-database queries.
USE AdventureWorksDW2012;
GO
 
SELECT TOP 10 ProductKey
  , EnglishProductName 
FROM dbo.DimProduct;
-- OR
SELECT TOP 10 ProductKey
  , EnglishProductName 
FROM AdventureWorksDW2012.dbo.DimProduct;
Teradata:
Teradata does not support the concept of schemas, so you would select directly from a database and table.
SELECT TOP 10 P_PartKey
  , P_Name 
FROM retail.Product;
UPDATE Statements
Update From a Subquery
SQL Server:
To help demonstrate a more complex UPDATE, I am going to create a temporary table, insert into the temporary table, and then update that same table. Please don’t get caught up in the data so much as the overall syntax.
CREATE TABLE #UpdateTest_firstOrder
(
      ProductKey            INT
    , FirstOrderNumber      NVARCHAR(20)    NULL
);
 
INSERT INTO #UpdateTest_firstOrder(ProductKey)
SELECT DISTINCT ProductKey
FROM dbo.DimProduct;
 
UPDATE ufo
SET FirstOrderNumber = COALESCE(x.FirstOrderNumber, '-1')
FROM #UpdateTest_firstOrder ufo
JOIN 
    (
    SELECT ProductKey
        , MIN(SalesOrderNumber) AS FirstOrderNumber
    FROM dbo.FactInternetSales
    GROUP BY ProductKey
) AS x
ON ufo.ProductKey = x.ProductKey;
Teradata:
We’re going to do the same thing in Teradata: create a temporary (volatile) table, populate it, then perform an update. However, there are 2 different ways to write the UPDATE statement. Both are shown here.
CREATE VOLATILE TABLE UpdateTest_firstOrder
(
      L_PartKey          INTEGER
    , FirstOrderKey     INTEGER    NULL
)
PRIMARY INDEX
(
    L_PartKey
)
ON COMMIT PRESERVE ROWS;
 
INSERT INTO UpdateTest_firstOrder(L_PartKey)
SELECT DISTINCT P_PartKey
FROM retail.Product;
 
-- Option 1
UPDATE UpdateTest_firstOrder
FROM (
    SELECT L_PartKey
            , MIN(L_OrderKey) AS FirstOrderKey
        FROM retail.Item
        GROUP BY L_PartKey
    ) AS x
SET FirstOrderKey = COALESCE(x.FirstOrderKey, -1)
WHERE UpdateTest_firstOrder.L_PartKey = x.L_PartKey;
 
-- Option 2
UPDATE UpdateTest_firstOrder
SET FirstOrderKey = FirstOrderKey
WHERE EXISTS(
    SELECT L_PartKey
        , MIN(L_OrderKey) AS firstOrderID
    FROM retail.Item
    WHERE UpdateTest_firstOrder.L_PartKey = retail.Item.L_PartKey
    GROUP BY L_PartKey
);
DELETE Statements
Delete Top X
SQL Server:
This is pretty straight-forward in SQL Server.
DELETE TOP (1) 
FROM dbo.DimEmployee 
WHERE BirthDate < '2012-01-01';
Teradata:
Because Teradata is intended to be a data warehouse, not an OLTP system, this is a little more involved. We need to first create a temporary table that contains the rows we wish to remove, then perform the DELETE using the IN clause.
CREATE TABLE mufford.ToBeDeleted AS
(  
    SELECT TOP 1 EmpNo 
    FROM retail.Employee 
    WHERE DOB < DATE'2012-01-01'
)
WITH DATA;
 
DELETE FROM retail.Employee
WHERE EmpNo IN (SELECT EmpNo FROM mufford.ToBeDeleted);
Delete where a relationship exists
SQL Server:
DELETE fis
FROM dbo.FactInternetSales AS fis
JOIN dbo.DimSalesTerritory AS dst
    ON fis.SalesTerritoryKey = dst.SalesTerritoryKey
WHERE dst.SalesTerritoryCountry <> 'United States';
Teradata:
Again, you can do this by using a subquery.
DELETE FROM retail.Item
WHERE L_PartKey IN 
(
    SELECT P_PartKey
    FROM retail.Product
    WHERE P_Mfgr = 'Manufacturer#1'
);
Delete where a relationship does not exist:
SQL Server:
DELETE fis
FROM dbo.DimProduct AS dp
LEFT JOIN dbo.FactInternetSales AS fis
    ON dp.ProductKey = fis.ProductKey
WHERE fis.SalesOrderNumber IS NULL;
Teradata:
Once more, we’ll use a subquery to perform our delete.
DELETE FROM retail.Item
WHERE NOT EXISTS(
    SELECT P_PartKey
    FROM retail.Product
    WHERE retail.Item.L_PartKey = retail.Product.P_PartKey
);
Temporary / Volatile Tables
Creating a simple temporary table
SQL Server:
The simple way:
SELECT LastName, FirstName
INTO #MyTempTable
FROM dbo.DimEmployee
WHERE DepartmentName = 'Engineering';
The verbose, “best practice” way:
CREATE TABLE #MyTempTable
(
      LastName  NVARCHAR(50)
    , FirstName NVARCHAR(50)
);
 
INSERT INTO #MyTempTable
(
      LastName
    , FirstName
)
SELECT LastName
    , FirstName
FROM dbo.DimEmployee
WHERE DepartmentName = 'Engineering';
Teradata:
Teradata’s equivalent of temporary tables is volatile tables.
CREATE VOLATILE TABLE MyTempTable
AS (
    SELECT Name
    FROM retail.Employee
    WHERE DeptNo = 1420
    ) 
WITH DATA ON COMMIT PRESERVE ROWS;
Using Dates
Find Within Last X Days
SQL Server:
SELECT SalesOrderNumber
FROM dbo.FactInternetSales
WHERE OrderDate >= DATEADD(day, -30, GETDATE());
Teradata:
SELECT L_OrderKey
FROM retail.Item
WHERE l_shipdate >= CURRENT_DATE - INTERVAL '30' DAY;
Calculate the Number of Days Between 2 Dates
SQL Server:
SELECT TOP 10 OrderDate
    , ShipDate
    , DATEDIFF(day, OrderDate, ShipDate) AS 'daysLapsed'
FROM dbo.FactInternetSales;
-- OR
SELECT TOP 10 OrderDate
    , ShipDate
    , CAST(ShipDate - OrderDate AS INT) AS 'daysLapsed'
FROM dbo.FactInternetSales;
Teradata:
SELECT L_ShipDate
    , L_ReceiptDate
    , L_ReceiptDate - L_ShipDate AS "daysLapsed"
FROM retail.Item;
Using dates in variables
SQL Server:
CREATE PROCEDURE spReturnOrdersBetweenDaysExample
(
        @StartDate  DATETIME
      , @EndDate    DATETIME
)
AS
    SELECT *
    FROM dbo.FactInternetSales
    WHERE OrderDate >= @StartDate
        AND OrderDate < @EndDate;
 
EXECUTE spReturnOrdersBetweenDaysExample '2005-07-01', '2005-08-01';
Teradata:
We’ll use a macro instead of a stored procedure for returning this same type of data in Teradata.
CREATE MACRO mufford.mReturnOrdersBetweenDays
(
        startDate  DATE 
      , endDate   DATE 
)
AS
(
    SELECT *
    FROM retail.Item
    WHERE L_ShipDate >= :startDate
        AND L_ShipDate < :endDate;
);
 
EXECUTE mufford.mReturnOrdersBetweenDays('1993-09-01','1993-10-01');
Note: if you get a “An owner referenced by user does not have SELECT WITH GRANT OPTION” error when you try to execute this, make sure you run this:
GRANT SELECT ON retail.Item TO mufford WITH GRANT OPTION;
Performing date calculations using variables
SQL Server:
CREATE PROCEDURE spReturnOrdersInXDaysExample
(
      @StartDate    DATETIME
    , @numberOfDays INT
)
AS
    SELECT *
    FROM dbo.FactInternetSales
    WHERE OrderDate >= @StartDate
        AND OrderDate < DATEADD(day, @numberOfDays, @StartDate);
 
EXECUTE spReturnOrdersInXDaysExample '2005-07-01', 30;
Teradata:
We’ll use a macro instead of a stored procedure for returning this same type of data in Teradata.
Teradata:
CREATE MACRO mufford.mReturnOrdersInXDaysExample
(
      startDate DATE DEFAULT CURRENT_DATE
    , numberOfDays INTEGER DEFAULT 30
)
AS
(
    SELECT *
    FROM retail.Item
    WHERE L_ShipDate >= :startDate
        AND L_ShipDate < :startDate + CAST(:numberOfDays AS INTERVAL DAY);
);
 
EXECUTE mufford.mReturnOrdersInXDaysExample('1993-09-01', 30);
Miscellaneous
NULL Handling
SQL Server:
SELECT COALESCE(StandardCost, 0) AS [StandardCost]
FROM dbo.DimProduct;
-- OR
SELECT ISNULL(StandardCost, 0) AS [StandardCost]
FROM dbo.DimProduct;
Teradata:
SELECT COALESCE(P_RetailPrice, 0) AS "RetailPrice"
FROM retail.Product;

No comments:

Post a Comment