Querying Teradata and SQL Server - Tutorial 2 Filtering and Sorting
- Tutorial –1
- About Filtering and Sorting Data
- Prerequisites
- Filter data by predicates
- Task
- Query – Teradata
- Query – SQL Server
- Result
- Task
- Query – Teradata
- Query – SQL Server
- Result
- Comments
- Task
- SQL – Teradata
- SQL – SQL Server
- Result
- Comments
- Task
- SQL – Teradata
- SQL – SQL Server
- Result
- Comments
- Task
- SQL – Teradata
- SQL – SQL Server
- Result
- Comments
- Sorting data
- Task
- SQL – Teradata
- SQL - SQL Server
- Result
- The TOP Statement
- Task
- SQL – Teradata
- SQL – SQL Server
- Result
- Comments
- Task
- SQL – SQL Server
- Result
- Comments
- Task
- SQL – Teradata
- SQL – SQL Server
- Result
- Comments
- Summary
Tutorial –1
For Teradata developers, if you have no SQL Server installed, please go to the following link to download the SQL Server 2014 Expression Edition.
http://www.microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
It is easy to get started and free to use. Follow the ‘Install instruction’ section in the download page to install.
Download page: http://www.microsoft.com/en-us/download/details.aspx?id=42299
About Filtering and Sorting Data
Filtering and sorting data is very common in querying database. In this tutorial, I will cover the following items:
- Filtering data by predicates
- Sorting data
- The TOP statement
Prerequisites
Querying Teradata and SQL Server - Tutorial 0: Preparation
Filter data by predicates
Task
Retrieve all the employees whose salary is below 120,000 from Employee table; return these columns: (Employee ID, Employee Name, Salary).
Query – Teradata
DATABASE TD_MS_SAMPLE_DB; SELECT EmployeeID, EmployeeName, Salary FROM employee WHERE Salary < 120000;
Query – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT EmployeeID, EmployeeName, Salary FROM employee WHERE Salary < 120000;
Result
Task
Retrieve the following employees from Employee table; return these columns: (Employee ID, Employee Name, Salary).
- Derek
- Nancy
- David
Query – Teradata
DATABASE TD_MS_SAMPLE_DB; SELECT EmployeeID, EmployeeName, Salary FROM employee WHERE EmployeeName =ANY ('Derek','Nancy','David');
Query – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT EmployeeID, EmployeeName, Salary FROM employee WHERE EmployeeName IN ('Derek','Nancy','David'); -- OR SELECT EmployeeID, EmployeeName, Salary FROM employee WHERE EmployeeName = ANY (select 'Derek' union select 'Nancy' union select 'David');
Result
Comments
Both Teradata and SQL Server supports IN, =ANY, NOT = ALL, = SOME operators. Sometimes, using these operators will be easier.
To learn more, visit:
https://technet.microsoft.com/en-us/library/ms187074(v=sql.105).aspx
Task
Retrieve all the employees whose age are above 50 from Employee table ; return these columns: (Employee ID, Employee Name, Age ).
SQL – Teradata
SELECT EmployeeID, EmployeeName, (EXTRACT (YEAR FROM CURRENT_TIMESTAMP) - EXTRACT(YEAR FROM Birthday)) AS Age FROM employee WHERE Age > 50;
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT EmployeeID, EmployeeName, DATEDIFF(YEAR,Birthday, CURRENT_TIMESTAMP) AS Age FROM employee WHERE DATEDIFF(YEAR,Birthday, CURRENT_TIMESTAMP) > 50;
Result
Comments
In Teradata, once you give the column alias, you can use the alias in SELECT, WHERE, ORDER BY, GROUP BY… In SQL Server, you can only use it in ORDER BY statement due to different execution sequence.
In SQL Server, we write our query in the sequence of SELECT … FROM … WHERE … GROUP BY… HAVING… ORDER BY while the engine will execute in different order: FROM … WHERE … GROUP BY… HAVING… SELECT… ORDER BY.
Task
Retrieve all the employees who were born between ‘1970-01-01’ and ‘1980-01-01’ from Employee table ; return these columns: (Employee ID, Employee Name, Birthday).
SQL – Teradata
SELECT EmployeeID, EmployeeName, Birthday FROM employee WHERE Birthday BETWEEN '1970-01-01' AND '1980-01-01';
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT EmployeeID, EmployeeName, Birthday FROM employee WHERE Birthday BETWEEN '1970-01-01' AND '1980-01-01';
Result
Comments
When querying character data, it is important to understand whether the column you are filtering on is case sensitive. You can specify case sensitive when querying Teradata.
For example, the following query will not return any records.
SELECT EmployeeID, EmployeeName, Birthday FROM employee WHERE 'Name'='name' (CASESPECIFIC);
In SQL Server, you can specify the collations.
https://technet.microsoft.com/en-us/library/ms180175(v=sql.100).aspx
Task
Retrieve all the employees who have no managers from Employee table ; return these columns: (Employee ID, Employee Name).
SQL – Teradata
SELECT EmployeeID, EmployeeName FROM employee WHERE EmployeeManagerEmployeeID IS NULL ;
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT EmployeeID, EmployeeName FROM employee WHERE EmployeeManagerEmployeeID IS NULL;
Result
Comments
Always be careful when you deal with NULLs.
Sorting data
Task
Retrieve all the employees from Employee table order by Gendar ASC, EmployeeName DESC; return these columns: (Employee ID, Employee Name).
SQL – Teradata
SELECT EmployeeID ,EmployeeName ,EmployeeManagerEmployeeID ,DepartmentID ,Gendar ,Birthday ,JobGrade ,Salary FROM Employee ORDER BY Gendar, EmployeeName DESC;
SQL - SQL Server
USE TD_MS_SAMPLE_DB GO SELECT [EmployeeID] ,[EmployeeName] ,[EmployeeManagerEmployeeID] ,[DepartmentID] ,[Gendar] ,[Birthday] ,[JobGrade] ,[Salary] FROM [dbo].[Employee] ORDER BY [Gendar], [EmployeeName] DESC
Result
The TOP Statement
Both SQL Server and Teradata support the SELECT TOP statement.
Task
Retrieve the top three employees who have the highest salary from Employee table ; return these columns: (Employee ID, Employee Name, Salary).
SQL – Teradata
SELECT TOP 3 EmployeeID ,EmployeeName ,Salary FROM Employee ORDER BY Salary DESC;
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT TOP 3 [EmployeeID] ,[EmployeeName] ,[Salary] FROM [dbo].[Employee] ORDER BY [Salary] DESC;
Result
Comments
In Teradata, the TOP function is slightly different.
TOP {decimal | integer} [PERCENT] [WITH TIES]
- Some samples:
- TOP 10 – Return the top ten rows
- TOP 10 PERCENT – Return the top 10% of rows
- TOP 10 WITH TIES – If more that one row has the same criteria value, return all
- TOP 10 PERCENT WITH TIES – If more than one row has the same criteria value, return all
Task
Retrieve the 2nd, 3rd and 4th employees who have the highest salary from Employee table ; return these columns: (Employee ID, Employee Name, Salary).
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT [EmployeeID] ,[EmployeeName] ,[Salary] FROM [dbo].[Employee] ORDER BY [Salary] DESC OFFSET 1 ROW FETCH NEXT 3 ROWS ONLY;
Result
Comments
After SQL Server 2012, the OFFSET FETECH statement can help you retrieve only a window or page of data from the result set (must be used with ORDER BY clause ).
For the detailed syntax, please refer to https://technet.microsoft.com/en-us/library/gg699618(v=sql.110).aspx.
There is not equivalent OFFSET FETECH statement in Teradata.
Task
Retrieve 3 sample employees from Employee table ; return these columns: (Employee ID, Employee Name, Salary).
SQL – Teradata
SELECT EmployeeID ,EmployeeName ,Salary FROM Employee SAMPLE 3;
SQL – SQL Server
USE TD_MS_SAMPLE_DB GO SELECT [EmployeeID] ,[EmployeeName] ,[Salary] FROM [dbo].[Employee] TABLESAMPLE(3 ROWS);
Result
Comments
Both Teradata and SQL Server supports retrieve sample data from result set (by actual number or percentage).
-- SQL Server USE TD_MS_SAMPLE_DB GO SELECT [EmployeeID] ,[EmployeeName] ,[Salary] FROM [dbo].[Employee] TABLESAMPLE(20 percent);-- Teradata SELECT EmployeeID ,EmployeeName ,Salary ,SAMPLEID FROM Employee SAMPLE 0.2;
The physical implementation of TOP function data in Teradata is not randomized while for SAMPLE you can control whether to return the sample data by AMP proportionally.
* About AMP:
The AMP is a vproc in the Teradata Database's shared-nothing architecture that is responsible for managing a portion of the database. Each AMP will manage some portion of each table on the system. AMPs do the physical work associated with generating an answer set which includes sorting, aggregating, formatting, and converting. The AMPs retrieve and perform all database management functions on the required rows from a table. An AMP accesses data from its single associated vdisk. AMPs is also able to redistribute a copy of the data to other AMPs.
Summary
I didn’t cover all the details for filtering and sorting data since I am only focusing on the most commonly used ones. You can learn all the others through the official tutorials.
In next tutorial, I will talk about how to combine sets in SQL Server and Teradata.
Due to limited knowledge, there might be errors/issues in my tutorials; please let me know if any information is inaccurate.