Thursday, 15 August 2013

Stored Procedure Optimization Tips – Best Practices

  • Include the SET NOCOUNT ON statement: With every SELECT and DML statement, SQL Server returns a message that indicates the number of rows affected by that statement. This information is helpful mostly when debugging the code and is rarely needed after that. Setting SET NOCOUNT ON disables this extra message. For stored procedures that contain several statements or Transact-SQL loops, setting SET NOCOUNT to ON can provide a significant performance boost because network traffic is greatly reduced.
CREATE PROC dbo.ProcName
AS
SET NOCOUNT ON;
-- Procedure code here
SELECT column1 FROM dbo.TblTable1;
-- Reset SET NOCOUNT to OFF
SET NOCOUNT OFF;
GO
  • Use schema name with object name: An object name is qualified when it is used with its schema name. The schema name should be used with the stored procedure name and with all objects referenced inside the stored procedure. This helps SQL Server find the compiled plan directly instead of searching for the objects in other possible schemas before finally deciding to use a cached plan, if available. This process of searching for and deciding on a schema for an object leads to a COMPILE lock on the stored procedure and decreases the stored procedure’s performance. Therefore, always refer to objects by their qualified names in the stored procedure, like
SELECT * FROM dbo.MyTable -- Preferred method
-- Instead of
SELECT * FROM MyTable -- Avoid this method
-- And finally call the stored procedure with qualified name like:
EXEC dbo.MyProc -- Preferred method
-- Instead of
EXEC MyProc -- Avoid this method
  • Do not use the prefix “sp_” in the stored procedure name: If a stored procedure name begins with “sp_”, SQL Server first searches the master database and then the current session database. Searching the master database causes extra overhead and can even produce a wrong result if a stored procedure with the same name is found in the master database.
  • Use IF EXISTS (SELECT 1) instead of (SELECT *): To check for the existence of a record in another table, we use the IF EXISTS clause. The IF EXISTS clause returns True if the inner statement returns anything at all, whether a single value “1”, all columns of a record, or a complete recordset. The output of the inner statement is not used. Hence, we should use “1” in the SELECT clause of the inner statement to make it explicit that no column data is needed, as shown below:
IF EXISTS (SELECT 1 FROM sysobjects WHERE name = 'MyTable' AND type = 'U')
  • Use the sp_executesql stored procedure instead of the EXECUTE statement: The sp_executesql stored procedure supports parameters, so using it instead of the EXECUTE statement improves the reusability of your code. The execution plan of a dynamic statement can be reused only if every character, including case, spaces, comments, and parameters, is the same in both statements. For example, if we execute the batch below:
DECLARE @Query VARCHAR(100)
DECLARE @Age INT
SET @Age = 25
SET @Query = 'SELECT * FROM dbo.tblPerson WHERE Age = ' + CONVERT(VARCHAR(3), @Age)
EXEC (@Query)
If we execute the above batch again with a different @Age value, the execution plan created for the SELECT statement with @Age = 25 will not be reused. However, if we write the batch as given below,
DECLARE @Query NVARCHAR(100)
SET @Query = N'SELECT * FROM dbo.tblPerson WHERE Age = @Age'
EXECUTE sp_executesql @Query, N'@Age int', @Age = 25
the compiled plan of this SELECT statement will be reused for different values of the @Age parameter. Reusing the existing compiled plan results in improved performance.
  • Try to avoid using SQL Server cursors whenever possible: A cursor uses a lot of resources for the overhead of maintaining the current record position in a recordset, and this decreases performance. If we need to process records one by one in a loop, we should use a WHILE loop. Wherever possible, we should replace the cursor-based approach with a set-based approach, because the SQL Server engine is designed and optimized to perform set-based operations very fast. Note that a cursor is itself a kind of WHILE loop.
  • Keep the transaction as short as possible: The length of a transaction affects blocking and deadlocking. An exclusive lock is not released until the end of the transaction. At higher isolation levels, shared locks are also held until the end of the transaction. Therefore, a lengthy transaction holds locks for longer, and locks held for longer turn into blocking. In some cases, blocking also turns into deadlocks. So, for faster execution and less blocking, the transaction should be kept as short as possible.
  • Use TRY...CATCH for error handling: Prior to SQL Server 2005, error-handling code made up a large portion of the actual code, because an error check had to be written after every T-SQL statement. More code consumes more resources and time. SQL Server 2005 introduced a simpler way to do the same thing. The syntax is as follows:
BEGIN TRY
    -- Your T-SQL code goes here
END TRY
BEGIN CATCH
    -- Your error handling code goes here
END CATCH
Don't use "SELECT *" in a SQL query
  • Unnecessary columns may get fetched, which adds to the data retrieval time.
  • The database engine cannot take advantage of a covering index (discussed in the previous article), and hence the query performs slowly.
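The covering-index point can be sketched as follows. This is a minimal illustration, assuming a hypothetical temporary table #Orders; the table, column, and index names are not from the original article.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE #Orders
(
    OrderID     INT           NOT NULL PRIMARY KEY,
    CustomerID  INT           NOT NULL,
    OrderDate   DATETIME      NOT NULL,
    TotalAmount DECIMAL(10,2) NOT NULL,
    Notes       VARCHAR(MAX)  NULL
);

-- Nonclustered index that covers queries filtering on CustomerID
-- and reading only OrderDate and TotalAmount.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON #Orders (CustomerID)
    INCLUDE (OrderDate, TotalAmount);

-- Can be answered from the index alone.
SELECT OrderDate, TotalAmount
FROM #Orders
WHERE CustomerID = 42;

-- Also needs Notes, so the index no longer covers the query.
SELECT *
FROM #Orders
WHERE CustomerID = 42;
```

Comparing the execution plans of the two SELECT statements is one way to see whether the extra lookup for the uncovered columns appears.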
Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions
  • Selecting unnecessary columns in a SELECT query adds overhead to the actual query, especially if the unnecessary columns are of LOB types.
  • Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.
Do not use the COUNT() aggregate in a subquery to do an existence check
  • Do not use:
    SELECT column_list FROM table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE ...)
    Instead, use:
    SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
  • When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
  • When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.
Try to avoid joining columns of different data types
  • When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower in the data type precedence hierarchy is the one that is converted.
  • If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:
    SELECT column_list FROM small_table, large_table WHERE
    small_table.float_column = large_table.int_column
    In this case, SQL Server converts the integer column to float, because int is lower in the hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on small_table.float_column.
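One possible workaround, when the column types cannot be aligned in the table design, is to convert the small side explicitly so that the indexed integer column is compared without conversion. The sketch below uses hypothetical temporary tables that mirror the example above, and it assumes float_column holds values within the INT range.

```sql
-- Hypothetical tables mirroring the example above.
CREATE TABLE #small_table
(
    id           INT   NOT NULL PRIMARY KEY,
    float_column FLOAT NOT NULL
);
CREATE TABLE #large_table
(
    id         INT NOT NULL PRIMARY KEY,
    int_column INT NOT NULL
);
CREATE INDEX IX_large_table_int_column ON #large_table (int_column);

-- The conversion is applied to the small table's column, so
-- int_column is compared as an INT and its index stays usable.
-- Assumption: float_column values fit in the INT range.
SELECT s.id AS small_id, l.id AS large_id
FROM #small_table AS s
JOIN #large_table AS l
  ON l.int_column = CAST(s.float_column AS INT)
WHERE s.float_column = FLOOR(s.float_column);  -- skip non-integer values, which CAST would truncate
```

Where the schema can be changed, declaring both join columns with the same data type avoids the conversion altogether.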
Try to avoid deadlocks
  • Always access tables in the same order in all your stored procedures and triggers.
  • Keep your transactions as short as possible. Touch as little data as possible during a transaction.
  • Never, ever wait for user input in the middle of a transaction.
Write TSQL using "Set based approach" rather than "Procedural approach"
  • The database engine is optimized for set-based SQL. Hence, the procedural approach (use of a cursor or UDF to process rows in a result set) should be avoided when large result sets (more than 1000 rows) have to be processed.
  • How can we get rid of "Procedural SQL"? Follow these simple tricks:
    • Use inline sub queries to replace User Defined Functions.
    • Use correlated sub queries to replace Cursor based code.
    • If procedural coding is really necessary, at least, use a table variable instead of a cursor to navigate and process the result set.
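As a rough illustration of the difference, the sketch below applies the same price change first with a cursor and then with a single set-based statement. The #Products table and its data are hypothetical.

```sql
-- Hypothetical table and data for illustration.
CREATE TABLE #Products
(
    ProductID INT           NOT NULL PRIMARY KEY,
    Category  VARCHAR(20)   NOT NULL,
    Price     DECIMAL(10,2) NOT NULL
);
INSERT INTO #Products (ProductID, Category, Price) VALUES (1, 'Books', 10.00);
INSERT INTO #Products (ProductID, Category, Price) VALUES (2, 'Books', 20.00);
INSERT INTO #Products (ProductID, Category, Price) VALUES (3, 'Toys', 5.00);

-- Procedural approach: one UPDATE per row, driven by a cursor.
DECLARE @ProductID INT;
DECLARE product_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ProductID FROM #Products WHERE Category = 'Books';
OPEN product_cursor;
FETCH NEXT FROM product_cursor INTO @ProductID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE #Products SET Price = Price * 1.10 WHERE ProductID = @ProductID;
    FETCH NEXT FROM product_cursor INTO @ProductID;
END
CLOSE product_cursor;
DEALLOCATE product_cursor;

-- Set-based approach: the same change expressed as one statement.
-- (Running both parts of this script applies the increase twice.)
UPDATE #Products SET Price = Price * 1.10 WHERE Category = 'Books';
```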
For more info on "set" and "procedural" SQL, see Understanding "Set based" and "Procedural" approaches in SQL.
Try not to use COUNT(*) to obtain the record count in a table
  • To get the total row count in a table, we usually use the following SELECT statement:
    SELECT COUNT(*) FROM dbo.Orders
    This query has to scan the whole table (or its smallest index) to get the row count.
  • The following query does not require a scan. Note that the count it returns is not guaranteed to be exact, so use it only when an approximate count is good enough.
    SELECT rows FROM sysindexes
    WHERE id = OBJECT_ID('dbo.Orders') AND indid < 2
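In SQL Server 2005 and later, sysindexes is kept only as a compatibility view. A comparable approximate count can be read from the sys.partitions catalog view, as in this sketch, which assumes the dbo.Orders table from the example above already exists.

```sql
-- Approximate row count from catalog metadata.
-- SUM is used because a partitioned table has one row per partition.
SELECT SUM(p.rows) AS ApproxRowCount
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.Orders')
  AND p.index_id IN (0, 1);   -- 0 = heap, 1 = clustered index
```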
Try to avoid dynamic SQL
Unless really required, try to avoid the use of dynamic SQL because:
  • Dynamic SQL is hard to debug and troubleshoot.
  • If the user provides the input to the dynamic SQL, then there is a possibility of SQL injection attacks.
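When dynamic SQL really is required, the injection risk can be reduced by passing values as parameters and quoting object names. The following is a minimal sketch, assuming the dbo.tblPerson table used earlier in this post; the variable names are illustrative, and in real code they would typically be procedure parameters.

```sql
-- Illustrative inputs.
DECLARE @TableName SYSNAME;
DECLARE @Age INT;
DECLARE @Sql NVARCHAR(500);
SET @TableName = N'tblPerson';
SET @Age = 25;

-- Object names cannot be passed as parameters, so QUOTENAME wraps the
-- name in brackets. The value is passed as a real parameter instead of
-- being concatenated into the statement text.
SET @Sql = N'SELECT * FROM dbo.' + QUOTENAME(@TableName) + N' WHERE Age = @Age';

EXECUTE sp_executesql @Sql, N'@Age INT', @Age = @Age;
```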
Try to avoid the use of temporary tables
  • Unless really required, try to avoid the use of temporary tables. Use table variables instead.
  • Table variables are often faster for small row sets because they need less locking and logging and cause fewer stored procedure recompilations than temporary tables. Note that both table variables and temporary tables are stored in the tempdb database, so a table variable is not guaranteed to stay in memory.
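A table variable is declared and used much like a regular table. The sketch below is illustrative only; the variable name, columns, and data are hypothetical.

```sql
-- Table variable scoped to the current batch or procedure.
DECLARE @RecentOrders TABLE
(
    OrderID   INT      NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL
);

INSERT INTO @RecentOrders (OrderID, OrderDate) VALUES (1, '20130801');
INSERT INTO @RecentOrders (OrderID, OrderDate) VALUES (2, '20130815');

SELECT OrderID, OrderDate
FROM @RecentOrders
WHERE OrderDate >= '20130810';
```

Table variables do not have column statistics, so for large row counts a temporary table may still be the better choice.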
Instead of LIKE search, use full text search for searching textual data
Full text searches generally outperform LIKE searches, especially on large amounts of text.
  • Full text searches will enable you to implement complex search criteria that can't be implemented using a LIKE search, such as searching on a single word or phrase (and optionally, ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
  • Full text search is often easier to implement than a LIKE search (especially in the case of complex search requirements).
  • For more info on full text search, see http://msdn.microsoft.com/en-us/library/ms142571(SQL.90).aspx
Try to use UNION to implement an "OR" operation
  • Try not to use "OR" in a query. Instead, use "UNION" to combine the result sets of two distinct queries. This can improve query performance, because each query can use its own index.
  • Prefer UNION ALL if a distinct result is not required. UNION ALL is faster than UNION because it does not have to sort the result set to remove duplicate rows.
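The rewrite can be sketched like this, using a hypothetical #Customers table with one index per searched column. Whether it is actually faster depends on the data and indexes, so the execution plans are worth comparing.

```sql
-- Hypothetical table with one index per searched column.
CREATE TABLE #Customers
(
    CustomerID INT         NOT NULL PRIMARY KEY,
    LastName   VARCHAR(50) NOT NULL,
    City       VARCHAR(50) NOT NULL
);
CREATE INDEX IX_Customers_LastName ON #Customers (LastName);
CREATE INDEX IX_Customers_City     ON #Customers (City);

-- OR form.
SELECT CustomerID, LastName, City
FROM #Customers
WHERE LastName = 'Smith' OR City = 'London';

-- UNION form: each branch can use its own index, and UNION removes
-- the rows that match both conditions.
SELECT CustomerID, LastName, City FROM #Customers WHERE LastName = 'Smith'
UNION
SELECT CustomerID, LastName, City FROM #Customers WHERE City = 'London';

-- UNION ALL form: the second branch excludes rows already returned
-- by the first, so no duplicate removal is needed.
SELECT CustomerID, LastName, City FROM #Customers WHERE LastName = 'Smith'
UNION ALL
SELECT CustomerID, LastName, City FROM #Customers
WHERE City = 'London' AND LastName <> 'Smith';
```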
Implement a lazy loading strategy for large objects
  • Store Large Object columns (like VARCHAR(MAX), Image, Text etc.) in a different table than the main table, and put a reference to the large object in the main table.
  • Retrieve all the main table data in a query, and if a large object is required to be loaded, retrieve the large object data from the large object table only when it is required.
Use VARCHAR(MAX), VARBINARY(MAX), and NVARCHAR(MAX)
  • In SQL Server 2000, a row cannot exceed 8,060 bytes in size. This limitation is due to the 8 KB internal page size of SQL Server. So to store more data in a single column, you need to use the TEXT, NTEXT, or IMAGE data types (BLOBs), which are stored in a collection of 8 KB data pages.
  • These are unlike the data pages that store other data in the same table. These pages are arranged in a B-tree structure. These data cannot be used as variables in a procedure or a function, and they cannot be used inside string functions such as REPLACE, CHARINDEX, or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT.
  • To solve this problem, use VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) in SQL Server 2005. These data types can hold the same amount of data BLOBs can hold (2 GB), and they are stored in the same type of data pages used for other data types.
  • When data in a MAX data type exceeds 8 KB, an over-flow page is used (in the ROW_OVERFLOW allocation unit), and a pointer to the page is left in the original data page in the IN_ROW allocation unit.
Implement the following good practices in User Defined Functions
  • Do not call functions repeatedly within your Stored Procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed; instead, call the LEN function once, and store the result in a variable for later use.
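The LEN case described above might look like this; the variable names and values are illustrative.

```sql
DECLARE @Name VARCHAR(100);
DECLARE @NameLength INT;
SET @Name = 'Stored Procedure';

-- Call LEN once and keep the result.
SET @NameLength = LEN(@Name);

-- Reuse the stored value instead of calling LEN(@Name) again.
IF @NameLength > 10
    PRINT 'Long name';
IF @NameLength < 50
    PRINT 'Fits in 50 characters';
```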
Implement the following good practices in Stored Procedures
  • Do not use "sp_XXX" as a naming convention. It causes additional searches and added I/O (because system stored procedure names start with "sp_"). Using "sp_XXX" as the naming convention also increases the possibility of conflicting with an existing system stored procedure.
  • Use "SET NOCOUNT ON" to eliminate the extra row-count messages sent over the network.
  • Use the WITH RECOMPILE clause in the EXECUTE statement (first time) when the index structure changes (so that the compiled version of the Stored Procedure can take advantage of the newly created indexes).
  • Use default parameter values for easy testing.
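A short sketch that combines several of these practices is shown below. The procedure name dbo.GetPersonsByAge is hypothetical, and the example assumes the dbo.tblPerson table with an Age column used earlier in this post.

```sql
-- Schema-qualified name without the "sp_" prefix.
CREATE PROCEDURE dbo.GetPersonsByAge
    @Age INT = 25            -- default value makes ad hoc testing easier
AS
BEGIN
    SET NOCOUNT ON;

    -- List only the columns you need; Age is the only column known here.
    SELECT Age
    FROM dbo.tblPerson
    WHERE Age = @Age;
END
GO

-- Uses the default parameter value.
EXEC dbo.GetPersonsByAge;

-- Compiles a fresh plan for this call, for example after an index change.
EXEC dbo.GetPersonsByAge @Age = 40 WITH RECOMPILE;
```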
Implement the following good practices in Triggers
  • Try to avoid the use of triggers. Firing a trigger and executing the triggering event is an expensive process.
  • Never use triggers that can be implemented using constraints.
  • Do not use the same trigger for different triggering events (Insert, Update, Delete).
  • Do not use transactional code inside a trigger. The trigger always runs within the transactional scope of the code that fires the trigger.
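As an example of a rule that needs no trigger, a simple validation such as "price must not be negative" can be declared as a CHECK constraint. The table below is hypothetical.

```sql
-- The CHECK constraint enforces the rule declaratively.
CREATE TABLE #PricedItems
(
    ItemID INT           NOT NULL PRIMARY KEY,
    Price  DECIMAL(10,2) NOT NULL CHECK (Price >= 0)
);

-- Accepted.
INSERT INTO #PricedItems (ItemID, Price) VALUES (1, 9.99);

-- Rejected by the constraint with an error; no trigger code is involved.
INSERT INTO #PricedItems (ItemID, Price) VALUES (2, -5.00);
```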
Implement the following good practices in Views
  • Use views to reuse complex TSQL blocks and to make them eligible for indexing as indexed views (will be discussed later).
  • Use views with the SCHEMABINDING option if you do not want to let users modify the table schema accidentally.
  • Do not use views that retrieve data from a single table only (that will be an unnecessary overhead). Use views for writing queries that access columns from multiple tables.
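The effect of SCHEMABINDING can be sketched with two hypothetical tables and a view that joins them. All object names here are illustrative.

```sql
-- Hypothetical tables for illustration.
CREATE TABLE dbo.DemoCustomers
(
    CustomerID INT         NOT NULL PRIMARY KEY,
    City       VARCHAR(50) NOT NULL
);
CREATE TABLE dbo.DemoOrders
(
    OrderID    INT      NOT NULL PRIMARY KEY,
    CustomerID INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);
GO

-- SCHEMABINDING requires two-part object names and an explicit column list.
CREATE VIEW dbo.vDemoOrderCities
WITH SCHEMABINDING
AS
SELECT o.OrderID, o.OrderDate, c.City
FROM dbo.DemoOrders AS o
JOIN dbo.DemoCustomers AS c
  ON c.CustomerID = o.CustomerID;
GO

-- Fails while the view exists, because City is referenced
-- by the schema-bound view.
ALTER TABLE dbo.DemoCustomers DROP COLUMN City;
```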
Implement the following good practices in Transactions
  • Prior to SQL Server 2005, after BEGIN TRANSACTION and each subsequent modification statement, the value of @@ERROR had to be checked. If its value was non-zero, then the last statement caused an error, and if an error occurred, the transaction had to be rolled back and an error had to be raised (for the application). In SQL Server 2005 and onwards, the Try...Catch block can be used to handle transactions in TSQL. So try to use Try...Catch based transactional code.
  • Try to avoid nested transactions. Use the @@TRANCOUNT variable to determine whether a transaction needs to be started (to avoid nested transactions).
  • Start a transaction as late as possible and commit/rollback the transaction as fast as possible to reduce the time period of resource locking.
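A TRY...CATCH based transaction might look like the sketch below. The #Accounts table and the amounts are hypothetical; the first UPDATE deliberately violates the CHECK constraint so that the CATCH block runs.

```sql
-- Hypothetical table; Balance may not go negative.
CREATE TABLE #Accounts
(
    AccountID INT           NOT NULL PRIMARY KEY,
    Balance   DECIMAL(10,2) NOT NULL CHECK (Balance >= 0)
);
INSERT INTO #Accounts (AccountID, Balance) VALUES (1, 100.00);
INSERT INTO #Accounts (AccountID, Balance) VALUES (2, 50.00);

BEGIN TRY
    -- Start the transaction as late as possible.
    BEGIN TRANSACTION;

    -- Violates the CHECK constraint, so control jumps to the CATCH block.
    UPDATE #Accounts SET Balance = Balance - 150.00 WHERE AccountID = 1;
    UPDATE #Accounts SET Balance = Balance + 150.00 WHERE AccountID = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back only if a transaction is still open.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    -- Re-raise the error for the calling application.
    DECLARE @ErrorMessage NVARCHAR(2048);
    SET @ErrorMessage = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @ErrorMessage);
END CATCH
```

After the script runs, both balances should be unchanged because the transaction was rolled back as a unit.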
