Thursday, 15 August 2013

Stored Procedure Optimization Tips – Best Practices

  • Include SET NOCOUNT ON statement: With every SELECT and DML statement, SQL Server returns a message that indicates the number of rows affected by that statement. This information is mostly helpful while debugging the code, but it is useless after that. By setting SET NOCOUNT ON, we can disable this extra information. For stored procedures that contain several statements or Transact-SQL loops, setting NOCOUNT to ON can provide a significant performance boost because network traffic is greatly reduced.
CREATE PROC dbo.ProcName
AS
SET NOCOUNT ON;
-- Procedure code here
SELECT column1 FROM dbo.TblTable1
-- Reset SET NOCOUNT to OFF
SET NOCOUNT OFF;
GO
  • Use schema name with object name: An object name is qualified when it is used with its schema name. The schema name should be used with the stored procedure name and with all objects referenced inside the stored procedure. This helps in directly finding the compiled plan, instead of searching for the objects in other possible schemas before finally deciding to use a cached plan, if available. This process of searching and deciding on a schema for an object takes a COMPILE lock on the stored procedure and decreases the stored procedure's performance. Therefore, always refer to objects by their qualified names in the stored procedure, like:
SELECT * FROM dbo.MyTable -- Preferred method
-- Instead of
SELECT * FROM MyTable -- Avoid this method

-- And finally call the stored procedure with a qualified name like:
EXEC dbo.MyProc -- Preferred method
-- Instead of
EXEC MyProc -- Avoid this method
  • Do not use the prefix “sp_” in the stored procedure name: If a stored procedure name begins with “sp_”, SQL Server first searches the master database and then the current session database. Searching the master database causes extra overhead, and can even give a wrong result if another stored procedure with the same name is found in the master database.
  • Use IF EXISTS (SELECT 1) instead of (SELECT *): To check the existence of a record in another table, we use the IF EXISTS clause. IF EXISTS returns True if any value is returned from the internal statement, whether that is a single value “1”, all columns of a record, or a complete recordset. The output of the internal statement is not used. Hence, to minimize the data processed and transferred over the network, we should use “1” in the SELECT clause of the internal statement, as shown below:
IF EXISTS (SELECT 1 FROM sysobjects WHERE name = 'MyTable' AND type = 'U')
  • Use the sp_executesql stored procedure instead of the EXECUTE statement. The sp_executesql stored procedure supports parameters, so using it instead of the EXECUTE statement improves the reusability of your code. The execution plan of a dynamic statement can be reused only if each and every character, including case, spaces, comments and parameters, is the same for two statements. For example, if we execute the batch below:
DECLARE @Query VARCHAR(100)
DECLARE @Age INT
SET @Age = 25
SET @Query = 'SELECT * FROM dbo.tblPerson WHERE Age = ' + CONVERT(VARCHAR(3), @Age)
EXEC (@Query)
If we execute the above batch again with a different @Age value, the execution plan for the SELECT statement created for @Age = 25 will not be reused. However, if we write the batch as given below,
DECLARE @Query NVARCHAR(100)
SET @Query = N'SELECT * FROM dbo.tblPerson WHERE Age = @Age'
EXECUTE sp_executesql @Query, N'@Age int', @Age = 25
the compiled plan of this SELECT statement will be reused for different values of the @Age parameter. The reuse of the existing compiled plan results in improved performance.
  • Try to avoid using SQL Server cursors whenever possible: A cursor uses a lot of resources for overhead processing to maintain the current record position in a recordset, and this decreases performance. If we need to process records one by one in a loop, we should use a WHILE loop. Wherever possible, we should replace the cursor-based approach with a set-based approach, because the SQL Server engine is designed and optimized to perform set-based operations very fast (see the sketch below). Again, please note that a cursor is also a kind of WHILE loop.
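For illustration, here is a minimal sketch, assuming a hypothetical dbo.tblOrder table with OrderID, OrderAmount and Tax columns; the cursor-based loop and the set-based statement below do the same work:

-- Cursor-based approach: processes one row at a time (slower).
DECLARE @OrderID INT
DECLARE curOrders CURSOR FOR SELECT OrderID FROM dbo.tblOrder
OPEN curOrders
FETCH NEXT FROM curOrders INTO @OrderID
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.tblOrder SET Tax = OrderAmount * 0.1 WHERE OrderID = @OrderID
    FETCH NEXT FROM curOrders INTO @OrderID
END
CLOSE curOrders
DEALLOCATE curOrders

-- Set-based approach: one statement updates all rows (preferred).
UPDATE dbo.tblOrder SET Tax = OrderAmount * 0.1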
  • Keep the transaction as short as possible: The length of a transaction affects blocking and deadlocking. Exclusive locks are not released until the end of the transaction, and at higher isolation levels shared locks are also held for the duration of the transaction. Therefore, a lengthy transaction means locks held for a longer time, and locks held for a longer time turn into blocking. In some cases, blocking also escalates into deadlocks. So, for faster execution and less blocking, the transaction should be kept as short as possible.
  • Use TRY-CATCH for error handling: Prior to SQL Server 2005, error-handling code was a big portion of the actual code, because an error check statement had to be written after every T-SQL statement. More code always consumes more resources and time. In SQL Server 2005, a new, simpler way was introduced for the same purpose. The syntax is as follows:
BEGIN TRY
    -- Your T-SQL code goes here
END TRY
BEGIN CATCH
    -- Your error handling code goes here
END CATCH
Don't use "SELECT*" in a SQL query
  • Unnecessary columns may get fetched that will add expense to the data retrieval time.
  • The database engine cannot utilize the benefit of "Covered Index" (discussed in the previous article), and hence the query performs slowly.
Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions
  • Selecting unnecessary columns in a SELECT query adds overhead to the actual query, especially if the unnecessary columns are of LOB types.
  • Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.
Do not use the COUNT() aggregate in a subquery to do an existence check
  • Do not use:
  • SELECT column_list FROM table WHERE 0 < (SELECT count(*) FROM table2 WHERE ..)
    Instead, use:
    SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
  • When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
  • When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.
Try to avoid joins between columns of different data types
  • When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower is the one that is converted.
  • If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:
  • SELECT column_list FROM small_table, large_table WHERE
    small_table.float_column = large_table.int_column
    In this case, SQL Server converts the integer column to float, because int is lower in the data type hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on small_table.float_column.
Try to avoid deadlocks
  • Always access tables in the same order in all your Stored Procedures and triggers.
  • Keep your transactions as short as possible. Touch as little data as possible during a transaction.
  • Never, ever wait for user input in the middle of a transaction.
Write TSQL using "Set based approach" rather than "Procedural approach"
  • The database engine is optimized for set-based SQL. Hence, the procedural approach (use of a cursor or UDF to process rows in a result set) should be avoided when large result sets (more than 1,000 rows) have to be processed.
  • How can we get rid of "Procedural SQL"? Follow these simple tricks (see the sketch after this list):
    • Use inline subqueries to replace User Defined Functions.
    • Use correlated subqueries to replace cursor-based code.
    • If procedural coding is really necessary, at least use a table variable instead of a cursor to navigate and process the result set.
For more info on "set" and "procedural" SQL, see Understanding "Set based" and "Procedural" approaches in SQL.
Try not to use COUNT(*) to obtain the record count in a table
  • To get the total row count in a table, we usually use the following Select statement:
  • SELECT COUNT(*) FROM dbo.orders
    This query will perform a full table scan to get the row count.
  • The following query would not require a full table scan. (Please note that this might not give you 100% perfect results always, but this is handy only if you don't need a perfect count.)
  • SELECT rows FROM sysindexes 
    WHERE id = OBJECT_ID('dbo.Orders') AND indid < 2
Try to avoid dynamic SQL
Unless really required, try to avoid the use of dynamic SQL because:
  • Dynamic SQL is hard to debug and troubleshoot.
  • If the user provides the input to the dynamic SQL, then there is a possibility of SQL injection attacks.
Try to avoid the use of temporary tables
  • Unless really required, try to avoid the use of temporary tables. Rather use table variables.
  • In 99% of cases, table variables reside in memory, hence they are a lot faster. Temporary tables reside in the TempDb database, so operating on temporary tables requires inter-database communication and hence will be slower.
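A minimal sketch of the two alternatives (the table name and columns are made up for illustration):

-- Temporary table: created in the TempDb database.
CREATE TABLE #tmpOrders (OrderID INT, Amount MONEY)
INSERT INTO #tmpOrders VALUES (1, 100.00)
SELECT OrderID, Amount FROM #tmpOrders
DROP TABLE #tmpOrders

-- Table variable: preferred for small result sets.
DECLARE @Orders TABLE (OrderID INT, Amount MONEY)
INSERT INTO @Orders VALUES (1, 100.00)
SELECT OrderID, Amount FROM @Orders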
Instead of LIKE search, use full text search for searching textual data
Full text searches always outperform LIKE searches.
  • Full text searches will enable you to implement complex search criteria that can't be implemented using a LIKE search, such as searching on a single word or phrase (and optionally, ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
  • Full text search is easier to implement than a LIKE search (especially in the case of complex search requirements).
  • For more info on full text search, see http://msdn.microsoft.com/en-us/library/ms142571(SQL.90).aspx
Try to use UNION to implement an "OR" operation
  • Try not to use "OR" in a query. Instead use "UNION" to combine the result set of two distinguished queries. This will improve query performance.
  • Better use UNION ALL if a distinguished result is not required. UNION ALL is faster than UNION as it does not have to sort the result set to find out the distinguished values.
Implement a lazy loading strategy for large objects
  • Store Large Object columns (like VARCHAR(MAX), Image, Text etc.) in a different table than the main table, and put a reference to the large object in the main table.
  • Retrieve all the main table data in a query, and if a large object is required to be loaded, retrieve the large object data from the large object table only when it is required.
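A possible schema for this approach might look like the sketch below (the table and column names are only illustrative):

-- Main table keeps only a reference to the large object row.
CREATE TABLE dbo.Product (
    ProductID     INT IDENTITY PRIMARY KEY,
    ProductName   VARCHAR(100),
    DescriptionID INT NULL          -- reference to the LOB table
)

CREATE TABLE dbo.ProductDescription (
    DescriptionID   INT IDENTITY PRIMARY KEY,
    LongDescription VARCHAR(MAX)
)

-- Normal listing query touches only the main table.
SELECT ProductID, ProductName FROM dbo.Product

-- The large object is loaded only when it is actually needed.
SELECT d.LongDescription
FROM dbo.ProductDescription d
WHERE d.DescriptionID = 10   -- hypothetical key value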
Use VARCHAR(MAX), VARBINARY(MAX), and NVARCHAR(MAX)
  • In SQL Server 2000, a row cannot exceed 8000 bytes in size. This limitation is due to the 8 KB internal page size of SQL Server. So to store more data in a single column, you need to use TEXT, NTEXT, or IMAGE data types (BLOBs) which are stored in a collection of 8 KB data pages.
  • These are unlike the data pages that store the other data in the same table; these pages are arranged in a B-tree structure. Such data cannot be used as variables in a procedure or a function, and cannot be used inside string functions such as REPLACE, CHARINDEX, or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT instead.
  • To solve this problem, use VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) in SQL Server 2005. These data types can hold the same amount of data BLOBs can hold (2 GB), and they are stored in the same type of data pages used for other data types.
  • When data in a MAX data type exceeds 8 KB, an over-flow page is used (in the ROW_OVERFLOW allocation unit), and a pointer to the page is left in the original data page in the IN_ROW allocation unit.
Implement the following good practices in User Defined Functions
  • Do not call functions repeatedly within your Stored Procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed; instead, call the LEN function once, and store the result in a variable for later use.
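A trivial sketch of the idea (the variable names are made up):

DECLARE @Name VARCHAR(100)
DECLARE @NameLength INT

SET @Name = 'Some customer name'
-- Call LEN once and reuse the stored result instead of calling it repeatedly.
SET @NameLength = LEN(@Name)

IF @NameLength > 50
    PRINT 'Name is long'
IF @NameLength < 5
    PRINT 'Name is short'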
Implement the following good practices in Stored Procedures
  • Do not use "SP_XXX" as a naming convention. It causes additional searches and added I/O (because the system Stored Procedure names start with "SP_"). Using "SP_XXX" as the naming convention also increases the possibility of conflicting with an existing system Stored Procedure.
  • Use "Set Nocount On" to eliminate extra network trip.
  • Use the WITH RECOMPILE clause in the EXECUTE statement (first time) when the index structure changes (so that the compiled version of the Stored Procedure can take advantage of the newly created indexes).
  • Use default parameter values for easy testing.
Implement the following good practices in Triggers
  • Try to avoid the use of triggers. Firing a trigger and executing the triggering event is an expensive process.
  • Never use triggers that can be implemented using constraints.
  • Do not use the same trigger for different triggering events (Insert, Update, Delete).
  • Do not use transactional code inside a trigger. The trigger always runs within the transactional scope of the code that fires the trigger.
Implement the following good practices in Views
  • Use views for re-using complex TSQL blocks, and to enable indexed views (will be discussed later).
  • Use views with the SCHEMABINDING option if you do not want to let users modify the table schema accidentally.
  • Do not use views that retrieve data from a single table only (that will be an unnecessary overhead). Use views for writing queries that access columns from multiple tables.
Implement the following good practices in Transactions
  • Prior to SQL Server 2005, after BEGIN TRANSACTION and each subsequent modification statement, the value of @@ERROR had to be checked. If its value was non-zero, then the last statement caused an error, and if an error occurred, the transaction had to be rolled back and an error had to be raised (for the application). In SQL Server 2005 and onwards, the Try...Catch block can be used to handle transactions in TSQL. So try to use Try...Catch based transactional code.
  • Try to avoid nested transactions. Use the @@TRANCOUNT variable to determine whether a transaction needs to be started (to avoid nested transactions).
  • Start a transaction as late as possible and commit/rollback the transaction as fast as possible to reduce the time period of resource locking.
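A commonly used template that follows these practices is sketched below (the dbo.Account table and its columns are hypothetical, and error logging is omitted):

BEGIN TRY
    BEGIN TRANSACTION

    -- Keep the transaction short: only the DML that must be atomic goes here.
    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- Roll back only if a transaction is still open.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION

    -- Re-raise the error for the calling application.
    DECLARE @ErrMsg NVARCHAR(4000)
    SET @ErrMsg = ERROR_MESSAGE()
    RAISERROR (@ErrMsg, 16, 1)
END CATCH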

Optimization cont'd

If we follow the following tips and rules while writing SQL queries, we can improve the performance of those queries:

1. Help the query optimizer to estimate correct statistics

In SQL Server, the execution plan is decided by:

1. Cardinality: The total number of rows processed at each level of the query.

2. Query Plan: The algorithm used to perform each task, such as searching and sorting.

When we execute a SQL query, create an index, insert records into a table, etc., SQL Server automatically creates or updates the statistics of the key fields of the tables. In simple words, statistics are objects that keep information about the distribution of values, such as the total number of distinct records, in a table.

There are a few cases where the query optimizer is unable to get correct information from the statistics, which decreases the performance of a SQL query. For example:

First, we create a tblMessage table, insert 97526 records into it, and create three indexes on it:

CREATE TABLE tblMessage (
    ntMessageID BIGINT IDENTITY PRIMARY KEY,
    ntSeverity INT,
    vcMessage VARCHAR(500),
    dtDate DATETIME
)

INSERT INTO tblMessage
SELECT
    Severity,
    LEFT ([TEXT], 500),
    DATEADD (YEAR,severity,GETDATE())  
FROM Sys.messages

CREATE NONCLUSTERED INDEX NC_Severity
ON tblMessage(ntSeverity)
INCLUDE (vcMessage)

CREATE NONCLUSTERED INDEX NC_Msg
ON tblMessage(vcMessage)

CREATE NONCLUSTERED INDEX NC_Date
ON tblMessage(dtDate)
INCLUDE (ntSeverity,vcMessage)

Note: We set STATISTICS IO ON to see the total number of logical reads:

SET STATISTICS IO ON

Scenario 1:

a. Consider the following two SQL queries:

Query 1:

SELECT vcMessage FROM tblMessage
WHERE LEFT (vcMessage,1) = 'a'

Query 2:

SELECT vcMessage FROM tblMessage
WHERE vcMessage LIKE 'a%'

Both of them perform the same task from a results point of view. Now check the logical reads in the Messages tab:

Logical read (Query 1):  1471
Logical read (Query 2):  79

So, Query 1 has to perform approximately 18 times more logical reads than Query 2.

Now check the actual execution plans of both queries:

We can observe that Query 1 uses an index scan while Query 2 uses an index seek, and the overall cost of Query 1 is 96% while that of Query 2 is only 4%.

So, now we can say Query 2 is much better than Query 1.

b. Consider the following two SQL queries:

Query 1:

SELECT * FROM tblMessage WHERE YEAR (dtDate) = '2012'

Query 2:

SELECT * FROM tblMessage
WHERE dtDate >= '2012-01-01' AND dtDate < '2013-01-01'

Both of them perform the same task from a results point of view. Now check the logical reads in the Messages tab:

Logical read (Query 1):  1589
Logical read (Query 2):  10

So, Query 1 has to perform approximately 160 times more logical reads than Query 2.

If we check the actual execution plans of both queries, we will observe that Query 1 uses an index scan while Query 2 uses an index seek, and the overall cost of Query 1 is 99% while that of Query 2 is only 1%.

In the same way, if we analyze the following pairs of queries, Query 2 will perform much better than Query 1:

C.

Query 1:

SELECT vcMessage FROM tblMessage
WHERE (ntSeverity * 5)/2 = 35

Query 2:

SELECT vcMessage FROM tblMessage
WHERE ntSeverity = (35/5) * 2

Logical read (Query 1): 1490
Logical read (Query 2): 16
Batch query cost (Query 1): 100%
Batch query cost (Query 2): 0%

D.

Query 1:

SELECT vcMessage FROM tblMessage
WHERE LOWER (vcMessage) = LOWER ('Text')

Query 2:

SELECT vcMessage FROM tblMessage
WHERE vcMessage = 'Text'

Or

SELECT vcMessage FROM tblMessage
WHERE vcMessage = 'Text' COLLATE SQL_Latin1_General_CP1_CI_AS

Logical read (Query 1): 1471
Logical read (Query 2): 3
Batch query cost (Query 1): 100%
Batch query cost (Query 2): 0%

Note: The default collation of SQL Server is case insensitive, so there is no need to use the LOWER function to perform a case-insensitive comparison. If the collation of your database, table, or column is case sensitive, then use the COLLATE clause with any case-insensitive collation.

E.

Query 1:

SELECT vcMessage FROM tblMessage
WHERE CAST (ntSeverity AS VARCHAR) = '14'

Query 2:

SELECT vcMessage FROM tblMessage
WHERE ntSeverity = CAST ('14' AS INT)

Logical read (Query 1): 1490
Logical read (Query 2): 16
Batch query cost (Query 1): 99%
Batch query cost (Query 2): 1%

Question: Why does Query 2 perform better than Query 1 in all of the above examples?

Answer: If you look at the WHERE clause of Query 1 in all of the above examples, you will notice that we are performing some operation on a field of the table tblMessage, while in Query 2 the equivalent value is used as a constant expression.

If we perform any of the following operations on a field of a table in a query predicate:

1. Using any system function or user-defined function
2. Scalar operations like addition, multiplication, etc.
3. Type casting

then the SQL Server query optimizer is not able to estimate the correct cardinality using statistics.

Query 1:

SELECT vcMessage FROM tblMessage
WHERE LEFT (vcMessage, 1) = 'a'

Query 2:

SELECT vcMessage FROM tblMessage
WHERE vcMessage LIKE 'a%'

Both of the above queries select 4171 records in total.

If we check the details section of the execution plan, we get:

Actual cardinality: 4171
Estimated cardinality (Query 1): 2432
Estimated cardinality (Query 2): 4131.76

So the estimated cardinality of Query 2 (4131.76) is very near to the correct cardinality, i.e. the total number of records filtered by the WHERE clause and passed to the SELECT clause (4171).

Tips: If possible, don't perform any operation on a field of a table in the WHERE clause, ON clause, or HAVING clause. Instead, write the equivalent constant expression.

Scenario 2:

Consider the following two SQL statements:

Query 1:

SELECT vcMessage FROM tblmessage
WHERE vcMessage LIKE 'Text%'

Query 2:

DECLARE @Msg AS VARCHAR (500) = 'Text%'
SELECT vcMessage FROM tblmessage
WHERE vcMessage LIKE @Msg

If we check the execution plans of both queries:

The total query cost of Query 2 is 96% while Query 1 is only 4%!!!

What is wrong with Query 2?

Again, the same cause: the query optimizer is not able to get the correct cardinality using statistics.

1. Actual cardinality: 55
2. Cardinality (Query 1): 40.9086
3. Cardinality (Query 2): 6222.16

If we use local variables in a query predicate, the SQL Server query optimizer is not able to estimate the correct cardinality using statistics.

Tips:

Don't use local variables in the WHERE clause, ON clause, or HAVING clause. Instead, use constant expressions.

Question: How do we resolve this problem when the variable is a parameter of a stored procedure or function? (See Scenario 4.)

Scenario 3:

Consider the following SQL statement:

Query 1:

SELECT vcMessage FROM tblMessage
WHERE ntSeverity * ntMessageID = 0

There is a lot of business logic where we use more than one field and perform some arithmetic operation on them in a query predicate. This is called a cross relationship among fields. The above query is an example of a cross relationship between the fields ntSeverity and ntMessageID.

If there is any cross relationship among different fields of a table in a query predicate, the SQL Server query optimizer is not able to estimate the correct cardinality using statistics.

To resolve this problem, SQL Server introduced computed columns. So we have to create a computed column in the table tblMessage. In this example we create another table, tblMessage_Dense, insert the same records, and create a non-clustered index:

CREATE TABLE tblMessage_Dense(
    ntMessageID BIGINT IDENTITY PRIMARY KEY,
    ntSeverity INT,
    vcMessage VARCHAR(500),
    dtDate DATETIME,
    ntDensity AS ntSeverity * ntMessageID
)

INSERT tblMessage_Dense(ntSeverity,vcMessage,dtDate)
SELECT
    Severity,
    LEFT ([TEXT], 500),
    DATEADD (YEAR,severity,GETDATE())  
FROM Sys.messages

CREATE NONCLUSTERED INDEX NC_Density
ON tblMessage_Dense(ntDensity)
INCLUDE (vcMessage)

And our new SQL query will be:

Query 2:

SELECT vcMessage FROM tblMessage_Dense
WHERE ntDensity = 0

OK, now compare the performance of both SQL queries:

Logical read (Query 1): 1490
Logical read (Query 2):  9

So, Query 1 has to perform approximately 165 times more logical reads than Query 2.

Now check the actual execution plans of both queries:

We can observe that Query 1 uses an index scan while Query 2 uses an index seek, and the overall cost of Query 1 is 99% while that of Query 2 is only 1%.

So, now we can say Query 2 is much better than Query 1.

Note: Good news, a non-persisted computed column is not stored physically in the table.

Suppose our application has a table tblQueue:

CREATE TABLE tblQueue (
    ntQueueID BIGINT IDENTITY PRIMARY KEY,
    vcQueueName VARCHAR(500)
)

INSERT tblQueue
SELECT  '1_' +
    CAST(message_id AS VARCHAR) + '_' +
    CAST(severity AS VARCHAR) + '_' +
    CAST(language_id AS VARCHAR)  FROM Sys.messages

This application very frequently executes a SQL query which extracts a batch ID from the column vcQueueName:

Query 1:

SELECT * FROM tblQueue
WHERE CAST (REVERSE (LEFT (REVERSE (vcQueueName), CHARINDEX ('_', REVERSE (vcQueueName)) - 1)) AS INT) = 1031

In this situation, it would be a good idea to create a computed column for the batch ID in the table. In this example we create a new table, tblQueue_Batch:

CREATE TABLE tblQueue_Batch (
    ntQueueID BIGINT IDENTITY PRIMARY KEY,
    vcQueueName VARCHAR(500),
    ntBatchID AS  CAST( REVERSE( LEFT( REVERSE( vcQueueName), CHARINDEX('_',REVERSE(vcQueueName)) - 1)) AS INT)
)

INSERT tblQueue_Batch
SELECT '1_' +
    CAST (message_id AS VARCHAR) + '_' +
    CAST (severity AS VARCHAR) + '_' +
    CAST (language_id AS VARCHAR) FROM Sys.messages

CREATE NONCLUSTERED INDEX NC_Batch
ON tblQueue_Batch (ntBatchID)
INCLUDE (vcQueueName)

The following Query 2 is much more efficient than the previous one:

Query 2:

SELECT * FROM tblQueue_Batch  WHERE ntBatchID = 1031

Logical read (Query 1): 430
Logical read (Query 2): 44
Batch query cost (Query 1): 91%
Batch query cost (Query 2): 9%

Tips:

If there is any cross relationship among fields, or there is a complex expression on a field in a query predicate, it is better to create a computed column and then create a non-clustered index on it.

Scenario 4:

In SQL Server, the query optimizer can estimate correct cardinality when a SQL statement uses a variable in a query predicate only if that variable is a parameter of a stored procedure or function. For example:

We are creating two stored procedures:

CREATE PROC uspGetMsg_1 (
    @Severity AS INT
)
AS
BEGIN
    SELECT vcMessage FROM tblMessage
    WHERE ntSeverity = @Severity
END

GO

CREATE PROC uspGetMsg_2
AS
BEGIN
    SELECT vcMessage FROM tblMessage
    WHERE ntSeverity = 15
END

Query 1:

EXECUTE uspGetMsg_1 15

Query 2:

EXECUTE uspGetMsg_2

From a query performance point of view, both queries have exactly the same execution cost. There is no difference whether we use a constant expression or pass the value as a parameter.

Now we alter the stored procedure uspGetMsg_2:

ALTER PROC uspGetMsg_2 (
    @Severity AS INT
)
AS
BEGIN
    SET @Severity = @Severity - 1
    SELECT vcMessage FROM tblMessage
    WHERE ntSeverity = @Severity
END

Now consider the following two SQL statements:

Query 1:

EXECUTE uspGetMsg_1 15

Query 2:

EXECUTE uspGetMsg_2 16

If we check the execution plans of both queries:

The execution cost of Query 2 is 95% while that of Query 1 is only 5%.

1. Actual cardinality: 3102
2. Cardinality (Query 1): 3102
3. Cardinality (Query 2): 69839

So, again, the correct cardinality is not estimated for Query 2. If you check, you will find that Query 2 is using the cardinality of the query:

SELECT vcMessage FROM tblMessage WHERE ntSeverity = 16

while we have updated the value of the variable @Severity to 15, so it should use the cardinality of the query:

SELECT vcMessage FROM tblMessage WHERE ntSeverity = 15

But that is not what happens. So we can say the query optimizer estimates the cardinality from statistics on the basis of the value passed to the parameter of a function or stored procedure, not the updated value of the parameter inside the function or stored procedure.

Tips:

If possible, don't update the value of a parameter of a function or stored procedure before using it in a SQL statement; instead, pass the updated value to the parameter.

Scenario 5:

In SQL Server, we can empty the procedure cache by using the statement:

DBCC FREEPROCCACHE

Note: Don't run this statement on a production server.

Consider the following two SQL statements:

Query 1:

EXECUTE uspGetMsg_1 16

Query 2:

EXECUTE uspGetMsg_1 15

If we check the execution plans, we find:

Query 1:

Actual cardinality: 69839
Estimated cardinality: 69839

Query 2:

Actual cardinality: 3102
Estimated cardinality: 69839

Now empty the proc cache:

DBCC FREEPROCCACHE

Now execute the same two queries, but in reverse order, that is:

Query 1:

EXECUTE uspGetMsg_1 15

Query 2:

EXECUTE uspGetMsg_1 16

If we check the execution plans, we find:

Query 1:

Actual cardinality: 3102
Estimated cardinality: 3102

Query 2:

Actual cardinality: 69839
Estimated cardinality: 3102

Here we can observe that the estimated cardinality of Query 2 is equal to the estimated cardinality of Query 1. SQL Server creates the statistics for the parameters of a stored procedure or function on the basis of the values passed to those parameters when it is executed for the first time. From then on, it uses the same cardinality as long as the statistics, query plan, etc. do not change. Due to this wrong estimation of cardinality, the query optimizer can choose a wrong execution plan, which may decrease the performance of queries.

We can optimize a SQL query for the parameter value that the application passes most often by using the OPTIMIZE FOR query hint. It overrides the default behavior of optimizing the query on the basis of the value that was passed the first time. For example:

CREATE PROC uspGetMsg(
    @Severity AS INT
)
AS
BEGIN
    SELECT vcMessage FROM tblMessage
    WHERE ntSeverity = @Severity
    OPTION (OPTIMIZE FOR (@Severity = 15))
END

Tips:

Use the OPTIMIZE FOR hint when you want to optimize a SQL query on the basis of a specific parameter value.

Scenario 6:

There are many situations where we need to update the value of a parameter according to some condition inside a stored procedure or a function. For example:

CREATE PROC uspGetMsg(
    @Severity AS INT
)
AS
BEGIN

    IF @Severity IS NULL
         SET @Severity = (SELECT MAX(ntSeverity) FROM tblMessage)
       
    SELECT vcMessage FROM tblMessage WHERE ntSeverity = @Severity
END

Consider a situation where our application executes the following SQL query for the first time:

EXECUTE uspGetMsg NULL

Query optimizer will estimate the cardinality on the basis of:

SELECT vcMessage FROM tblMessage
WHERE ntSeverity = NULL

As we know, NULL is never equal to any value; we use IS NULL to compare null values. So this query will not return any result set, and if a query doesn't return any rows, its estimated cardinality will be one.

Let's check the cardinality in the execution plan:
Actual cardinality: 110
Estimated cardinality: 1

This will affect the performance of the query drastically. The solution to this problem is to create two stored procedures. For example:

CREATE PROC uspGetMsg (
    @Severity AS INT
)
AS
BEGIN

    IF @Severity IS NULL
         SET @Severity = (SELECT MAX (ntSeverity) FROM tblMessage)
       
    EXECUTE uspGetMsgNotNull @Severity
END

GO

CREATE PROC uspGetMsgNotNull (
    @Severity AS INT
)
AS
BEGIN  
    SELECT vcMessage FROM tblMessage WHERE ntSeverity = @Severity
END

And from the application, use only the stored procedure uspGetMsg.

Tips:

If you need to update the value of a parameter of a stored procedure or function, create a similar procedure or function and execute it from the base procedure or function, passing the updated value as a parameter.

Scenario 7:

In SQL Server, the default setting for auto create and auto update statistics is ON (except for asynchronous statistics updates). It means the query optimizer automatically creates and updates statistics as required. More specifically, it auto-creates statistics for primary key columns when we create a table, for the key columns of clustered and non-clustered indexes when we create indexes, and for query predicates when we execute queries, and it updates the statistics when a threshold is reached after updates, deletes, or inserts of records.

Limitation: The query optimizer can auto-create only single-column statistics, and it cannot create filtered statistics.

So, there are many cases where we can improve the performance of SQL statements by creating user-defined statistics. For example:

SELECT vcMessage FROM tblMessage
WHERE ntMessageId < 1000
    AND ntSeverity IN (10, 11, 12, 16)

If we check the execution plan, we get:

Actual cardinality: 705
Estimated cardinality: 839.486

If we create the following filtered statistics:

CREATE STATISTICS Stat_Severity
ON tblMessage(ntMessageId)
WHERE ntSeverity IN (10, 11, 12, 16)

Now the estimated cardinality of the previous query will be: 705.671

So, now the query will perform better.

Tips:

Create user-defined multi-column statistics if the query predicates involve more than one field of a table.
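For the query above, which filters on both ntMessageId and ntSeverity, a user-defined multi-column statistics object could be created like this (a sketch on the same tblMessage table):

-- Multi-column statistics covering both predicate columns.
CREATE STATISTICS Stat_MsgID_Severity
ON tblMessage (ntMessageId, ntSeverity)
WITH FULLSCAN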

Query optimization

Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery are distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.
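As a sketch (the SalesHistory and Product tables and their columns are hypothetical here), the subquery form and an equivalent JOIN form look like this:

-- Subquery used as criteria: the subquery values are treated as a distinct list.
SELECT sh.SaleID, sh.SaleDate
FROM SalesHistory sh
WHERE sh.ProductID IN (SELECT p.ProductID FROM Product p WHERE p.Category = 'Book')

-- Equivalent JOIN (assuming ProductID is unique in Product), usually cheaper.
SELECT sh.SaleID, sh.SaleDate
FROM SalesHistory sh
INNER JOIN Product p ON p.ProductID = sh.ProductID
WHERE p.Category = 'Book'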

Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If you are executing many DML statements, it might be a good idea to place them inside of a transaction for performance purposes. Placing the statements inside of a transaction will prevent all of the statements from being written to the transaction log serially.
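A rough sketch of batching several DML statements inside one explicit transaction (the SalesHistory columns used here are hypothetical):

BEGIN TRANSACTION

    INSERT INTO SalesHistory (Product, SaleDate, SalePrice)
    VALUES ('Computer', GETDATE(), 1000.00)

    UPDATE SalesHistory
    SET SalePrice = SalePrice * 0.95
    WHERE Product = 'Computer'

    DELETE FROM SalesHistory
    WHERE SaleDate < '20000101'

COMMIT TRANSACTION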

Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. This duplicate removal requires additional computing to accomplish. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.

Use EXISTS when possible

When you need to check for the presence of certain conditions, it is usually faster to use the EXISTS function over COUNT(*). This is because COUNT(*) has to scan all records returned by the statement, while EXISTS will return a true value as soon as it finds a record that meets the criteria.

STATISTICS IO

There are different ways to determine the best way to write your queries. Two of my favorite methods are looking at the number of logical reads produced by the query and looking at graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:

SET STATISTICS IO ON
SELECT * FROM SalesHistory
The following is returned in the Messages window in SQL Server Management Studio:

Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There are several bits of data returned by STATISTICS IO, but I am really only concerned with the logical reads portion because it will tell me the number of pages read from the data cache. This is the most helpful to me because it will stay constant when I run the same query, which is important because there are sometimes external factors that might vary the execution time of my queries, such as locking by other queries.

When I'm tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.

Tip 1: Always use a WHERE clause in SELECT queries when you don't need all the rows to be returned. This helps narrow down the returned rows; otherwise the query performs a whole table scan and wastes SQL Server resources while increasing network traffic. While scanning the whole table, it may also lock the table, which can prevent other users from accessing it.
Tip 2: Developers are often seen writing code like
SELECT * FROM OrderTable WHERE LOWER(UserName)='telsa'

Instead of writing it like the below

SELECT * FROM OrderTable WHERE UserName='telsa'

In fact, both queries do the same work, but the second one is better and retrieves rows faster than the first query, because SQL Server's default collation is not case sensitive.

Tip 3: While running a query, the operators used in the WHERE clause directly affect performance. The operators shown below are listed in decreasing order of performance:

=
>,>=,<, <=
LIKE
<>
Tip 4: Queries containing NOT IN tend to offer poor performance, as the optimizer needs to use a nested table scan to perform this activity. This can be avoided by using EXISTS or NOT EXISTS, as sketched below.

When there is a choice between IN and EXISTS, we should go with the EXISTS clause for better performance.
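A sketch of the rewrite (assuming CustomerTable and OrderTable share a CustomerID column):

-- Avoid: NOT IN forces the optimizer to check the full list of values.
SELECT c.CustomerID, c.CustomerName
FROM CustomerTable c
WHERE c.CustomerID NOT IN (SELECT o.CustomerID FROM OrderTable o)

-- Prefer: NOT EXISTS can stop at the first matching row
-- (and behaves predictably when OrderTable.CustomerID contains NULLs).
SELECT c.CustomerID, c.CustomerName
FROM CustomerTable c
WHERE NOT EXISTS (SELECT 1 FROM OrderTable o WHERE o.CustomerID = c.CustomerID)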

Tip 5: It is best practice to get an index seek when the columns are covered by an index; an index hint can force the query optimizer to use the index when IN or OR clauses are part of our WHERE clause.

SELECT * FROM OrderTable WHERE Status = 1 AND OrderID IN (406,530,956)

Takes more time than

SELECT * FROM OrderTable (INDEX=IX_OrderID) WHERE Status = 1 AND OrderID IN (406,530,956)

Tip 6: When we use LIKE in a SQL query, it is better to use one or more leading characters in the pattern instead of starting with a wildcard character.

SELECT * FROM CustomerTable WHERE CustomerName LIKE 'm%'

SELECT * FROM CustomerTable WHERE CustomerName LIKE '%m'

In the first query, the query optimizer is able to use an index to perform the query, thereby reducing the load on SQL Server. But in the second query, no suitable index can be used while running the query.

Tip 7: When there is a choice between using an IN or a BETWEEN clause in the query, it is always advisable to use BETWEEN for better results.

SELECT * FROM CustomerTable WHERE CustomerID BETWEEN 5000 AND 5005

Performs better than

SELECT * FROM CustomerTable WHERE CustomerID IN (5000,5001,5002,5003,5004,5005)

Tip 8: Always avoid the use of the SUBSTRING function in the WHERE clause of a query.

SELECT * FROM CustomerTable WHERE CustomerName LIKE 'n%'

Is much better than writing

SELECT * FROM CustomerTable WHERE SUBSTRING(CustomerName,1,1)='n'

Tip 9: WHERE clause conditions connected by AND operators are evaluated from left to right in the order they are written. So certain things should be taken care of:

Provide the expressions least likely to be true first in the AND. By doing this, if an expression is false at an early stage, the clause fails immediately, which saves execution time.
If all the parts of the AND expression are equally likely to be false, then it is better to put the least complex expression first, so that if the clause fails, the complex work does not have to be done.
Tip 10: It is sometimes better to combine queries using UNION ALL instead of using many OR clauses.

SELECT CustomerID, FirstName, LastName FROM CustomerTable

WHERE City = 'Wichita' or ZIP = '67201' or State= 'Kansas'

For the above query to use an index, it is required to have indexes on all three columns.

The same query can be written as

SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE City = 'Wichita'

UNION ALL

SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE ZIP = '67201'

UNION ALL

SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE State= 'Kansas'

Both queries will provide the same results, but if there is only an index on City and no indexes on ZIP or State, then the first query will not use the index and a table scan is performed. The second one, however, will use the index for that part of the query.

Tip 11: When the SELECT statement contains a HAVING clause, it is better to make the WHERE clause do most of the work (removing the undesired rows) for the query instead of letting the HAVING clause do it.

e.g. in a SELECT statement with GROUP BY and HAVING clauses, the WHERE clause first selects the appropriate rows, then GROUP BY divides them into groups, and finally the HAVING clause has less work to perform, which boosts performance. A sketch follows below.
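A sketch (assuming an OrderTable with CustomerID and OrderAmount columns):

-- Avoid: letting HAVING remove rows that WHERE could have removed before grouping.
SELECT CustomerID, SUM(OrderAmount) AS TotalAmount
FROM OrderTable
GROUP BY CustomerID
HAVING CustomerID BETWEEN 100 AND 200
   AND SUM(OrderAmount) > 1000

-- Prefer: WHERE filters the rows before GROUP BY; HAVING filters only on the aggregate.
SELECT CustomerID, SUM(OrderAmount) AS TotalAmount
FROM OrderTable
WHERE CustomerID BETWEEN 100 AND 200
GROUP BY CustomerID
HAVING SUM(OrderAmount) > 1000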

Tip 12: Let’s take 2 situations

A query that takes 30 seconds to run, and then displays all of the required results.
A query that takes 60 seconds to run, but displays the first screen full of records in less than 1 second.
Looking at the above two situations, a developer may choose to follow the first option, as it uses fewer resources and appears faster. But actually the second one is more acceptable to a DBA, because the application can provide immediate feedback to the user even though the full query is still running in the background.

We can use a hint like
SELECT * FROM CustomerTable WHERE City = 'Wichita' OPTION(FAST n)
where n is the number of rows that we want to display as fast as possible. This hint helps return the specified number of rows quickly, without worrying about the time taken by the overall query.