Thursday, 15 August 2013

Query optimization

Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery are distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.

Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If your statements are executing many DML statements, it might be a good idea to place them inside of a transaction for performance purposes. Placing the statements inside of a transaction will prevent all of the statements from being written to the transaction log serially.

Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. This duplicate removal requires additional computing to accomplish. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.

Use EXISTS when possible

When you need to check for the presence of certain conditions, it is usually faster to use the EXISTS function over COUNT(*). This is because COUNT(*) has to scan all records returned by the statement, while EXISTS will return a true value as soon as it finds a record that meets the criteria.


There are different ways to determine the best way to write your queries. Two of my favorite methods are looking at the number of logical reads produced by the query and looking at graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:

SELECT * FROM SalesHistory
The following is returned in the Messages window in SQL Server Management Studio:

Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There are several bits of data returned by STATISTICS IO, but I am really only concerned with the logical reads portion because it will tell me the number of pages read from the data cache. This is the most helpful to me because it will stay constant when I run the same query, which is important because there are sometimes external factors that might vary the execution time of my queries, such as locking by other queries.

When I'm tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.

Tip 1: Always use WHERE Clause in SELECT Queries while we don’t need all the rows to be returned. This will help to narrow the return rows else it will perform a whole table scan and waste the Sql server resources with increasing the network traffic. While scanning the whole it will lock the Table which may prevent other users to access the table.
Tip 2: It is seen many times developers use codes like 
SELECT * FROM OrderTable WHERE LOWER(UserName)='telsa'

Instead of writing it like the below

SELECT * FROM OrderTable WHERE UserName='telsa'

Infact both the queries does the same work but the 2nd one is better and retrieves rows more speedly than the first query. Because Sql Server is not case sensitive

Tip 3: While running a query, the operators used with the WHERE clause directly affect the performance. The operators shown below are in their decreasing order of their performance.

>,>=,<, <=
Tip 4 : When we are writing queries containing NOT IN, then this is going to offer poor performance as the optimizer need to use nested table scan to perform this activity. This can be avoided by using EXISTS or NOT EXISTS.

When there is a choice to use IN or EXIST, we should go with EXIST clause for better performance.

Tip 5: It is always best practice to use the Index seek while the columns are covered by an index, this will force the Query Optimizer to use the index while using IN or OR clauses as a part of our WHERE clause.

SELECT * FROM OrderTable WHERE Status = 1 AND OrderID IN (406,530,956)

Takes more time than

SELECT * FROM OrderTable (INDEX=IX_OrderID) WHERE Status = 1 AND OrderID IN (406,530,956)

Tip 6: While we use IN, in the sql query it better to use one or more leading characters in the clause instead of using the wildcard character at the starting.

SELECT * FROM CustomerTable WHERE CustomerName LIKE 'm%'

SELECT * FROM CustomerTable WHERE CustomerName LIKE '%m'

In the first query the Query optimizer is having the ability to use an index to perform the query and there by reducing the load on sql server. But in the second query, no suitable index can be created while running the query.

Tip 7: While there is case to use IN or BETWEEN clause in the query, it is always advisable to use BETWEEN for better result.

SELECT * FROM CustomerTable WHERE CustomerID BETWEEN (5000 AND 5005)

Performs better than

SELECT * FROM CustomerTable WHERE CustomerID IN (5000,5001,5002,5003,5004,5005)

Tip 8: Always avoid the use of SUBSTRING function in the query.

SELECT * FROM CustomerTable WHERE CustomerName LIKE 'n%'

Is much better than writing

SELECT * FROM CustomerTable WHERE SUBSTRING(CustomerName,1,1)='n'

Tip 9 : The queries having WHERE clause connected by AND operators are evaluated from left to right in the order they are written. So certain things should be taken care of like

Provide the least likely true expressions first in the AND. By doing this if the AND expression is false at the initial stage the clause will end immediately. So it will save execution time
If all the parts of the AND expression are equally like being false then better to put the Complex expression first. So if the complex works are false then less works to be done.
Tip 10: Its sometimes better to combine queries using UNION ALL instead of using many OR clauses.

SELECT CustomerID, FirstName, LastName FROM CustomerTable

WHERE City = 'Wichita' or ZIP = '67201' or State= 'Kansas'

The above query to use and index, it is required to have indexes on all the 3 columns.

The same query can be written as

SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE City = 'Wichita'


SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE ZIP = '67201'


SELECT CustomerID, FirstName, LastName FROM CustomerTable WHERE State= 'Kansas'

Both the queries will provide same results but if there is only an index on City and no indexes on the zip or state, then the first query will not use the index and a table scan is performed. But the 2nd one will use the index as the part of the query.

Tip 11: While the select statement contains a HAVING clause, its better to make the WHERE clause to do most of the works (removing the undesired rows) for the Query instead of letting the HAVING clause to do the works.

e.g. in a SELECT statement with GROUP BY and HAVING clause, things happens like first WHERE clause will select appropriate rows then GROUP BY divide them to group of rows and finally the HAVING clause have less works to perform, which will boost the performance.

Tip 12: Let’s take 2 situations

A query that takes 30 seconds to run, and then displays all of the required results.
A query that takes 60 seconds to run, but displays the first screen full of records in less than 1 second.
By looking at the above 2 situations a developer may choose to follow the 1st option, as it uses less resources and faster in performance. But actually the 2nd one is more acceptable by a DBA. An application may provide immediate feedback to the user, but actually this may not be happening at the background.

We can use a hint like
SELECT * FROM CustomerTable WHERE City = 'Wichita' OPTION(FAST n)
where n = number of rows that we want to display as fast as possible. This hint helps to return the specified number of rows as fast as possible without bothering about the time taken by the overall query.

No comments:

Post a Comment