Wednesday, 31 July 2013

DMV revisited.. PART 2

PAGEIOLATCH is related to IO issues, while PAGELATCH is not related to IO issues but is oftentimes linked to a buffer issue. Before we delve deeper in this interesting topic, first let us understand what Latch is.
Latches are internal SQL Server locks which can be described as very lightweight and short-term synchronization objects. Latches are not primarily to protect pages being read from disk into memory. It’s a synchronization object for any in-memory access to any portion of a log or data file.[Updated based on comment of Paul Randal]
Latches and Locks are two key synchronization objects that can affect system waits and overall performance. They are quite complex in nature and quite a lot of efforts are needed to understand them fully. For the sake of simpler understanding, I would term latches as light-weight internal locks that are usually applied on in-memory buffer pages. Locks are for ensuring transactional integrity which you can control to a large extend (using transaction isolation levels, query and table hints etc) where as latches are used internally by the relational and storage engines to synchronize access to in-memory pages.
Now, let us understand the wait stat type  related to latches.

From Book On-Line:

PAGELATCH_DT Occurs when a task is waiting on a latch for a buffer that is not in an I/O request. The latch request is in Destroy mode.
PAGELATCH_EX Occurs when a task is waiting on a latch for a buffer that is not in an I/O request. The latch request is in Exclusive mode.
PAGELATCH_KP Occurs when a task is waiting on a latch for a buffer that is not in an I/O request. The latch request is in Keep mode.
PAGELATCH_SH Occurs when a task is waiting on a latch for a buffer that is not in an I/O request. The latch request is in Shared mode.
PAGELATCH_UP Occurs when a task is waiting on a latch for a buffer that is not in an I/O request. The latch request is in Update mode.

PAGELATCH_X Explanation:

When there is a contention of access of the in-memory pages, this wait type shows up. It is quite possible that some of the pages in the memory are of very high demand. For the SQL Server to access them and put a latch on the pages, it will have to wait. This wait type is usually created at the same time. Additionally, it is commonly visible when the TempDB has higher contention as well. If there are indexes that are heavily used, contention can be created as well, leading to this wait type.

Reducing PAGELATCH_X wait:

The following counters are useful to understand the status of the PAGELATCH:
  • Average Latch Wait Time (ms): The wait time for latch requests that have to wait.
  • Latch Waits/sec: This is the number of latch requests that could not be granted immediately.
  • Total Latch Wait Time (ms): This is the total latch wait time for latch requests in the last second.
If there is TempDB contention, I suggest that you read the blog post of Robert Davis right away. He has written an excellent blog post regarding how to find out TempDB contention. The same blog post explains the terms in the allocation of GAM, SGAM and PFS. If there was a TempDB contention, Paul Randal explains the optimal settings for the TempDB in his misconceptions series. Trace Flag 1118 can be useful but use it very carefully.
I totally understand that this blog post is not as clear as my other blog posts. I suggest if this wait stats is on one of your higher wait type. Do leave a comment or send me an email and I will get back to you with my solution for your situation. May the looking at all other wait stats and types together become effective as this wait type can help suggest proper bottleneck in your system.
There are many new Wait types related to Full Text Search that are introduced in SQL Server 2008.
If you run the following query, you will be able to find them in the list. Currently there is not enough information for all of them available on BOL or any other place. But don’t worry; I will write an in-depth article when I learn more about them.
SELECT *FROM sys.dm_os_wait_statsWHERE wait_type LIKE 'FT_%'
The result set will contain following rows.
FT_RESTART_CRAWL FT_METADATA_MUTEX FT_IFTSHC_MUTEX FT_IFTSISM_MUTEX FT_IFTS_RWLOCK FT_COMPROWSET_RWLOCK FT_MASTER_MERGE FT_IFTS_SCHEDULER_IDLE_WAIT
We have understood so far that there is not much information available. But the problem is when you have this Wait type, what should you do?  The answer is to filter them out for the moment (i.e, do not pay attention on them) and focus on other pressing issues in wait stats or performance tuning.
Here are two of my informal suggestions, which are totally independent from wait stats:
  • Turn off the Full Text Search service in your system if you are  not necessarily using it on your server.
  • Learn proper Full Text Search methodology.
Now I invite you to speak out your suggestions or any input regarding Full Text-related best practices and wait stats issue. Please leave a comment.
BACKUPIO Occurs when a backup task is waiting for data, or is waiting for a buffer in which to store data. This type is not typical, except when a task is waiting for a tape mount.
BACKUPBUFFER Occurs when a backup task is waiting for data, or is waiting for a buffer in which to store data. This type is not typical, except when a task is waiting for a tape mount.

BACKUPIO and BACKUPBUFFER Explanation:

This wait stats will occur when you are taking the backup on the tape or any other extremely slow backup system.

Reducing BACKUPIO and BACKUPBUFFER wait:

In my recent consultancy, backup on tape was very slow probably because the tape system was very old. During the time when I explained this wait type reason in the consultancy, the owners immediately decided to replace the tape drive with an alternate system. They had a small SAN enclosure not being used on side, which they decided to re-purpose. After a week, I had received an email from their DBA, saying that the wait stats have reduced drastically.
At another location, my client was using a third party tool (please don’t ask me the name of the tool) to take backup. This tool was compressing the backup along with taking backup. I have had a very good experience with this tool almost all the time except this one sparse experience. When I tried to take backup using the native SQL Server compressed backup, there was a very small value on this wait type and the backup was much faster. However, when I attempted with the third party backup tool, this value was very high again and was taking much more time. The third party tool had many other features but the client was not using these features.
We end up using the native SQL Server Compressed backup and it worked very well.
If I get to see this higher in my future consultancy, I will try to understand this wait type much more in detail and so probably I would able to come to some solid solution.
Locking is a mechanism used by the SQL Server Database Engine to synchronize access by multiple users to the same piece of data, at the same time. In simpler words, it maintains the integrity of data by protecting (or preventing) access to the database object.

From Book On-Line:

LCK_M_BU Occurs when a task is waiting to acquire a Bulk Update (BU) lock.
LCK_M_IS Occurs when a task is waiting to acquire an Intent Shared (IS) lock.
LCK_M_IU Occurs when a task is waiting to acquire an Intent Update (IU) lock.
LCK_M_IX Occurs when a task is waiting to acquire an Intent Exclusive (IX) lock.
LCK_M_S Occurs when a task is waiting to acquire a Shared lock.
LCK_M_SCH_M Occurs when a task is waiting to acquire a Schema Modify lock.
LCK_M_SCH_S Occurs when a task is waiting to acquire a Schema Share lock.
LCK_M_SIU Occurs when a task is waiting to acquire a Shared With Intent Update lock.
LCK_M_SIX Occurs when a task is waiting to acquire a Shared With Intent Exclusive lock.
LCK_M_U Occurs when a task is waiting to acquire an Update lock.
LCK_M_UIX Occurs when a task is waiting to acquire an Update With Intent Exclusive lock.
LCK_M_X Occurs when a task is waiting to acquire an Exclusive lock.

LCK_M_XXX Explanation:

I think the explanation of this wait type is the simplest. When any task is waiting to acquire lock on any resource, this particular wait type occurs. The common reason for the task to be waiting to put lock on the resource is that the resource is already locked and some other operations may be going on within it. This wait also indicates that resources are not available or are occupied at the moment due to some reasons. There is a good chance that the waiting queries start to time out if this wait type is very high. Client application may degrade the performance as well.
You can use various methods to find blocking queries:
  • EXEC sp_who2
  • SELECTdb.name DBName,tl.request_session_id,wt.blocking_session_id,OBJECT_NAME(p.OBJECT_IDBlockedObjectName,tl.resource_type,h1.TEXT AS RequestingText,h2.TEXT AS BlockingTest,tl.request_modeFROM sys.dm_tran_locks AS tlINNER JOIN sys.databases db ON db.database_id =tl.resource_database_idINNER JOIN sys.dm_os_waiting_tasks AS wt ON tl.lock_owner_address =wt.resource_addressINNER JOIN sys.partitions AS ON p.hobt_id =tl.resource_associated_entity_idINNER JOIN sys.dm_exec_connections ec1 ON ec1.session_id =tl.request_session_idINNER JOIN sys.dm_exec_connections ec2 ON ec2.session_id =wt.blocking_session_idCROSS APPLY sys.dm_exec_sql_text(ec1.most_recent_sql_handleAS h1CROSS APPLY sys.dm_exec_sql_text(ec2.most_recent_sql_handleAS h2 GO
  • DMV – sys.dm_tran_locks
  • DMV – sys.dm_os_waiting_tasks

Reducing LCK_M_XXX wait:

  • Check the Explicit Transactions. If transactions are very long, this wait type can start building up because of other waiting transactions. Keep the transactions small.
  • Serialization Isolation can build up this wait type. If that is an acceptable isolation for your business, this wait type may be natural. The default isolation of SQL Server is ‘Read Committed’.
  • One of my clients has changed their isolation to “Read Uncommitted”. I strongly discourage the use of this because this will probably lead to having lots of dirty data in the database.
  • Identify blocking queries mentioned using various methods described above, and then optimize them.
  • Partition can be one of the options to consider because this will allow transactions to execute concurrently on different partitions.
  • If there are runaway queries, use timeout. (Please discuss this solution with your database architect first as timeout can work against you).
  • Check if there is no memory and IO-related issue using the following counters:
Checking Memory Related Perfmon Counters
    • SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2)
    • SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark)
    • SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system)
    • SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds)
    • Memory: Available Mbytes (Information only)
    • Memory: Page Faults/sec (Benchmark only)
    • Memory: Pages/sec (Benchmark only)
  • Checking Disk Related Perfmon Counters
    • Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good)
    • Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good)
    • Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good)
WRITELOG Occurs while waiting for a log flush to complete. Common operations that cause log flushes are checkpoints and transaction commits.

WRITELOG Explanation:

This wait type is usually seen in the heavy transactional database. When data is modified, it is written both on the log cache and buffer cache. This wait type occurs when data in the log cache is flushing to the disk. During this time, the session has to wait due to WRITELOG.
I have recently seen this wait type’s persistence at my client’s place, where one of the long-running transactions was stopped by the user causing it to roll back. In the future, I will see if I could re-create this situation once again on my machine to validate the relation.

Reducing WRITELOG wait:

There are several suggestions to reduce this wait stats:
  • Move Transaction Log to Separate Disk from mdf and other files.
  • Avoid cursor-like coding methodology and frequent committing of statements.
  • Find the most active file based on IO stall time .
  • You can also use fn_virtualfilestats to find IO-related issues using the script mentioned over here. SELECT DB_NAME(vfs.DbIdDatabaseNamemf.name,mf.physical_namevfs.BytesReadvfs.BytesWritten,vfs.IoStallMSvfs.IoStallReadMSvfs.IoStallWriteMS,vfs.NumberReadsvfs.NumberWrites, (Size*8)/1024 Size_MBFROM ::fn_virtualfilestats(NULL,NULL) vfsINNER JOIN sys.master_files mf ON mf.database_id vfs.DbIdAND mf.FILE_ID vfs.FileId GO
  • Check the IO-related counters (PhysicalDisk:Avg.Disk Queue Length, PhysicalDisk:Disk Read Bytes/sec and PhysicalDisk :Disk Write Bytes/sec) for additional details. Read about them over here .Processor:% Processor Time 
    It reports the total processor time with respect to the available capacity of the server. If counter is between 50 to 70 % consistently, investigate the process which is taking long time.
    PhysicalDisk:Avg.Disk Queue Length It indicates wait time for processes to use disk resources. As a disk is reading and writing data some requests cannot be immediately filled, those requests are queued. If many simultaneous requests are waiting, investigate the process which is taking long time.
    PhysicalDisk:Disk Read Bytes/sec and PhysicalDisk :Disk Write Bytes/sec It report the number of bytes read from and written to the disk, respectively. Slow SELECT queries with high physical reads and low queue lengths demonstrates under performance of Disk read and write. Index optimization can reduce this problem.
LOGBUFFER Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values may indicate that the log devices cannot keep up with the amount of log being generated by the server.

LOGBUFFER Explanation:

The book online definition of the LOGBUFFER seems to be very accurate. On the system where I faced this wait type, the log file (LDF) was put on the local disk, and the data files (MDF, NDF) were put on SanDrives. My client then was not familiar about how the file distribution was supposed to be. Once we moved the LDF to a faster drive, this wait type disappeared.

Reducing LOGBUFFER wait:

There are several suggestions to reduce this wait stats:
  • Move Transaction Log to Separate Disk from mdf and other files. (Make sure your drive where your LDF is has no IO bottleneck issues).
  • Avoid cursor-like coding methodology and frequent commit statements.
  • Find the most-active file based on IO stall time.
  • You can also use fn_virtualfilestats to find IO-related issues using the script mentioned in previous section.
  • Check the IO-related counters (PhysicalDisk:Avg.Disk Queue Length, PhysicalDisk:Disk Read Bytes/sec and PhysicalDisk :Disk Write Bytes/sec) for additional details. Read about them previous section.
If you have noticed, my suggestions for reducing the LOGBUFFER is very similar to WRITELOG. Although the procedures on reducing them are alike, I am not suggesting that LOGBUFFER and WRITELOG are same wait types. From the definition of the two, you will find their difference. However, they are both related to LOG and both of them can severely degrade the performance.
what PREEMPTIVE and Non-PREEMPTIVE waits in SQL Server mean.
PREEMPTIVE: Simply put, this wait means non-cooperative. While SQL Server is executing a task, the Operating System (OS) interrupts it. This leads to SQL Server to involuntarily give up the execution for other higher priority tasks. This is not good for SQL Server as it is a particular external process which makes SQL Server to yield. This kind of wait can reduce the performance drastically and needs to be investigated properly.
Non-PREEMPTIVE: In simple terms, this wait means cooperative. SQL Server manages the scheduling of the threads. When SQL Server manages the scheduling instead of the OS, it makes sure its own priority. In this case, SQL Server decides the priority and one thread yields to another thread voluntarily.
In the earlier version of SQL Server, there was no preemptive wait types mentioned and the associated task status with them was marked as suspended. In SQL Server 2005, preemptive wait types were not listed as well, but their associated task status was marked as running. In SQL Server 2008, preemptive wait types are properly listed and their associated task status is also marked as running.
Now, SQL Server is in Non-Preemptive mode by default and it works fine. When CLR, extended Stored Procedures and other external components run, they run in Preemptive mode, leading to the creation of these wait types.
There are a wide variety of preemptive wait types. If you see consistent high value in the Preemptive wait types, I strongly suggest that you look into the wait type and try to know the root cause.
MSQL_XP occurs when a task is waiting for an extended stored procedure to end. SQL Server uses this wait state to detect potential MARS application deadlocks. The wait stops when the extended stored procedure call ends.

MSQL_XP Explanation:

This wait type is created because of the extended stored procedure. Extended Stored Procedures are executed within SQL Server; however, SQL Server has no control over them. Unless you know what the code for the extended stored procedure is and what it is doing, it is impossible to understand why this wait type is coming up.

Reducing MSQL_XP wait:

As discussed, it is hard to understand the Extended Stored Procedure if the code for it is not available. In the scenario described at the beginning of this post, our client was using third-party backup tool. The third-party backup tool was using Extended Stored Procedure. After we learned that this wait type was coming from the extended stored procedure of the backup tool they were using, we contacted the tech team of its vendor. The vendor admitted that the code was not optimal at some places, and within that day they had provided the patch. Once the updated version was installed, the issue on this wait type disappeared. As viewed in the wait statistics of all the 100+ SQL Server, there was no more MSSQL_XP wait type found.
In simpler terms, you must first identify which Extended Stored Procedure is creating the wait type of MSSQL_XP and see if you can get in touch with the creator of the SP so you can help them optimize the code.
If you have encountered this MSSQL_XP wait type, I encourage all of you to write how you managed it. Please do not mention the name of the vendor in your comment as I will not approve it. The focus of this blog post is to understand the wait types; not talk about others.
SQL Server surroundings look like this:
This means that SQL Server, aside from its internal waits, also depends on external waits and settings.
As we can see in the picture above, SQL Server needs to have an interface in order to communicate with the surrounding clients over the network. For this communication, SQL Server uses protocol interfaces. I will not go into detail about which protocols are best, but you can read this article. Also, review the information about the TDS (Tabular data stream).
As we all know, our system is only as fast as its slowest component. This means that when we look at our environment as a whole, the SQL Server might be a victim of external pressure, no matter how well we have tuned our database server performance.
Let’s dive into an example: let’s say that we have a web server, hosting a web application which is using data from our SQL Server, hosted on another server. The network card of the web server for some reason is malfunctioning (think of a hardware failure, driver failure, or just improper setup) and does not send/receive data faster than 10Mbs. On the other end, our SQL Server will not be able to send/receive data at a faster rate either. This means that the application users will notify the support team and will say: “My data is coming very slow.”
Now, let’s move on to a bit more exciting example: imagine that there is a similar setup as the example above – one web server and one database server, and the application is not using any stored procedure calls, but instead for every user request the application is sending 80kb query over the network to the SQL Server. (I really thought this does not happen in real life until I saw it one day.) So, what happens in this case? To make things worse, let’s say that the 80kb query text is submitted from the application to the SQL Server at least 100 times per minute, and as often as 300 times per minute in peak times.
Here is what happens: in order for this query to reach the SQL Server, it will have to be broken into a of number network packets (according to the packet size settings) – and will travel over the network. On the other side, our SQL Server network card will receive the packets, will pass them to our network layer, the packets will get assembled, and eventually SQL Server will start processing the query – parsing, allegorizing, generating the query execution plan and so on. So far, we have already had a serious network overhead by waiting for the packets to reach our Database Engine. There will certainly be some processing overhead – until the database engine deals with the 80kb query and its 20 subqueries. The waits you see in the DMVs are actually collected from the point the query reaches the SQL Server and the packets are assembled.
Let’s say that our query is processed and it finally returns 15000 rows. These rows have a certain size as well, depending on the data types returned. This means that the data will have converted to packages (depending on the network size package settings) and will have to reach the application server. There will also be waits, however, this time you will be able to see a wait type in the DMVs called ASYNC_NETWORK_IO. What this wait type indicates is that the client is not consuming the data fast enough and the network buffers are filling up.
Client Statistics is very important. Many a time, people relate queries execution plan with query cost. This is not a good comparison. Both are different parameters, and they are not always related. It is possible that the query cost of any statement is less, but the amount of the data returned is considerably large, which is causing any query to run slow. How do we know if any query is retrieving a large amount data or very little data? In one way, it is quite easy to figure this out by just looking at the result set; however, this method cannot be relied upon all the time as it is difficult to reach a conclusion when there are many columns and many rows.
To measure how much data is retrieved from server to client side is very simple. SQL Server Management Studio has feature that can measure client statistics. There are three different ways to enable client statistics.
Method 1
Press SHIFT+ALT+S together.
Method 2
Go to Menu >> Query >> Include Client Statistics
Method 3
Once you enable the client statistics, it will display another tab in the result span; when clicked, we obtain the following:
As you see in the image, there are three categories: Query Profile Statistics, Network Statistics and Time Statistics.
Number of server roundtrips–a roundtrip consists of a request sent to the server and a reply from the server to the client. For example, if your query has three select statements, and they are separated by ‘GO’ command, then there will be three different roundtrips.
TDS Packets sent from the client – TDS (tabular data stream) is the language which SQL Server speaks, and in order for applications to communicate with SQL Server, they need to pack the requests in TDS packets. TDS Packets sent from the client is the number of packets sent from the client; in case the request is large, then it may need more buffers, and eventually might even need more server roundtrips.
TDS packets received from server –is the TDS packets sent by the server to the client during the query execution.
Bytes sent from client – is the volume of the data set to our SQL Server, measured in bytes; i.e. how big of a query we have sent to the SQL Server. This is why it is best to use stored procedures, since the reusable code (which already exists as an object in the SQL Server) will only be called as a name of procedure + parameters, and this will minimize the network pressure.
Bytes received from server – is the amount of data the SQL Server has sent to the client, measured in bytes. Depending on the number of rows and the datatypes involved, this number will vary. But still, think about the network load when you request data from SQL Server.
Client processing time – is the amount of time spent in milliseconds between the first received response packet and the last received response packet by the client.
Wait time on server replies – is the time in milliseconds between the last request packet which left the client and the first response packet which came back from the server to the client.
Total execution time – is the sum of client processing time and wait time on server replies (the SQL Server internal processing time)
Here is an illustration of the Client-server communication model which should help you understand the mutual waits in a client-server environment.
Keep in mind that a query with a large ‘wait time on server replies’ means the server took a long time to produce the very first row. This is usual on queries that have operators that need the entire sub-query to evaluate before they proceed (for example, sort and top operators).
However, a query with a very short ‘wait time on server replies’ means that the query was able to return the first row fast. However a long ‘client processing time’ does not necessarily imply the client spent a lot of time processing and the server was blocked waiting on the client. It can simply mean that the server continued to return rows from the result and this is how long it took until the very last row was returned.
The bottom line is that developers and DBAs should work together and think carefully of the resource utilization in the client-server environment. From experience I can say that so far I have seen only cases when the application developers and the Database developers are on their own and do not ask questions about the other party’s world.
I would recommend using the Client Statistics tool during new development to track the performance of the queries, and also to find a synchronous way of utilizing resources between the client – server – client.
Here is another example: think about similar setup as above, but add another server to the game. Let’s say that we keep our media on a separate server, and together with the data from our SQL Server we need to display some images on the webpage requested by our user. No matter how simple or complicated the logic to get the images is, if the images are 500kb each our users will get the page slowly and they will still think that there is something wrong with our data.
And finally, here are some guidelines for monitoring the network performance and improving it:
  • Run a trace and outline all queries that return more than 1000 rows (in Profiler you can actually filter and sort the captured trace by number of returned rows). This is not a set number; it is more of a guideline. The general thought is that no application user can consume that many rows at once. Ask yourself and your fellow-developers: ‘why?’.
  • Monitor your network counters in Perfmon: Network Interface:Output queue length, Redirector:Network errors/sec, TCPv4: Segments retransmitted/sec and so on.
  • Make sure to establish a good friendship with your network administrator (buy them coffee, for example J ) and get into a conversation about the network settings. Have them explain to you how the network cards are setup – are they standalone, are they ‘teamed’, what are the settings – full duplex and so on.
  • Find some time to read a bit about networking.

What is FILESTREAM?

FILESTREAM is a new feature introduced in SQL Server 2008 which provides an efficient storage and management option for BLOB data.
Many applications that deal with BLOB data today stores them in the file system and stores the path to the file in the relational tables. Storing BLOB data in the file system is more efficient that storing them in the database. However, this brings up a few disadvantages as well. When the BLOB data is stored in the file system, it is hard to ensure transactional consistency between the file system data and relational data.
Some applications store the BLOB data within the database to overcome the limitations mentioned earlier. This approach ensures transactional consistency between the relational data and BLOB data, but is very bad in terms of performance.
FILESTREAM combines the benefits of both approaches mentioned above without the disadvantages we examined. FILESTREAM stores the BLOB data in the file system (thus takes advantage of the IO Streaming capabilities of NTFS) and ensures transactional consistency between the BLOB data in the file system and the relational data in the database.

FILESTREAM Wait Types

Since this series is on the different SQL Server wait types, let us take a look at the various wait types that are related to the FILESTREAM feature.
FS_FC_RWLOCK
This wait type is generated by FILESTREAM Garbage Collector. This occurs when Garbage collection is disabled prior to a backup/restore operation or when a garbage collection cycle is being executed.
FS_GARBAGE_COLLECTOR_SHUTDOWN
This wait type occurs when during the cleanup process of a garbage collection cycle. It indicates that that garbage collector is waiting for the cleanup tasks to be completed.
FS_HEADER_RWLOCK
This wait type indicates that the process is waiting for obtaining access to the FILESTREAM header file for read or write operation. The FILESTREAM header is a disk file located in the FILESTREAM data container and is named “filestream.hdr”.
FS_LOGTRUNC_RWLOCK
This wait type indicates that the process is trying to perform a FILESTREAM log truncation related operation. It can be either a log truncate operation or to disable log truncation prior to a backup or restore operation.
FSA_FORCE_OWN_XACT
This wait type occurs when a FILESTREAM file I/O operation needs to bind to the associated transaction, but the transaction is currently owned by another session.
FSAGENT
This wait type occurs when a FILESTREAM file I/O operation is waiting for a FILESTREAM agent resource that is being used by another file I/O operation.
FSTR_CONFIG_MUTEX
This wait type occurs when there is a wait for another FILESTREAM feature reconfiguration to be completed.
FSTR_CONFIG_RWLOCK
This wait type occurs when there is a wait to serialize access to the FILESTREAM configuration parameters.

Waits and Performance

System waits has got a direct relationship with the overall performance. In most cases, when waits increase the performance degrades. SQL Server documentation does not say much about how we can reduce these waits. However, following the FILESTREAM best practices will help you to improve the overall performance and reduce the wait types to a good extend.
OLEDB occurs when SQL Server calls the SQL Server Native Client OLE DB Provider. This wait type is not used for synchronization. Instead, it indicates the duration of calls to the OLE DB provider.
OLEDB Explanation:
This wait type primarily happens when Link Server or Remove Query has been executed. The most common case wherein this wait type is visible is during the execution of Linked Server. When SQL Server is retrieving data from the remote server, it uses OLEDB API to retrieve the data. It is possible that the remote system is not quick enough or the connection between them is not fast enough, leading SQL Server to wait for the result’s return from the remote (or external) server. This is the time OLEDB wait type occurs.
Reducing OLEDB wait:
  • Check the Link Server configuration.
  • Checking Disk-Related Perfmon Counters
    • Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good)
    • Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good)
    • Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good)
At this point in time, I am not able to think of any more ways on reducing this wait type. Do you have any opinion about this subject? Please share it here and I will share your comment with the rest of the Community, and of course, with due credit unto you.
In SQL Server, you can run the following command to get a list of all the wait types:
DBCC SQLPERF(waitstats)
The query above will work in SQL Server 2005/2008/R2  because of backup compatibility. As you might have noticed, I have been discussing everything keeping SQL Server 2005+ in mind, but I have given little consideration on SQL Server 2000. However, I am pretty sure that most of the suggestions I have provided are applicable to SQL Server 2000. The wait types I have been discussing mostly exist in SQL Server 2000 as well. But the difference of the 2000 version is that it gets late recent releases, but it is worth it.
Wait types are very essential to measure performance bottleneck. Because of this, I do not have to state that I am big fan of them just so I could identify performance bottleneck.
The spreadsheet has the following format:
Type – the name of the wait type as found in sys.dm_os_wait_stats
Area – This describes what part of the code the wait is used such as I/O, SQLOS, Network, Memory, Broker, Buffer, Query, …. Think of this as a keyword you can search on to find out wait types associated with a particular topic.
Usage – this is a Description of how the wait type is used within the engine. In other words why does the wait type exist. The possible values are:
Resource: Waiting for a resource to become available or complete such as I/O, Network, Thread, Memory
Sync: Waiting for synchronization to access a resource or code such as locks, latches, or other memory structures
Forced: Waiting is forced by the code such as Yielding or Sleep
External: These are waits that indicate code is running in Preemptive Mode
Background: Waits used mainly by background tasks to indicate they are waiting for work to do
Ignore: Waits that we could probably remove from the DMV as they are not even used in the code
Version – SQL Server 2005, SQL Server 2008
Description – A description of what this wait type means, where it is used in the code, and how you might encounter it
Action – Suggestion on possible actions you should take if you see this wait type show up as a possible “problem”
Wait_TypeAreaUsageVersionDescriptionAction
ASYNC_IO_COMPLETIONI/OResourceSQL 2005;SQL 2008Used to indicate a worker is waiting on a asynchronous I/O operation to complete not associated with database pagesSince this is used for various reason you need to find out what query or task is associated with the wait. Two examples of where this wait type is used is to create files associated with a CREATE DATABASE and for "zeroing" out a transaction log file during log creation or growth.
CHECKPOINT_QUEUEBufferBackgroundSQL 2005;SQL 2008Used by background worker that waits on events on queue to process checkpoint requests. This is an "optional" wait type see Important Notes section in blogYou should be able to safely ignore this one as it is just indicates the checkpoint background worker is waiting for work to do. I suppose if you thought you had issues with checkpoints not working or log truncation you might see if this worker ever "wakes up". Expect higher wait times as this will only wake up when work to do
CHKPTBufferBackgroundSQL 2005; SQL 2008Used to coordinate the checkpoint background worker thread with recovery of master so checkpoint won't start accepting queue requests until master onlineYou should be able to safely ignore. You should see 1 wait of this type for the server unless the checkpoint worker crashed and had to be restarted.. If though this is technically a "sync" type of event I left its usage as Background
CXPACKETQuerySyncSQL 2005;SQL 2008Used to synchronize threads involved in a parallel query. This wait type only means a  parallel query is executing.You may not need to take any action. If you see high wait times then it means you have a long running parallel query. I would first identify the query and determine if you need to tune it. Note sys.dm_exec_requests only shows the wait type of the request even if multiple tasks have different wait types. When you see CXPACKET here look at all tasks associated with the request. Find the task that doesn't have this wait_type and see its status. It may be waiting on something else slowing down the query. wait_resource also has interesting details about the tasks and its parallel query operator
DISKIO_SUSPENDBACKUPSyncSQL 2005; SQL 2008Used to indicate a worker is waiting to process I/O for a database or log file associated with a SNAPSHOT BACKUPHigh wait times here indicate the SNAPSHOT BACKUP may be taking longer than expected. Typically the delay is within the VDI application perform the snapshot backup.
FT_IFTS_SCHEDULER_IDLE_WAITFull-TextBackgroundSQL 2008Used by a background task processing full-text search requests indicating it is “waiting for work to do:”You should be able to safely ignore unless some unexplained FTS issue. High wait times are normal
IO_COMPLETIONI/OResourceSQL 2005; SQL 2008Used to indicate a wait for I/O for operation (typically synchronous)  like sorts and various situations where the engine needs to do a synchronous I/OIf wait times are high then you have a disk I/O bottleneck. The problem will be determining what type of operation and where the bottleneck exists. For sorts, it is on the storage system associated with tempdb. Note that database page I/O does not use this wait type. Instead look at PAGEIOLATCH waits.
KSOURCE_WAKEUPShutdownBackgroundSQL 2005;SQL 2008Used by the background worker "signal handler" which waits for a signal to shutdown SQL ServerYou should able to safely ignore this wait. You should only see one instance of this wait but in SQL Server 2008 what will be unusual is the wait time will show up as 0 in sys.dm_os_wait_stats. Other DMVs like sys.dm_exec_requests will show the SIGNAL_HANDLER with a high wait time of this type.
LAZYWRITER_SLEEPBufferBackgroundSQL 2005;SQL 2008Used by the Lazywriter background worker to indicate it is sleeping waiting to wake up and check for work to doYou should be able to safely ignore this one. The wait times will appear to "cycle" as LazyWriter is designed to sleep and wake-up every 1 second. Appears as LZW_SLEEP in Xevent
LOGBUFFERTransaction LogResourceSQL 2005; SQL 2008Used to indicate a worker thread is waiting for a log buffer to write log blocks for a transactionThis is typically a symptom of I/O bottlenecks because other workers waiting on WRITELOG will hold on to log blocks. Look for WRITERLOG waiters and if found the overall problem is I/O bottleneck on the storage system associated with the transaction log
LOGMGR_QUEUETransaction LogBackgroundSQL 2005; SQL 2008Used by the background worker "Log Writer" to wait on a queue for requests to flush log blocks to the transaction log. This is an "optional" wait type see Important Notes section in blogYou should be able to safely ignore this wait type unless you believe a problem exists in processing log blocks to flush to the transaction log. This wait type is not a wait indicating I/O bottlenecks. It is only for waiting for other workers to request log block flushes. Note that on SQL Server 2005 this wait type will not show up in sys.dm_exec_requests because the Log Writer task does not show up there.
MISCELLANEOUSIgnoreIgnoreSQL 2005;SQL 2008This really should be called "Not Waiting".This may have been used in SQL 2000 but for 2005/2008, it is not used for any valid wait. It is simply the default wait in a list and isn't used to indicate any real waiting. This type shows up twice in sys.dm_os_wait_stats in SQL 2008 but the "other" instance is an older unused wait type in the code. We should be able to remove it.
PREEMPTIVE_XXXVariesExternalSQL 2008Used to indicate a worker is running coded that is not under the SQLOS Scheduling SystemsI will specific PREEMPTIVE_XX wait types or groups of them in 2010. Be sure to read the Important Notes section for bug where this wait type is being over counted by the engine in some situations. Note also that when you see this wait_type in sys.dm_exec_requests the status of the request is RUNNING not SUSPENDED. This is because the engine doesn't really know if the thread is waiting or running "external" code.
REQUEST_FOR_DEADLOCK_SEARCHLockBackgroundSQL 2008Used by background worker "Lock Monitor" to search for deadlocks.  This is an "optional" wait type see Important Notes section in blogYou should be able to safely ignore this one as it is just and indication the lock monitor thread is temporarily sleeping before it wakes up to do work. This wait type should never exceed 5 seconds in one "wait" as this is the interval the lock monitor wakes up to check for deadlocks
RESOURCE_QUERY_SEMAPHORE_COMPILEQueryResourceSQL 2005; SQL 2008Used to indicate a worker is waiting to compile a query due to too many other concurrent query compilations that require "not small" amounts of memory.This is a very complicated problem to explain. The problem is more than just concurrent compilations. It is the amount of memory required by the compilations. Typically this problem is not seen on 64bit systems. The biggest thing you can do is find out why you have so many compilations. Furthermore, a high amount of "query memory" can result in less memory available for compilations so check what other users are consuming high query memory. 
RESOURCE_SEMAPHOREQueryResourceSQL 2005; SQL 2008Used to indicate a worker is waiting to be allowed to perform an operation requiring "query memory" such as hashes and sortsHigh wait times indicate too many queries are running concurrently that require query memory. Operations requiring query memory are hashes and sorts. Use DMVs such as dm_exec_query_resource_semaphores and dm_exec_query_memory_grants
SOS_SCHEDULER_YIELDSQLOSForcedSQL 2005;SQL 2008Used to indicate a worker has yielded to let other workers run on a schedulerThis wait is simply an indication that a worker yielded for someone else to run. High wait counts with low wait times usually mean CPU bound queries. High wait times here could be non-yielding problems
SQLTRACE_BUFFER_FLUSH TraceBackgroundSQL 2005;SQL 2008Used by background workerYou should be able to safely ignore unless some unexplained problem with SQLTrace files not getting written to disk properly. 
THREADPOOLSQLOSResourceSQL 2005; SQL 2008Indicates a wait for a  task to be assigned to a worker threadLook for symptoms of high blocking or contention problems with many of the workers especially if the wait count and times are high. Don't jump to increase max worker threads especially if you use default setting of 0. This wait type will not show up in sys.dm_exec_requests because it only occurs when the task is waiting on a worker thread. You must have a worker to become a request. Furthermore, you may not see this "live" since there may be no workers to process tasks for logins or for queries to look at DMVs.
WRITELOGI/OSyncSQL 2005; SQL 2008Indicates a worker thread is waiting for LogWriter to flush log blocks. High waits and wait times indicate an I/O bottleneck on the storage system associated with the transaction log
XE_DISPATCHER_WAITXEventBackgroundSQL 2008Used by a background worker to handle queue requests to write out buffers for async targetsYou should be able to safely ignore this unless you believe a problem is occurring with processing of events for async targets. Since this works on a queue you can have bursts of high wait times especially when no XEvent sessions are active.
XE_TIMER_EVENTXEventBackgroundSQL 2008Used to indicate a background task is waiting for "expired" timers for internal Xevent engine workYou should be able to safely ignore this one. Just used by the Xevent engine for internal processing of its work. If something was possibly wrong with Xevent processing you might see if this thread ever "wakes up"

No comments:

Post a Comment