Tuesday, 30 April 2013

huge log file

Just try this. 
select log_reuse_wait_desc,* from sys.databases where name='dbname'
-- see what it is waiting on

also run this dbcc sqlperf(logspace)
-- see log space utilization 
If your recovery model is FULL, follow the below steps
1. Take a full database backup
2. Shrink the log file to an appropriate size
3. Take a full database backup again
4. Schedule transaction log backups to avoid such issues in the future
This is a problem for two basic reasons. By default, new databases are created in full recovery mode AND by default, no one has set up log backups on your system. That means it’s up to you. You have to set up log backups. You have to set them and you have to schedule them and you have to ensure they run if you want to recover your database to a point in time, which is also known as, Full Recovery.
Though this would have solved the problem, I'm still interested in showing what t-logs are upto actually.
 What is the Transaction Log?
At its simplest, the transaction log is a record of all transactions run against a database and all database modifications made by those transactions. The transaction log is a critical part of the database’s architecture.

The transaction log is not an audit log. It’s not there so that the DBA can see who did what to the database. It’s also not a data recovery tool.  There are third-party tools that can get audit or data-recovery info from the log, but that is not its primary purpose.

The transaction log is predominantly used by the SQL engine to ensure database integrity, to allow transaction rollbacks and for database recovery. 
How does SQL use the log?
When changes are made to a database, whether it be in an explicit transaction or an auto-committed transaction, those changes are first written (hardened) to the log file and the data pages are changed in memory. Once the record of the changes is in the log, the transaction is considered complete. The data pages will be written to the data file on disk at a later time either by the lazy writer or by the checkpoint process.

If transactions are rolled back, either by an explicit ROLLBACK TRANSACTION, by an error if XACT_ABORT is on, or due to a loss of connection to the client, the transaction log is used to undo the modifications made by that transaction and leave the database as though the transaction had never occurred. In a similar way, the log is used to undo the effects of single statements that fail, whether in an explicit transaction or not.

When a server is restarted, SQL uses the transaction log to see if, at the point the server shut down there were any transactions that had completed but whose changes may not have been written to disk, or any transactions that had not completed. If there are, then the modifications that may not have been written to disk are replayed (rolled forward) and any that had not completed are rolled back. This is done to ensure that the database is in a consistent state after a restart and that any transactions that had committed remain a part of the permanent database state (the Durability requirement of ACID)

Lastly, backups made of the transaction log can be used to recover a database to a point-in-time in case of a failure.

The transaction log is also used to support replication, database mirroring and change data capture. I won’t be going into how they affect the log here.
Simple Recovery Model
In the simple recovery model, the transaction log entries are kept only to allow for transaction rollbacks and crash recovery, not for the purpose of restoring a database. Once the data pages have been written to disk and the transaction is complete; then, in the absence of replication or other things that need the log, the log records are considered inactive and can be marked as reusable. This marking of portions of the log as reusable is done by the checkpoint process.

This is the simplest recovery mode in terms of log management as the log manages itself. The downside of simple recovery is that (because transaction log backups cannot be made) a restore of the database can only be done to the time of the latest full or differential database backup. With a busy database, this can result in unacceptable data loss.

Full Recovery model
In full recovery model transaction log entries cannot be overwritten until they have been backed up by a transaction log backup. Simply having the transaction committed and data pages written to disk is not enough

Full recovery can be more difficult to manage as the log can grow beyond what is expected if transaction log backups don’t occur, or if there’s an increase in the amount of database activity that occurs between log backups.

Without any log backups running (an all-too common occurrence seeing that new databases will default to full recovery model unless the recovery model of the Model database has been changed), the transaction log will grow until it reaches its configured maximum file size (2TB unless otherwise specified) or until it fills the disk. No amount of full or differential backups will allow the log space to be reused as neither marks log space as reusable.

What can be even worse is that a database in full recovery model does not behave like this from the moment created. When created, a database in full recovery model will behave in a manner sometimes called pseudo-simple recovery model. This occurs because no database backup has yet been taken, and a database backup is needed to start a log chain. While in pseudo-simple recovery, the database behaves as though it really is in simple recovery model, truncating the log (marking space as reusable) every time a checkpoint occurs. This state remains until the first full backup is taken of the database. That full backup starts the log chain and from that point on the log will no longer be marked reusable by the checkpoint process and, if there are no log backups, the log will begin to grow.

Because log records are not overwritten until they have been backed up, a database in full recovery mode can be recovered to any time using a combination of full, differential and log backups, assuming a starting full backup exists and none of the log backups since have been deleted.

Bulk-logged recovery model
Bulk-logged is very similar to full recovery, except that in bulk-logged, bulk operations are minimally logged.  When operations are minimally logged, much less information is written to the transaction log compared to when the operation is fully logged.

The advantage of bulk-logged recovery is that if there are bulk operations occurring, the impact those operations have on the transaction log is less than it would be if the database was in full recovery mode. However the transaction log backups may be much larger than the transaction log itself since the log backups include all data pages modified by bulk operations since the previous log backup.

I'm not going to discuss bulk-logged recovery model further than this in the current article. For the purposes of log management, bulk-logged recovery model can be treated much like full recovery.

Managing transaction logs

Managing your transaction log requires you to think about your recovery model, log backups, and various other details regarding the log files. 
Frequency of log backups
The frequency that log backups should be done is dependent on two considerations:

The maximum amount of data that can be lost in the case of a disaster
The size to which the log can grow.
The first consideration is by far the most important.

When in full recovery model a disaster that destroys or damages the database can be recovered from without data loss, providing the transaction log file is available. If the log file is not available then the best that can be done, assuming that the log backups are stored separately from the database files and are available, is a restore to the last log backup taken, losing all data after that point. From this it should be clear that the interval between log backups should be carefully chosen based on the RPO (recovery point objective) requirements for that database. If the mandate is that no more than 20 minutes of data can be lost, then scheduling log backups hourly is a risk, as a disaster can result in losing up to 60 minutes of data if the transaction log file is lost or damaged.

If (once the log backups have been setup and scheduled based on the database's RPO) the log is growing larger than is acceptable, then the log backup frequency can be increased in order to keep the size down. This is by no means guaranteed to have the desired effect as the log must be large enough to accommodate the largest single transaction run against the database (which is usually an index rebuild).
Number of log files
The answer to the question of how many log files a database should have is a simple one. One log file only. SQL uses log files sequentially, not in any form of parallel or Round-Robin mechanism. Hence, if there are multiple log files, SQL will only ever be writing to one at a time.

The one time where there may be a use for a second log file is when unusual database activity needs a total amount of log space larger than what is available on any particular drive or larger than the 2TB limit in the size of a log file. In these circumstances it may be necessary to create a second log file to increase the amount of available log space. This is purely about available log space, not about performance.
Shrinking the log
In general, the transaction log should not be shrunk. It certainly should never be shrunk on a regular basis in a job or maintenance plan.

The only time a log should be shrunk is if some abnormal database activity (or failed log backups) has resulted in the log growing far beyond the size it needs to be for the database activity. In this situation, the log can be shrunk as a once-off operation, reducing it back to the size that it was before the abnormal activity.

Shrinking the log on a regular basis will have just one effect - the log growing again once regular activity on the database requires the old size. Transaction log grow operations are not fast, they cannot take advantage of instant initialisation and hence the new portion of the log will always have to be zeroed out. The other effect of the log repeatedly growing is that unless the auto-grow setting has been carefully chosen, the growth of the log will result in log fragmentation - excessive VLFs that can degrade the performance of backups, restores, crash recovery, replication and anything else that reads the log.

Don't shrink the log regularly. Shrink only if something has blown the size of the log far beyond what it needs to be.
Log fragmentation and VLFs
Internally, the log is divided into sections called Virtual Log Files (VLF). A log will always contain at least 2 VLFs and will usually contain far more. When the log is truncated (checkpoint in simple recovery or log backup in full recovery), only entire VLFs can be marked reusable. SQL can't mark individual log records or log blocks as reusable. A single log record that's part of an open transaction or otherwise needed prevents the entire VLF from being marked as reusable.

When the log is created or the log grows the specified size results in a specific number of VLFs of specific sizes. More details on the exact algorithm can be found on Kimberly Tripp's blog. If the log was improperly sized initially and auto-grew to a huge size, the log can have vast numbers of VLFs (tens of thousands have been seen in production systems). The problem with excessive VLFs is that it can have a massive impact on operations that read the log. These include, but are not limited to, database and log backups, restores, crash recovery, transactional replication, change data capture.

There's no exact number of VLFs that are good or bad. If there are thousands, it's probably bad. Tens of thousands is definitely very bad. Hundreds?  That depends on the size of the log. The number of VLFs should be high enough that SQL doesn't have to keep huge portions of the log active but low enough that reading all the VLF headers doesn't take too long.

The fix for excessive VLFs is to shrink the log to 0 and then regrow it in reasonable chunks to its previous size. This obviously has to be done while the database is idle. The size of the grow increments determines how many VLFs will be in the new file, see Kimberly’s blog post on log throughput for details, as well as the previously mentioned blog post: http://www.sqlskills.com/blogs/kimberly/post/8-Steps-to-better-Transaction-Log-throughput.aspx.

The growth increment must not be an exact multiple of 4GB. There's a bug in SQL if that exact size is used. Any other size works as expected.

No comments:

Post a Comment