Monday, 28 January 2013

Database mirroring: avoiding ‘cannot obtain a LOCK resource’ problems

This is an interesting case that came up recently, and that I’ve seen a few times before, but which I haven’t seen explained anywhere.
Using database mirroring, you may see 1204 and 1454 errors on the mirror server with no other databases or activity on the mirror server apart from database mirroring:
2013-01-15 12:00:00.410 spid29s Error: 1204, Severity: 19, State: 4.
2013-01-15 12:00:00.410 spid29s The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.
2013-01-15 12:00:00.430 spid29s Error: 1454, Severity: 16, State: 1.
2013-01-15 12:00:00.430 spid29s Database mirroring will be suspended. Server instance 'instancename' encountered error 1204, state 4, severity 19 when it was acting as a mirroring partner for database 'dbname'. The database mirroring partners might try to recover automatically from the error and resume the mirroring session. For more information, view the error log for additional error messages.
With no activity on the mirror server apart from mirroring, why are locks being taken?
The answer is that locks must be acquired when a transaction is rolled back on the mirror – this is simply how logging and recovery work. I discussed this in my post Lock logging and fast recovery back in 2009 – locks are acquired during rollback to ensure that other transactions don’t prevent rollback from completing. The behavior is the same when a mirrored transaction rolls back – the locks need to be acquired on the mirror before the rollback can be replayed there.
You can easily see this for yourself. Create a mirroring session and then a constant insert workload of small transactions. If you look at the locks on the mirror server using sys.dm_tran_locks, you’ll see no locks being acquired in the mirror database. Now create a transaction on the principal server that inserts 10,000 records. Roll it back and quickly examine the locks on the mirror server – you’ll see thousands of locks being acquired in the mirror database.
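The demo above can be sketched in T-SQL along the following lines. This is a minimal sketch, not a complete mirroring setup – the database name (MirrorTest) and table name (dbo.TestTable) are hypothetical, and the mirroring session and constant insert workload are assumed to already exist:

```sql
-- On the PRINCIPAL: a large transaction that is then rolled back.
USE [MirrorTest];
GO
BEGIN TRAN;
INSERT INTO [dbo].[TestTable] ([c1])
SELECT TOP (10000) 0
FROM sys.all_columns AS [a]
CROSS JOIN sys.all_columns AS [b];
ROLLBACK TRAN;
GO

-- On the MIRROR (run from master, as the mirror database itself is
-- in the restoring state): count the locks held in the mirror
-- database while the rollback is being replayed.
SELECT COUNT (*) AS [MirrorDatabaseLocks]
FROM sys.dm_tran_locks
WHERE [resource_database_id] = DB_ID (N'MirrorTest');
GO
```

Run the mirror-side query immediately after issuing the rollback on the principal – the lock count should jump into the thousands and then drop back once the rollback completes.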
Why can we see 1204 errors with database mirroring?
Each lock structure takes up a small amount of memory, and the more locks that are acquired and held at any one time, the more memory is required for the lock structures. On a memory-constrained system it’s possible to run out of lock memory, resulting in error 1204.
With database mirroring, if the principal server has a lot more memory than the mirror server, and there are a lot of large rollbacks occurring, or there are multiple mirroring sessions where rollbacks are occurring, it’s possible that the mirror server will run out of lock memory, resulting in errors 1204 and 1454 as above.
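One way to watch lock memory consumption on the mirror server is through the Memory Manager performance counters, which are exposed in sys.dm_os_performance_counters:

```sql
-- How much memory is the lock manager currently using on this instance?
SELECT [cntr_value] AS [LockMemoryKB]
FROM sys.dm_os_performance_counters
WHERE [counter_name] = N'Lock Memory (KB)';
```

If you sample this on the mirror while large rollbacks are being replayed, you can see how close the instance is getting to its lock memory limit before the 1204 errors start.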
How can we stop 1204 errors occurring?
The easiest fix (and hopefully the way to prevent the problem occurring in the first place) is to ensure that the mirror server has at least the same amount of memory as the principal server – that way the principal should not be able to create more locks than the mirror server can accommodate. Even then, you may still see these errors if there is a lot of rollback activity across mirrored databases and it all happens to be replayed on the mirror server at the same time – in which case you need more memory on the mirror server than on the principal.
You also need to ensure that nothing else is using memory on the mirror server – e.g. database snapshots, or any other activity apart from mirroring itself.
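Two quick checks on the mirror server can help here – one for the configured memory limit and one for any database snapshots that may be consuming memory:

```sql
-- Check the configured memory limit on the mirror server.
SELECT [name], [value_in_use]
FROM sys.configurations
WHERE [name] = N'max server memory (MB)';
GO

-- List any database snapshots on the mirror server
-- (snapshots have a non-NULL source_database_id).
SELECT [name] AS [SnapshotName],
       DB_NAME ([source_database_id]) AS [SourceDatabase]
FROM sys.databases
WHERE [source_database_id] IS NOT NULL;
GO
```

If the second query returns rows, those snapshots are competing with the mirroring redo activity for memory on the mirror server.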
I hope this explanation helps prevent some of you out there wasting time trying to troubleshoot this issue.