Thursday, 23 May 2013

Managing Backups on Multiple Servers

Central Management Server

If you read up on this, the documentation talks about setting up groups of servers, which is necessary if you're managing QA differently from production. It also covers scripting against multiple servers, another great thing. Finally, it lets you evaluate policies against multiple servers. Cool. It sounds perfect. Let's explore.

Setting up the Central Management Server

Creating a Central Management Server (CMS) is pretty easy, though it does have to be on SQL Server 2008 or better. You also can't make the CMS server one of your managed servers. Other than that, just pick a machine and go to the Registered Servers window in SSMS by clicking "View -> Registered Servers" in the menus. Right there in the registered server tree on the left is a folder called "Central Management Servers" (and yes, you can set up more than one, or set up Management Servers for your Management Server, or…). Right-click on that folder, select "Register Central Management Server" from the context menu, and then supply it with the credentials for the server you want to use. That's now, automagically, your CMS. There are some entries in msdb for the stuff you're configuring. Now you need to add Server Groups.
Right-click on the CMS that you've registered and then select "New Server Group…" from the context menu. I've created two groups locally, one for all my regular machines and one for my virtuals. You could break it down into Development and Production, or whatever you need. Grouping the servers gives you the ability to treat them differently: to run different scripts against them and evaluate different policies in different ways for these various groups of servers. It's a big management step. Plus, you can still run scripts against all the servers at once.
With the Server Groups in place, you just have to add servers to the groups. Right-click on the Server Group where you want to add a server, and select "New Server Registration…" from the context menu. Type in the appropriate connection information and you're good to go. Here's my current set-up:
Current Server Setup
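If you're curious where all of this ends up, the registrations and groups are stored in msdb on the CMS instance. Here's a quick sketch of a query against the shared management views; the views are standard, though your group and server names will obviously differ from mine:

-- Run on the CMS instance itself: groups and the servers registered in them
SELECT g.name AS group_name,
       s.name AS registered_name,
       s.server_name
FROM msdb.dbo.sysmanagement_shared_server_groups AS g
JOIN msdb.dbo.sysmanagement_shared_registered_servers AS s
    ON s.server_group_id = g.server_group_id
ORDER BY g.name, s.name;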

Working with Servers

Setup was nice 'n' easy, right? Now what can you do with it? Right now, it just looks like yet another list of servers that you have to manage, which is true, but this list is available to other people. Anyone who connects to the defined CMS will see the servers that have been assigned to it. And, if they have the correct privileges on all the servers under management, they can do what we're about to do.

Central Policy Management

You’re using Policy Based Management (PBM), right? If not, I’d go work on setting that up now, as it’s a great piece of software. My favorite policies are actually backup-related (yes, I haven’t forgotten that the point of the article is managing backups: We’re getting there, calm down). When I set up backups through SQL Agent, I always put in methods that raise errors that will cause alerts to fire (either directly through SQL Agent or through third party monitoring solutions). But, I’ve noticed that all of them can fail under certain circumstances, so I’ve found that putting an additional check on top of the others is worthwhile doing. The additional check comes through PBM.
I’ve set up a simple check that will see whether the last backup was completed within the last 24 hours. It just validates that the daily backups I want done are being completed. I could script this out and run it against every server, or, I could just use CMS. Let’s assume I want to check my physical boxes, which are in the folder labeled ‘Red Court.’ If I right-click on that folder and select “Evaluate Policies” from the context menu, I get a new window that’s going to evaluate those policies that I decide to include against the servers in my list.
First, I have to select the policies that I want to evaluate, which means I can pick from files on my system, or I can pick from one of the servers that I manage. I chose the latter. Here’s the screen ready for the evaluation of my policy:
Evaluation of my policy
Now all I have to do is to click on the ‘Evaluate’ button and this policy will be run against all the servers that are currently under management, and bring back a full set of results. This will allow me to see how many, if any, of my databases have not been backed up within the last 24 hours. Let’s see what happens:
The evaluation results
Oh my. Most of the databases on the server GRANT-RED1\GFR1 have been backed up within the last 24 hours, but it looks like none of the databases on BOB\SKULL have been. You can actually check the evaluation results for each database on each server. Here’s one set of results from BOB\SKULL:
Results from BOB\SKULL
Ouch. In short, backups have never taken place on any of the databases on that server. Someone needs to get backups scheduled there as soon as possible. I wonder if we could use the CMS to make that happen.

Central Script Execution

I need to run backups on all my servers and on all the databases on those servers. CMS offers a mechanism to make this happen. Again, you can select a server, a list of servers, or all servers registered under CMS. Right-click in the appropriate place, and then select "New Query." This will open a query window that looks a lot like any other query window. But, if you look at the information bar at the bottom of the window, you can see that you are not connected to a single server:
Not connected to server
I’m connected up to my production server group, Red Court. Any query that I run there will run under the context listed, NEVERNEVER\Grant, against all the servers in my list. So if, for example, I wanted to run a backup against all the databases on all the servers on the list, I could do this:
DECLARE @BuScript NVARCHAR(MAX) ;
SET @BuScript = 'IF ''?'' <> ''tempdb''
BEGIN
BACKUP DATABASE ? TO DISK = ''\\myshare\documents\bu\' + @@SERVICENAME
    + '\?.BAK'' WITH INIT;
END' ;
EXEC sp_MSforeachdb @BuScript ;
Executing the script, the results look like this:
Script Results
The results are interleaved because the query is running on both servers at the same time, and the rows come back as each backup process completes.
But, though it’s nice to quickly run these backups across multiple servers, I’m not going to sit down every day and run routine backups by hand. Here’s another option. If I have this script as a backup job that I want to run on all my servers, I could create a script that creates a SQL Agent job on each and every server like this:
USE msdb ;
GO

BEGIN TRANSACTION ;
DECLARE @ReturnCode INT ;
SELECT @ReturnCode = 0 ;

IF NOT EXISTS ( SELECT  name
                FROM    msdb.dbo.syscategories
                WHERE   name = N'[Uncategorized (Local)]'
                        AND category_class = 1 )
    BEGIN
        EXEC @ReturnCode = msdb.dbo.sp_add_category @class = N'JOB',
            @type = N'LOCAL', @name = N'[Uncategorized (Local)]' ;
        IF ( @@ERROR <> 0
             OR @ReturnCode <> 0
           )
            GOTO QuitWithRollback ;
    END

DECLARE @jobId BINARY(16) ;
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name = N'Backup All Databases',
    @enabled = 1, @notify_level_eventlog = 0, @notify_level_email = 0,
    @notify_level_netsend = 0, @notify_level_page = 0, @delete_level = 0,
    @description = N'No description available.',
    @category_name = N'[Uncategorized (Local)]',
    @owner_login_name = N'NEVERNEVER\grant', @job_id = @jobId OUTPUT ;
IF ( @@ERROR <> 0
     OR @ReturnCode <> 0
   )
    GOTO QuitWithRollback ;
/****** Object:  Step [Database Backup Script]    Script Date: 12/05/2011 08:36:44 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id = @jobId,
    @step_name = N'Database Backup Script', @step_id = 1,
    @cmdexec_success_code = 0, @on_success_action = 1, @on_success_step_id = 0,
    @on_fail_action = 2, @on_fail_step_id = 0, @retry_attempts = 0,
    @retry_interval = 0, @os_run_priority = 0, @subsystem = N'TSQL',
    @command = N'DECLARE @BuScript NVARCHAR(MAX) ;

SET @BuScript = ''IF ''''?'''' <> ''''tempdb''''
BEGIN
BACKUP DATABASE ? TO DISK = ''''\\192.168.1.80\documents\bu\'' + @@SERVICENAME
    + ''\?.BAK'''' WITH INIT;
END'' ;

EXEC sp_MSforeachdb @BuScript ;',
    @database_name = N'master', @flags = 0 ;
IF ( @@ERROR <> 0
     OR @ReturnCode <> 0
   )
    GOTO QuitWithRollback ;
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1 ;
IF ( @@ERROR <> 0
     OR @ReturnCode <> 0
   )
    GOTO QuitWithRollback ;
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id = @jobId,
    @name = N'Weekly', @enabled = 1, @freq_type = 8, @freq_interval = 1,
    @freq_subday_type = 1, @freq_subday_interval = 0,
    @freq_relative_interval = 0, @freq_recurrence_factor = 1,
    @active_start_date = 20111205, @active_end_date = 99991231,
    @active_start_time = 0, @active_end_time = 235959,
    @schedule_uid = N'a24f9a3a-1990-455f-9d4b-d1ab8437b7d5' ;
IF ( @@ERROR <> 0
     OR @ReturnCode <> 0
   )
    GOTO QuitWithRollback ;
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId,
    @server_name = N'(local)' ;
IF ( @@ERROR <> 0
     OR @ReturnCode <> 0
   )
    GOTO QuitWithRollback ;
COMMIT TRANSACTION ;
GOTO EndSave ;
QuitWithRollback:
IF ( @@TRANCOUNT > 0 )
    ROLLBACK TRANSACTION ;
EndSave:

GO
With this, I can set up backups throughout my enterprise, all from a single location. That’s great news. If I make modifications to my backup script, I can transmit it to all the servers, all at once. I’ve just eliminated the need to connect up to each server to set up backups.
There’s one issue with this though. I can set up and maintain my scripts across multiple servers, but I can’t get the results of all those Agent executions back to the CMS server to so that I know whether or not they’ve executed successfully.

Automation

Unless you bring in some outside agent to do the work, such as PowerShell, you can't automate this final reporting task. Sorry, but there it is. PowerShell does provide a good way to hook into CMS. With that, you could set up jobs through SQL Agent that then call out to PowerShell scripts that take advantage of the CMS lists. But other than that, there's nothing built in.
While I can manually do a lot against multiple servers through CMS, I can't automatically do enough unless I start programming with other languages and tools. What I can't do is set up a SQL Agent job that runs through CMS to manage my servers, bringing the results back to a common location. Or… can I?

Multi-Server Management

This is totally different from CMS. Instead of using something that is sort of hidden with Management Studio, you’re taking advantage of additional functionality built into SQL Agent. The concept is that you create Agent jobs and provide them with targets against multiple servers. The results all come back to a common server so you can see what ran and didn’t. You can even set up alerts from this server so you know when a job failed or had problems. Let’s see how it works.

Setting Up Multi-Server Management

Conceptually, this is very simple. You just right-click on "SQL Server Agent" in SSMS's Object Explorer, select "Multi-Server Management" and then "Make this a Master" from the context menu. It'll open a wizard that lets you pick the target servers and you're off. In practice… well, not so much.
I was unlucky. In order to get my setup to work, right out of the gate I had to mess about in the registry. However, once that was done it was pretty simple.
If you right-click on the SQL Agent icon in the Object Explorer window, you'll spot a "Multi Server Management" item in the context menu. Selecting this opens another menu that will allow you to "Make this a Master…" or "Make this a Target…" For the central server that's going to manage all the others, you make it a Master server. This opens a wizard that starts off with you picking the servers you want to manage as Targets:
Pick target servers
The beautiful thing is that it appears to work from the CMS. Well, it does and it doesn't. You can see the CMS and you can use it to select servers but, once selected, they're just managed within the Multi-Server Management interface. There are no longer any folders or groups that let you readily manage them as a set of servers. Instead, it's just a way to quickly add lists of servers, which is not the same thing at all.
Anyway, the next step is to set up the security context. You have options here. You can set up a common login:
Master server login credentials
A common login is necessary if the SQL Agent on one or more of your servers is running under an account without sufficient rights to access the Master server. If the Agent accounts do have the rights they need, you can untick this option:
Untick the check box
Then you click on the “Next” button and it shows you a nice summary page of what you’re about to do:
Summary page
Of course you’re going to click on the “Finish” button, and since this is a wizard, everything will work:
Getting errors
… or not. See, there’s another setting in the Registry that you have to change before the different agents will all communicate with each other. You can see it in the error for the Enlist statement. Fix this, and it all, finally, works.
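In my experience, the Enlist error usually names the Agent's MsxEncryptChannelOptions value, and the common workaround, only appropriate on a network you trust, is to set it to 0 on each target instance. Here's a sketch using the instance-aware registry procedure; treat the exact value name as whatever your own error message reports:

-- Turns off MSX/TSX channel encryption for this instance's Agent.
-- xp_instance_regwrite maps 'Software\Microsoft\MSSQLServer' to the
-- instance-specific registry key; run on each target, then retry the enlist.
EXEC master.dbo.xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\SQLServerAgent',
    N'MsxEncryptChannelOptions',
    REG_DWORD,
    0;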
Yeah, that was a pain.

Creating Jobs

This part of Multi-Server Management could not possibly be easier. You're going to use SQL Agent. That means that all the stuff you're used to with SQL Agent (Jobs, Schedules, Alerts, Operators) is in place. I won't insult your intelligence by showing you how to set up an Agent Job. Just bear in mind that the same types of "universal" coding practices you followed in the scripts that run from the CMS apply here. Unless, of course, you have an X: drive on every server, you'd be better off using UNC paths for your backup locations.
The only wrinkle is that you now have a Target for your job. Here’s what it looks like. The nice groupings that were available in CMS are not evident here:
Groupings unavailable
Once you create a Job and set the execution Schedule, it will start running against both servers. It’s that easy. You can monitor the jobs, but not edit them, from any of the target servers. You can monitor all servers from the master server.
When you open your Agent on the Master server, you’ll see that your Jobs folder now has two sub-folders, one for Local and one for Multi-Server jobs. This is where you can manage and maintain the jobs. If you take a look at the history of the jobs, you can see every server that the job was run against:
Seeing where the jobs ran
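If you'd rather see that history as a result set, the Master server's msdb tracks the last outcome per target. A sketch using the standard msdb tables, with the job name from the earlier script (yours may differ):

-- Run on the Master server: last run status of the multi-server job on every target
-- (a NULL server_name indicates the local server)
SELECT ts.server_name,
       j.name AS job_name,
       js.last_run_date,
       js.last_run_time,
       js.last_run_outcome,
       js.last_outcome_message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobservers AS js
    ON js.job_id = j.job_id
LEFT JOIN msdb.dbo.systargetservers AS ts
    ON ts.server_id = js.server_id
WHERE j.name = N'Backup All Databases';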
Overall, it’s very functional. But the fact that you lose the CMS groupings that enabled you to define sets of servers certainly hurts it. While you can set up different jobs for different groups of servers, you have to go in and identify those servers yourself, manually. You can’t just pick a group and have all servers automatically get selected. That’s a shortcoming.

Get Information About Your Backups

Retrieving backup information

We’re going to look at three different commands: LABELONLYHEADERONLYFILELISTONLY. These three read the information stored about the backup in the backup file and display it all for you.

LABELONLY

RESTORE LABELONLY FROM DISK = 'd:\bu\mm.bak' ;
The output from my backup file is as follows:
MediaName               NULL
MediaSetId              C78024B4-4FD7-4141-98F4-685C8B127B14
FamilyCount             1
FamilySequenceNumber    1
MediaFamilyId           B61166AA-0000-0000-0000-000000000000
MediaSequenceNumber     1
MediaLabelPresent       0
MediaDescription        NULL
SoftwareName            Microsoft SQL Server
SoftwareVendorId        4608
MediaDate               2012-02-01 14:31:09.000
MirrorCount             1
IsCompressed            0

HEADERONLY

The amount of information you get when you read the entire header is fairly insane. You retrieve it the same way as you would retrieve the LABELONLY data:
RESTORE HEADERONLY FROM DISK = 'd:\bu\mm.bak' ;
This command outputs a lot of information, listed here:
BackupName               MovieManagement
BackupDescription        Daily Full Backup
BackupType               1
ExpirationDate           NULL
Compressed               0
Position                 1
DeviceType               2
UserName                 NEVERNEVER\Grant
ServerName               GRANT-RED1\GFR1
DatabaseName             MovieManagement
DatabaseVersion          661
DatabaseCreationDate     2011-04-19 13:24:09.000
BackupSize               63001600
FirstLSN                 89000000041600037
LastLSN                  89000000043200001
CheckpointLSN            89000000041600037
DatabaseBackupLSN        89000000039100037
BackupStartDate          2012-02-13 10:08:21.000
BackupFinishDate         2012-02-13 10:08:22.000
SortOrder                52
CodePage                 0
UnicodeLocaleId          1033
UnicodeComparisonStyle   196609
CompatibilityLevel       100
SoftwareVendorId         4608
SoftwareVersionMajor     10
SoftwareVersionMinor     50
SoftwareVersionBuild     1797
MachineName              GRANT-RED1
Flags                    512
BindingID                62BB8F51-371F-4F17-8F43-7644232F8932
RecoveryForkID           872E823E-2130-4BF1-8B04-054B295F0E08
Collation                SQL_Latin1_General_CP1_CI_AS
FamilyGUID               EEF9BC86-F81B-4CF0-A8E2-462C04E342E0
HasBulkLoggedData        0
IsSnapshot               0
IsReadOnly               0
IsSingleUser             0
HasBackupChecksums       0
IsDamaged                0
BeginsLogChain           0
HasIncompleteMetaData    0
IsForceOffline           0
IsCopyOnly               0
FirstRecoveryForkID      872E823E-2130-4BF1-8B04-054B295F0E08
ForkPointLSN             NULL
RecoveryModel            SIMPLE
DifferentialBaseLSN      NULL
DifferentialBaseGUID     NULL
BackupTypeDescription    Database
BackupSetGUID            56E75391-D781-4394-AE0A-979D85F7A0D9
CompressedBackupSize     63001600


FILELISTONLY

Now you can find out lots of information about a backup, but do you have enough to run a restore operation based on the information you've collected? Short answer: nope. You still don't have any clue about how the database itself was distributed, and you need that if you're running a full restore or any kind of file/filegroup restore. If you're replacing an existing database with a backup taken from that database, you won't need this. But if you're restoring to servers with different disk layouts, you just might want to know how many files you have and where they're stored. To get that information, you need RESTORE FILELISTONLY, which you run like this:
RESTORE FILELISTONLY FROM DISK = 'd:\bu\mm.bak'
The results come back like this:
Column                   Data file                                Log file
LogicalName              MovieManagement                          MovieManagement_log
PhysicalName             C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\DATA\MovieManagement.mdf
                         C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\DATA\MovieManagement_log.LDF
Type                     D                                        L
FileGroupName            PRIMARY                                  NULL
Size                     63176704                                 3604480
MaxSize                  35184372080640                           2199023255552
FileId                   1                                        2
CreateLSN                0                                        0
DropLSN                  0                                        0
UniqueId                 F50B75A6-C564-4E44-A117-D7AF48B637BF     F1C86747-7D4F-4E80-9C28-50460B38C537
ReadOnlyLSN              0                                        0
ReadWriteLSN             0                                        0
BackupSizeInBytes        62455808                                 0
SourceBlockSize          512                                      512
FileGroupId              1                                        0
LogGroupGUID             NULL                                     NULL
DifferentialBaseLSN      89000000039100037                        0
DifferentialBaseGUID     BB5D1D2C-75AD-4CD6-B20F-70CEBABC0176     00000000-0000-0000-0000-000000000000
IsReadOnly               0                                        0
IsPresent                1                                        1
TDEThumbprint            NULL                                     NULL




SQL Server instance not starting up

A number of problems can prevent the SQL Server service from starting:

  • Service account password incorrect or account locked or disabled
  • Corrupt or missing master database files
  • Corrupt or missing model database files
  • Unable to create tempDB
  • Unable to open the error log



    The first place you should start looking for the cause is the SQL Server error log.
    The SQL error log is just a text file on disk. If you're unsure of its location, check the startup parameters of the SQL Server service in SQL Server Configuration Manager.



    "C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Log\"

    Incorrect password or service account locked or disabled

    I've seen this happen when the administrator has configured the SQL Server service account to require regular password changes, and also in cases where DBAs or developers know the SQL Server service account password and sometimes use it, often to avoid "inconvenient" security policies such as those forbidding developers from having sysadmin access to the production servers.
    If someone changes the service account password, but doesn't update the SQL Server service with the new password, then SQL Server will run fine until the next time we restart it, at which point we'll see the error messages in Figures 1 and 2.
    Additionally, if a user repeatedly enters the password incorrectly, or a service (SQL, or any other service that may be using the same account) repeatedly attempts to submit the old password, after a password change, then the account will eventually be locked. As before, this will have no effect on a running SQL Server but the next time that the service attempts to start, it will be unable to do so.
    This is one of the cases where looking at the SQL Server error log is fruitless. The latest log file (Errorlog) will contain only messages from before the restart and will have no indication of the problem. If I simulate this problem and look at the error log, the last messages in the log read:
    2012-05-24 19:15:06.66 spid7s SQL Server is terminating in response to a 'stop' request from Service Control Manager. This is an informational message only. No user action is required.
    2012-05-24 19:15:07.16 spid7s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
    These messages simply record the last shutdown of SQL Server before the service failed to start, and there is nothing in that log to indicate any problem. However, the lack of error messages is, in itself, useful, as it eliminates many potential causes that would result in logged messages.
    The next place to look, to see if it is a security problem, is the Windows Event logs, specifically the System log (not the security log). If I go and look there (Administrative tools | Event Viewer), and filter for errors, I see the following:
    Figure 4: Useful errors from the Event Log
    That's a clear indication that an incorrect password caused the problem! Other possible messages are:
    • Logon failure: account currently disabled.
    • Login failed: Account locked out.
    The solution to these problems is very simple; if the password is in error, change the password that the SQL Service is using to match the account's true password, or reset the account password, if it's unknown, and change all services that use it to have the correct password. If the account is disabled or locked, then enable or unlock it.
    To prevent problems such as these, I would recommend that you use very complex passwords for the SQL Server service account, and then don't set the passwords to expire. Nobody, especially not developers or administrators, and nothing other than the SQL Server service, should use the service account. I recommend, generally, that different SQL Server instances and services use different service accounts. If possible, set the SQL Server service account to disallow interactive logins.

    Master database files missing or inaccessible

    One of the first things that SQL Server needs to do in the startup process is to open and recover the master database, so that it can load the configuration settings and locate and open the other system databases and all the user databases. If SQL Server cannot find the files for the master database, for some reason, then it cannot start.
    If someone modifies the startup parameters incorrectly (forgetting a ";" between startup parameters is a common mistake), SQL Server could look in the wrong place for the master files. Alternatively, the location might be correct but inaccessible due, for example, to a drive failure, or the SQL Server service account not having permission to access the specified files or folders. In each case, the startup will fail.
    In such cases, the SQL Server error log will hold useful information. The last message in the error log will look something like one of these:
    Error 2(failed to retrieve text for this error. Reason: 15100) occurred while opening file 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\master.mdf' to obtain configuration information at startup. An invalid startup option might have caused the error. Verify your startup options, and correct or remove them if necessary.
    Or:
    FCB::Open failed: Could not open file C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\mastlog.ldf for file number 2. OS error: 5(error not found).
    The key here is the first part of the message, the operating system error code, in the first case Error 2, in the second Error 5. In the first example, the description isn't useful, but an Internet search for "Operating system error 2" will quickly reveal that it means File Not Found, indicating that the master.mdf file is just not there. In the second example, an "access denied" error indicates that SQL server does not have permissions to the mastlog.ldf file.
    Other common errors include "Error 3: Folder not found" and "Error 32: File in use by another process".
    The first step here is to identify what has happened to the master data and log files. Has someone changed the startup parameter to point to an incorrect location? If so, we need to alter the startup parameter so that it points to the correct location of the database files. You can find the startup parameters under the properties of the service, in SQL Server Configuration Manager, as shown in Figure 3. The –d parameter specifies the location of the data file, the –l specifies the location of the log file.
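    On a default installation, those parameters look something like the lines below; note that there's no space between the switch and its value, and a semicolon separates each parameter:

    -dC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\master.mdf;
    -eC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Log\ERRORLOG;
    -lC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\mastlog.ldf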
    Is the drive they are located on available? Is the folder they are located in accessible? Are the files there? If the answer is "yes" in each case then try to start SQL Server again, as the error may have been a temporary one. If that works, then you still need to investigate why the problem occurred (maybe an incorrectly configured anti-virus scanner; maybe the disks take time to come online) and rectify it so that it does not happen again.
    If the files are not where they are supposed to be, perhaps because the drive failed or someone accidentally deleted them, then we'll need to restore them from backup (and everyone does have backups of the system databases, right...?)
    So, off to Books Online and the "Restore the Master Database" page (http://msdn.microsoft.com/en-us/library/ms190679.aspx), which tells us that to restore the master database, we must first restart the instance in single user mode, using the –m flag.
    > Sqlservr.exe –m
    Figure 5: Attempting to restart SQL Server using the –m flag
    Or not…! SQL won't start without the master database, even with the –m flag. In order to restore the backup of the master database, we're going to have to:
    1. Rebuild all the system databases
    2. Start SQL Server in single-user mode
    3. Restore the backup of master, as well as backups of model and MSDB (rebuilding the system databases rebuilds all of them, which is why we need to restore model and MSDB in addition to master)
    As an aside: in addition to taking regular backups of the system databases, I like to take a one-off, post-installation copy of the files for each of the two main system databases (master and model). This is not because copying database files is a good backup strategy, but because it makes it easier to recover from this kind of situation, as it means I have files to put down should the system database files disappear completely. This allows me to start the instance and restore the backups of the system databases.
    Books Online documents well the process for rebuilding all the system databases (http://msdn.microsoft.com/en-us/library/dd207003.aspx). The tool doesn't say much, but it does the job.
    Figure 6: Rebuilding the system databases
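    For reference, the rebuild runs from the installation media's setup.exe, along these lines; the instance name and sysadmin account are placeholders for your own, and you'd add /SAPWD if the instance uses mixed authentication:

    > setup.exe /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=MSSQLSERVER /SQLSYSADMINACCOUNTS="NEVERNEVER\Grant"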
    Once done, start up SQL Server in single-user mode (-m) and restore master, model, and msdb from backups, as per the Books Online entries.
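    In outline, the first restore looks roughly like this, run from SQLCMD while the instance is in single-user mode; the backup path is a placeholder for wherever your system database backups live:

    -- Restoring master shuts the instance down automatically when it completes;
    -- restart the service normally, then restore model and msdb like ordinary databases.
    RESTORE DATABASE master FROM DISK = 'C:\Databases\Backup\master.bak' WITH REPLACE;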

    Master database files corrupt

    Just finding the master database files is not enough; SQL Server must be able to open the database, recover it and bring it online. Severe corruption may result in the master database failing to come online and hence the startup of SQL Server failing.
    As in the previous case, the last of the SQL Server error logs will contain an error that describes the problem. The final messages in the error log will look something like this:
    Starting up database 'master'.
    
    8 transactions rolled forward in database 'master' (1). This is an informational message only. No user action is required.
    
    0 transactions rolled back in database 'master' (1). This is an informational message only. No user action is required.
    
    Recovery is writing a checkpoint in database 'master' (1). This is an informational message only. No user action is required.
    
    Error: 824, Severity: 24, State: 2.
    
    SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:65; actual 0:0). It occurred during a read of page (1:65) in database ID 1 at offset 0x00000000082000 in file 'C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
    Or maybe this:
    Starting up database 'master'.
    
    Error: 5172, Severity: 16, State: 15.
    
    The header for file 'C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\mastlog.ldf' is not a valid database file header. The PageAudit property is incorrect.
    Depending on the type of corruption, and on which parts of the files are corrupt, you may see different messages. The key is that there is some fatal corruption found in the master database and the solution in each case is the same as described in the previous section for missing files. Rebuild the system databases, restart SQL Server in single user mode and restore the backups of all the system databases.

    Model database files missing, inaccessible or corrupt

    In the startup process, once SQL has opened and recovered the master database it needs to bring the other system databases online, starting with TempDB. However, SQL Server cannot simply locate and open the TempDB files and run crash recovery, as it does for the other system databases. TempDB is not like the other databases in the instance in that it's not intended to be a permanent store of data. As such, logging works differently for TempDB; for this database, SQL Server cannot roll transactions forward, which is part of a normal crash recovery. Therefore, instead of simply recovering TempDB on startup, SQL Server must return it to a known state, and the model database provides that known state. What happens is that SQL Server:
    1. Locates and opens the existing tempDB data and log files
    2. Sets their file sizes to those specified for tempDB in the system catalogs
    3. Recreates or 'clears' (the terms are used interchangeably in the documentation and error logs) tempDB using model, copying over allocation pages, bitmaps, etc.
    If the TempDB files don't exist (see the next section), SQL will create them based off the model database files (much as it would when a user database is created) and size them according to the file sizes specified for TempDB in the system catalog.
    All of this is a slightly long-winded way of saying that in order to start up cleanly, SQL Server needs to clear TempDB. In order to do this, SQL Server needs to bring online (open and recover) the model database, as model is the template. If the model database files are missing or damaged, this clearing of TempDB will fail and the SQL Server startup will fail. However, I hope the more-detailed explanation, above, will help you make sense of the messages you see when troubleshooting startup issues that relate to problems with model.
    The SQL error logs will contain messages explaining the problem and they will be much the same as for missing or corrupt master database files, just for the model files instead.
    Model database file missing:
    Starting up database 'model'.
    
    Error: 17204, Severity: 16, State: 1.
    
    FCB::Open failed: Could not open file C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf for file number 1.  OS error: 2(failed to retrieve text for this error. Reason: 15100).
    
    Error: 5120, Severity: 16, State: 101.
    
    Unable to open the physical file "C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf". Operating system error 2: "2(failed to retrieve text for this error. Reason: 15105)".
    
    Database 'model' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
    
    Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
    Model database file corrupt:
    Starting up database 'model'.
    Starting up database 'model'.
    
    Error: 5172, Severity: 16, State: 15.
    
    The header for file 'C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf' is not a valid database file header. The PageAudit property is incorrect.
    
    Error: 945, Severity: 14, State: 2.
    
    Database 'model' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
    
    Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
    Note that in each case we get errors relating both to the problem with model and with the inability to create TempDB.
    If a model file is missing, we need to find out what happened to it. Has a drive not come online (or come online after SQL started)? Has someone accidentally renamed or moved the file? If we can make the model files accessible to SQL Server, in a similar fashion as described previously for the missing or inaccessible master files, then this will solve the problem.
    If a model file is corrupt, then we need to restore a backup of model. Let's see what Books Online has to say about restoring model:
    "Restoring model or msdb is the same as for performing a complete database restore of a user database."
    Well that's useful… not! SQL Server won't start without model, so we can't just restore it as we would a user database.
    Let's try starting SQL Server from the command line, with traceflag 3608 (http://sqlserverpedia.com/wiki/Trace_Flags). This traceflag tells SQL Server to recover only the master database upon startup, so SQL Server will leave model (and the other system databases) closed; it won't attempt to recover model or clear TempDB, so the damaged model file shouldn't cause SQL Server to fail to start.
    > Sqlservr.exe –T3608
    Recovering only master database because traceflag 3608 was specified. This is an informational message only. No user action is required.
    
    Starting up database 'master'.
    
    Recovery is complete. This is an informational message only. No user action is required.
    Sure enough, SQL Server starts. Great, now let's fire up SQLCMD and restore model, as shown in Figure 7.
    Figure 7: Attempting to restore model, after starting SQL Server with traceflag 3608
    That wasn't quite what I wanted to see. Let's look at the error log (which SQL Server also prints to the console when we start it from the command line) and see what it reports.
    Clearing tempdb database.
    
    Starting up database 'model'.
    
    Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
    What we have here is a classic catch-22. Previously, we got SQL Server started, albeit running in a limited "master-only mode". However, when we issue the RESTORE command, SQL Server, before it even attempts to carry it out, first tries to clear TempDB (not because it needs TempDB to perform the restore operation, but as part of bringing the rest of the system databases online). To clear TempDB, SQL Server tries to start up model, which it can't do because one or more of model's files is corrupt or missing. As such, the process fails before the restore of model starts.
    Since the problem seems to be the clearing of TempDB, let's go back to the beginning and try to restart SQL Server, but this time with traceflag 3609 instead, which tells SQL Server not to clear TempDB on startup, but just to recover it from the existing files. This is not something that you would ever run normally, because the previous shutdown could have left TempDB in an inconsistent state (remember, for TempDB SQL Server cannot roll transactions forward, which is part of a normal crash recovery). However, for recovering from a disaster it can be useful.
    > SQLServr.exe –T3609
    
    Server      Registry startup parameters:
    
             -d C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master.mdf
             
             -e C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\Log\ERRORLOG
             
             -l C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\mastlog.ldf
             
    Command Line Startup Parameters:
    
             -T 3609
             
    Recovering all databases, but not clearing tempdb. This is an informational message only. No user action is required.
    
    Starting up database 'master'.
    
    Starting up database 'model'.
    
    Starting up database 'msdb'.
    
    SQL Server is now ready for client connections. This is an informational message; no user action is required.
    
    Error: 5172, Severity: 16, State: 15.
    
    The header for file 'C:\Databases\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf' is not a valid database file header. The PageAudit property is incorrect.
    
    Error: 945, Severity: 14, State: 2.
    
    Database 'model' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
    
    Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
    This time SQL Server won't even start. We specified via use of traceflag 3609 that SQL Server should, on startup, recover TempDB from the existing files but not attempt to clear it. Nevertheless, it still attempted to recover model, along with the other system databases (because this time we didn't tell it not to) and the failure to recover model resulted in SQL shutting down again.
    Oddly enough, the error log still complained about not being able to create TempDB, though it had no need to do that. Perhaps that's just a default message when failing to recover model.
    Well, if one traceflag won't do the job, maybe two will:
    > SQLServr.exe –T3608 –T3609
    
    This instructs SQL Server not to recover any database other than master, and not to clear TempDB, just to recover it, as is, from its existing files. In other words, if SQL does need to bring TempDB online at a later point, it won't attempt to clear it (clearing it is what requires model). This time, as when we used traceflag 3608 on its own, SQL Server starts up just fine.
    Now, to restore model:
    Figure 8: Restoring model, after starting SQL Server with traceflags 3608 and 3609
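    The command behind that figure is essentially just an ordinary restore of model, roughly like this (the backup location here matches the device reported in the restore message further down):

    RESTORE DATABASE model FROM DISK = 'C:\Databases\Backup\model.bak' WITH REPLACE;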
    The restore works this time, but let's see what the error log reports:
    Starting up database 'tempdb'.
    
    Recovery is writing a checkpoint in database 'tempdb' (2). This is an informational message only. No user action is required.
    
    Starting up database 'model'.
    
    The database 'model' is marked RESTORING and is in a state that does not allow recovery to be run.
    
    Starting up database 'model'.
    
    Restore is complete on database 'model'.  The database is now available.
    
    Database was restored: Database: model, creation date(time): 2003/04/08(09:13:36), first LSN: 19:304:37, last LSN: 19:336:1,number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'C:\Databases\Backup\model.bak'}). Informational message. No user action required.
    The error log shows that SQL Server started TempDB but, thanks to the use of traceflag 3609 when we started SQL Server previously, it only recovered TempDB; it didn't attempt to clear it. As an aside, this is the only time you'd see the "Recovery is writing a checkpoint in database 'tempdb'" message, as recovery does not normally run on TempDB. Note that if the TempDB files were also damaged or missing, however, this would result in SQL Server shutting down.
    The critical point is that, this time, it was the RESTORE command that caused SQL Server to try to start up model (something it would do for any database being restored), rather than an attempt to clear TempDB. As such, the existing, damaged model database is already marked as "restoring", and so SQL Server does not attempt to recover it, does not notice the damage and so the restore could complete. In a sense, the restore proceeded from this point just as would the restore of any user database. Once the restore of model completes, we can restart SQL Server as a service without any of the traceflags and it will start correctly.
    The process as I've described it sounds a little more complex than it is, just because I wanted to make it clear why we needed both traceflags. In practice, this restore method is very quick: start SQL from the command line with the traceflags, restore model backup, kill that running instance of SQL, and restart the service.
    Nevertheless, a simpler approach might be to copy the model files from another SQL instance of the same version (preferably service pack too), and use those in place of the missing or damaged files. This lets SQL start normally and then we can simply restore model from backup. If we have backups of the model database files, we can equally well use those.
    Yet another option would be to rebuild the system databases (as discussed earlier) and then restore the backups of master, model, and msdb. This will take a little longer, as three backups need to be restored, but works just fine.

    TempDB location does not exist

    As we saw in the previous section, in order to start up, SQL Server has to be able to bring TempDB online. Missing or corrupt TempDB files aren't a huge problem here, because if necessary SQL Server can recreate the TempDB database from model. However, the TempDB folder has to exist.
    If someone altered the TempDB database, and in doing so specified an invalid location for the file, or the drive that housed TempDB failed, or someone changed the folder names while the SQL Server service was stopped, then SQL won't be able to restart.
    As with most problems with system databases, SQL Server logs the error message to the error log and it looks like this:
    Clearing tempdb database.
    
    Error: 5123, Severity: 16, State: 1.
    
    CREATE FILE encountered operating system error 3(failed to retrieve text for this error. Reason: 15100) while attempting to open or create the physical file 'C:\SQLData\tempdb.mdf'.
    
    Error: 17204, Severity: 16, State: 1.
    
    FCB::Open failed: Could not open file C:\SQLData\tempdb.mdf for file number 1.  OS error: 3(failed to retrieve text for this error. Reason: 15105).
    
    Error: 5120, Severity: 16, State: 101.
    
    Unable to open the physical file "C:\SQLData\tempdb.mdf". Operating system error 3: "3(failed to retrieve text for this error. Reason: 15105)".
    
    Error: 1802, Severity: 16, State: 4.
    
    CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
    
    Error: 5123, Severity: 16, State: 1.
    
    CREATE FILE encountered operating system error 3(failed to retrieve text for this error. Reason: 15105) while attempting to open or create the physical file 'C:\SQLData\tempdb.mdf'.
    
    Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
    Operating system error 3 is, as mentioned earlier, folder not found. It could be that the folder C:\SQLData does not exist or it could be that the drive does not exist at all.
    As in the earlier cases, the first step on encountering this error is to identify the cause. Did a drive fail? Did a drive come online after the SQL Server service? Did someone rename the directory? In short, can we resolve the problem quickly by changing or adding a directory? In the example above, it's far more likely that the directory does not exist than that the C: drive does not exist (seeing as the operating system is running) and as such recreating that directory would allow SQL Server to start.
    If a drive has failed, it may be possible to temporarily map a SAN LUN (or even add an external drive) to that drive letter, to allow SQL Server to start. Once SQL Server has started, we can change the defined location of the TempDB files using the ALTER DATABASE command and restart SQL Server, putting the TempDB files into a new location.
    If that's not possible, then we need to start SQL Server without it clearing TempDB so that we can specify a new location that does work. We can do that, as described in the previous section, with traceflag 3608:
    > SQLServr.exe –T3608
    SQL starts in the "master-only" configuration, which shouldn't be a huge surprise at this point, and we can connect via SQLCMD and change the defined location of the TempDB database to a folder that does exist and that SQL Server has permission to access.
    Figure 9: Altering TempDB to change the file locations
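    The statements behind that figure are plain ALTER DATABASE commands; here's a sketch, assuming the default logical file names (tempdev and templog) and a hypothetical D:\SQLData folder:

    -- The new locations take effect the next time the service starts
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLData\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLData\templog.ldf');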
    Assuming we specified the new location correctly, we can now stop the command-line instance of SQL Server, restart the service and SQL Server will recreate the TempDB files in their new location.

    Missing Error Log Location

    This is a seriously fun and unexpected error! One of the things that SQL Server needs to do to start is to locate and cycle the error log. It locates the error log using the –e startup parameter, if specified, or it looks in the registry if that parameter is not there. Once it gets the location, it renames the existing error log files (Errorlog becomes Errorlog.1, Errorlog.1 becomes Errorlog.2, and so on), creates a new Errorlog and starts logging the startup progress into this file.
    If the directory specified for the error log does not exist, startup will fail and there will be no error log to tell you why it failed.
    There are two main reasons why the error log directory might be missing:
    • Drive failure (if the error log was on a different drive than the SQL Server binaries)
    • An incorrect modification to the service startup parameters so that SQL Server looks for a location that doesn't exist
    The first hint that this is the cause of SQL Server failing to start is the lack of a new error log. We can find out where SQL Server expects to find the error log by checking the startup parameters of the SQL Server service in SQL Server Configuration Manager, as shown earlier in Figure 3. If the location specified for the –e parameter does not exist, then it's a rather large hint that the problem is a missing error log directory. It is also possible for the location to exist but for the SQL Server service not to have permissions to access it, in which case, the location will exist, but there will be no error log from the last startup attempt.
    To confirm the problem is the lack of an error log, we can check the Windows Event Log, specifically the Application log and, if it is, then there we will find the error message shown in Figure 10.
    Figure 10: Checking the Windows event log for missing error log messages
    As discussed earlier, Operating system error 3 is Folder not found. Error 5 would be Permission denied, and Error 32 means that the Errorlog file was in use by another process when SQL attempted to open it, possibly caused by a misconfigured anti-virus scanner.
    Once again, the correct resolution depends on the exact nature of the problem. If we're dealing with a misconfigured anti-virus scanner, which was reading the file when SQL tried to access it, then the resolution may be as simple as restarting the service and fixing the anti-virus configuration to ensure that the problem does not reoccur.
    If the SQL Server service does not have permission to access the folder, then we grant the necessary permission and restart the SQL server service. Once the crisis is past, we can find out who removed the permission and why, perhaps a security admin tightening security or implementing a new group policy.
    If the problem is that the folder or drive is missing, then we can modify the value specified by the –e startup parameter so that it points to a location that exists, which should allow the SQL Server service to start. If only the folder is missing, we can recreate it and assign the correct permissions. SQL Server doesn't need to locate the old error log files in order to start; it just needs the folder to be there.