• subscribe
Michael K. Campbell

Practical SQL Server Blog

by Michael K. Campbell

SQL Server Database Corruption Part IV: CHECKSUM Page Verification

In previous posts within this series, we’ve already looked at how you can’t really prevent corruption. Instead, in order to be able to better deal with corruption, you need to be able to detect it early. To that end, there are actually a number of different ways to enable early (or early-ish) detection of corruption when it happens. And, in this post, we’ll look at the use of CHECKSUM Page Verification as the first of these methods. Enabling CHECKSUM for Page Verification When configured correctly, SQL Server can instantly spot when data that it is pulling off of disks was saved improperly – or corrupted. To enable this functionality, you just need to ensure that desired databases are configured to use the CHECKSUM option for page verification – which you can set on SQL Server 2005 and above. To do this, simply query your server as follows, and review the names of all databases returned – as they are not configured for CHECKSUM verification: .sqlcode { font-size: 13px; font-weight: bold; font-family: monospace;; white-space: pre; -o-tab-size: 4; -moz-tab-size: 4; -webkit-tab-size: 4; } .sqlcomment { color: #00aa00; } .sqlstring { color: #aa0000; } .sqlfunction { color: #aa00aa; } .sqlkeyword { color: #0000aa; } .sqloperator { color: #777777; } USE master GO SELECT NAME ,page_verify_option_desc FROM master.sys.databases WHERE page_verify_option_desc != 'CHECKSUM' GO Then, to enable (or toggle) these database to use CHECKSUM verification, execute the following statement per database that you wish to ‘convert’ to using CHECKSUM verification: .sqlcode { font-size: 13px; font-weight: bold; font-family: monospace;; white-space: pre; -o-tab-size: 4; -moz-tab-size: 4; -webkit-tab-size: 4; } .sqlcomment { color: #00aa00; } .sqlstring { color: #aa0000; } .sqlfunction { color: #aa00aa; } .sqlkeyword { color: #0000aa; } .sqloperator { color: #777777; } ...Read the rest of entry >>
Posted @ 02/21/2012 12:13 PM By Michael K. Campbell
 

SQL Server Database Corruption Part III : Preventing Corruption

In previous posts we touched upon an overview of what causes corruption and looked at ways to simulate corruption. In this post we’ll talk about more about corruption – in terms of addressing how you can prevent it. (Though, if you’ve been paying attention in previous posts, you’ll probably guess what’s coming up). How to Prevent SQL Server Database Corruption In answer to the question: “How can I prevent corruption?” the simple, and honest, answer is: “You really can’t”. As covered in previous posts, corruption is ‘more or less’ a random problem that occurs at the disk subsystem level and really can’t be prevented. (Or as I like to tell all of my consulting clients: “There’s a reason that EVERY major Operating System out there ships with something like CHKDSK.”) So, while disk/storage errors should be rare within any production system, the fact is that these kinds of errors aren’t so much a question of IF they’ll happen, but WHEN they’ll happen – simply because the reality is that magnetic storage is insanely complex and subject to the occasional ‘hiccup’ that can result in corruption. However, given that corruption SHOULD be a relatively rare problem that you will encounter in your environment, it IS safe to say that if you keep bumping into seemingly-regular or semi-regular problems with corruption, it’s fair to say that you may need to start becoming concerned about whether there’s a problem with your hardware or possibly even drivers and so on. (The problem, however, is that troubleshooting these kinds of problems is insanely hard – so if you do start to recognize regular problems with corruption it may be time to start thinking about migrating critical systems to new hardware.) However, even though you can’t prevent corruption, that doesn’t mean that you are powerless against its effects. In fact, not only does SQL Server readily accept or anticipate that disk subsystem errors can and WILL occur, but it also provides a number of great tools and function...Read the rest of entry >>
Posted @ 02/09/2012 10:51 AM By Michael K. Campbell
 

SQL Server Database Corruption Part II: Simulating Corruption

In my last post I provided an overview of what SQL Server database corruption is – and how it’s almost always caused by problems at the IO subsystem (or disk) level. However, while it’s all fine and well to talk about things in such a theoretical sense, in that post I also mentioned that a great way to get a ‘feel’ for how corruption works is to simulate it a bit on your own. Accordingly, in this post I’ll provide a step-by-step walkthrough of what that looks like by simulating some corruption. Setting the Stage Obviously, when it comes to actively trying to ‘corrupt’ a database there are a couple of caveats that need to be addressed – above and beyond the OBVIOUS caveat that this is something you’d never want to do with production database. (It IS a great experiment to test against COPIES of your production databases though). Otherwise, the caveats to simulating corruption are that you’ll want to DELETE any potential existing data from msdb’s suspect_pages table (which we’ll talk about in a future post), and that you’ll obviously want to make sure that you’ve got a viable backup (even if it’s a simple copy/paste of an existing .BAK or .mdf/.ldf files) of whatever database you’ll be corrupting. In my case, I’m corrupting a copy of the AdventureWorks database that I have running in my environment – mostly because I just hate AdventureWorks so much. So, in my case I’m backing it up like so:   Simulating Corruption Then, when it comes to actually simulating corruption, that ends up being a bit hard to do when SQL Server has its ‘hooks’ into the database in question – so I’ll just Detach it using the SSMS GUI, as follows:   Once detached, the database is just a collection of ‘zeroes and ones’ that I can then open up and ‘mangle’ as needed. At this point I then just need to find the actual .mdf file for this database (which I happen to know is in my D:\SQLData\ drive on my test server), and then I can open it up with an application other than SQ...Read the rest of entry >>
Posted @ 01/30/2012 8:19 PM By Michael K. Campbell
 

SQL Server Database Corruption Part I : What Is Corruption

When it all comes down to it there are really only two main things that DBAs need to worry about: making data available to the proper people, and making sure it’s inaccessible to the not-so-proper people. All other considerations are really just appendages to these two concerns. For example, performance is just an extension of making data available to the right users – because if they can’t access it in timely fashion, then it’s not really as ‘available’ to them as it should be. As such, concerns around disaster prevention and recovery should rank high on every DBA’s list. Yet, surprisingly, I find that most DBAs (especially the ‘reluctant’ ones) don’t have enough of a grasp on how to sufficiently protect against database corruption. I therefore thought it would be fun to launch into a multi-part set of posts that provide a hands-on overview of SQL Server Database Corruption Basics – including an overview of what corruption is, why you can’t really prevent it – and what you can do to deal with that reality such that you’re able to ensure proper availability and uptime of your data even when corruption occurs. What is Database Corruption? For the purposes of this series of posts, database corruption is defined as a problem associated with the improper storage of the actual zeroes and ones needed to store you database data at the disk or IO sub-system level. In this sense, the corruption addressed in this series of posts is VERY different from other kinds of ‘disasters’ that might render business data useless (such as when a developer or admin runs an UPDATE statement without a WHERE clause or a software glitch incorrectly ‘mangles’ unit prices). However, while user/application errors or ‘disasters’ are technically much different than disasters caused by corruption, it’s important to note that many of the mitigation strategies outlined in this series of posts for addressing SQL Server database corruption ALSO represent best-practices that can easily be leverag...Read the rest of entry >>
Posted @ 01/27/2012 9:55 AM By Michael K. Campbell
 

Off-Box Backups and Luke-Warm Standby Servers – Part III

Until now, my previous two posts have largely just been a review of best practices for making backups redundant – with only a single mention about RESTORE VERIFYONLY’s LOADHISTORY argument as a means of providing a rationale for enabling luke-warm failover servers from those backups. And, again, just for the record: there ARE better High Availability solutions out there for SQL Server (such as Mirroring, Log Shipping, SQL Server 2012’s AlwaysOn happiness, clustering, and (in some cases) replication). As such, the point of these posts is really just to cover options for less-expensive failover options for organizations that don’t need ‘full-blown’ failover options and as a means of describing some additional ways to make ‘full-blown’ HA solutions more redundant and capable of recovery – as you never know when you’ll need some sort of Remote-Availability or smoke-and-rubble contingency. That, as I’ve said over and over again, high-availability isn’t the same as disaster recovery. So, with all of that said, the big problem at this point is that just having copies of your backups in secondary/remote locations is NOT a disaster recovery plan. Yes, having copies of your backups off-site is better than nothing – but merely having copies of your backups off-site isn’t going to help much in an emergency – especially if we’re talking about multiple databases of any decent size. That, and unless you’re regularly testing these secondary/luke-warm backups, you not only have any idea whether they’re viable or not, but you don’t accurately know if they’re capable of meeting RPOs and RTOs. To that end, you need what I like to call a ‘mass recovery’ script. The Benefits of a Mass Recovery Script In my mind there are two primary benefits of a ‘mass recovery’ script. First and foremost, my experience is that most DBAs aren’t as familiar with all of the ins and outs of restoring databases in an emergency as they need to be. Or, as I like to tell many of my clients: “The LAST thi...Read the rest of entry >>
Posted @ 01/02/2012 7:26 PM By Michael K. Campbell
 

Off-Box Backups and Luke-Warm Standby Servers – Part II

Following up on my previous post, when it comes to the need to create off-box backups, there are really only two (well, three) main reasons you’d want to do Off-Box Backups: Three Primary Reasons for Off-Box Backups First: Redundancy. As I pointed out in my last post: If you’re only keeping backups and data on the same server or hardware, then you’re DOING IT WRONG. From an elementary Disaster Recovery (DR) standpoint you always need a copy of your backups ‘mirrored’ to at least one other location. Drives can fail, RAID controllers can fail and take drives/data with them, and a host of other REALLY UGLY things can happen to data stored on a single host/server. Without off-box backups, then, you’re a sitting duck. And therefore, the reason, in this case to have off-box backups is a question of simple redundancy. (And, as I pointed out in my last post, an additional benefit of ‘off-box’ backups is that you can commonly store these redundant backups on less-expensive (and more voluminous) storage where you can actually, typically, keep copies of backups longer than you could on your primary server. But again, as I mentioned in my last post – you want to make sure you’re keeping backups locally on your primary server as well – to help avoid incurring the cost of pulling backups over the wire WHEN you need to recover. So, in this case, think redundancy of your data. Second: Closely related to the first reason for why you’d want to keep backups in off-box locations is the simple fact that sometimes entire servers can fail. A Windows Update, or the addition of a driver might render a box completely non-responsive. In which case, trying to get backups off of that box, or it’s RAIDed HDs is going to be nothing short of a nightmare. That, and the point of this series of posts is to describe how to effectively set up ‘luke warm’ standby servers. So, in that case, knowing that you need a redundant location for your backups, and knowing that you might need a redundant HOST o...Read the rest of entry >>
Posted @ 12/23/2011 1:03 PM By Michael K. Campbell
 

Off-Box Backups and Luke-Warm Standby Servers – Part I

SQL Server’s RESTORE statement is insanely powerful. Not only because of the obvious fact that it can be handy in a disaster (when you have regular backups in place), but because it’s also so incredibly versatile. So versatile, in fact, that I’d wager that most DBAs only use a fraction of the various options and capabilities that it provides. And, to that end, I wanted to take a multi-part look at how just one or two rarely used facets of the RESTORE command can be used to great benefit when it comes to increasing overall disaster preparedness.  Taking RESTORE VERIFYONLY For a Spin I’m guessing that many DBAs have bumped into the VERIFYONLY argument of the RESTORE clause – as it’s something that many DBAs will either explicitly implement themselves as part of a scripted, regular, backup of their databases, or it’s potentially (hopefully) something that they’re doing with their backups through whatever GUI they’re using to schedule regular backups. But, just to make sure we’re on the same page, here’s what Books Online says about it: Verifies the backup but does not restore it, and checks to see that the backup set is complete and the entire backup is readable. However, RESTORE VERIFYONLY does not attempt to verify the structure of the data contained in the backup volumes. In Microsoft SQL Server, RESTORE VERIFYONLY has been enhanced to do additional checking on the data to increase the probability of detecting errors. The goal is to be as close to an actual restore operation as practical. For more information, see the Remarks. From within SQL Server Management Studio, the option to run this command is ‘baked in’ to ad-hoc backups – as shown in Figure 1. Figure 1: The ‘Verify backup when finished’ checkbox executes RESTORE VERIFYONLY under the covers. There’s also a similar GUI option called ‘Verify backup integrity’ in the ‘Backup Database Task’ that can be defined as part of a SQL Server Maintenance Plan. Likewise, there is typically a simi...Read the rest of entry >>
Posted @ 12/13/2011 1:32 PM By Michael K. Campbell
 

Breaking the Backup Chain – REDUX (Or Eating Crow)

While I like to pride myself on being professional enough to admit when I’m wrong, it turns out that I’m a bit more shallow than I’d like to be. For example, I commonly tell my clients that I don’t know everything about SQL Server (and that anyone who claims to know everything is someone they should run away from). Consequently, I don’t mind being wrong when I ‘guess’ at things with them as we’re figuring out various problems and issues together. But as I’ve found out in the last day, admitting that I’m wrong has actually been a bit of a nightmare – for two reasons: First because I was wrong in a case that really mattered: I offered flawed professional advice. And that, in turn means that I got to simultaneously feel sick about offering bad advice while also feeling dumb about doing so in ‘front of everyone’. Second, this was a nightmare because this was sadly a case where I’ve been WRONG about a key concept for entirely too long. Years in fact. I Was Wrong Given that I was wrong, I want to do two things with this post: First, I want to correct the bad information that I posted previously. To that end, rather than correcting my previous posts, I’m going to leave them intact and ‘update’ them with warnings about the bad info – along with a link to this post. Second, I also want to take a look at HOW I ended up being so wrong because I think a post-mortem of my mind-set might be instructive and can hopefully help others avoid falling into similar traps. (And if I’m honest, I also want a chance to, umm, defend myself a bit.) How I was Wrong In the previous two posts: SQL Server Backups – When More is Less and Unused Secret Weapon – COPY_ONLY Backups, I erroneously linked COPY_ONLY backups to providing ‘protection’ for transaction log backups. That was incorrect. Transaction Log backups are NOT susceptible to the problems I outlined in terms of breaking the backup chain. Only DIFFERENTIAL backups are. Being wrong about that sucks. But it’s also worth m...Read the rest of entry >>
Posted @ 11/29/2011 8:15 PM By Michael K. Campbell
 

Unused Secret Weapon: COPY_ONLY Backups

COPY_ONLY backups were a great addition to SQL Server 2005 (and up). Without them, it was all to easy for DBAs, devs, and SysAdmins to create a ‘backup’ for dev/testing purposes which would actually, silently, break the Log Chain and put production data in severe risk. In other words, if: a) A Non COPY_ONLY backup was made, b) A new or scheduled FULL/DIFFERENTIAL backup hadn’t yet been made, and c) A database disaster occurred Then DBAs _HAD_ to have that non-copy only backup on hand or they wouldn’t be able to recover from the disaster properly – because they’d be missing the proper baseline from which to begin applying transaction log backups. What’s So Secret? As I’ve mentioned before, the notion of breaking the log chain can be a bit counter-intuitive. (Seriously, backing something up can actually BREAK it? That’s not something you expect to find with many systems. Yet, if you understand what’s going on and WHY the log file is so essential to SQL Server, then the fact that you CAN break the log chain becomes something that is not only second nature – but which you’re constantly vigilant against.) Yet, while I think that increasing numbers of DBAs (including even reluctant DBAs) have a decent understanding of the benefits of COPY_ONLY backups in terms of protecting the log chain, I think that the overwhelming majority of DBAS don’t realize that log file backups can ALSO be used against COPY_ONLY backups as well. Therefore, since a picture is worth a thousand words, here’s a diagram that I think increasing numbers of DBAs understand and get – in terms of why COPY_ONLY backups are so important. Figure 1: Why it’s so important to use COPY_ONLY Backups (or keep non-COPY_ONLY backups on hand until next FULL/DIFF backup). On the other hand, while the benefits of COPY_ONLY backups are understood from the sole standpoint of PROTECTING the log chain for a PRODUCTION database, Figure 2 shows that there are also some other, very important, benefi...Read the rest of entry >>
Posted @ 11/17/2011 12:21 PM By Michael K. Campbell
 

SQL Server Backups: When More is Less

For most reluctant-DBAs there’s a certain degree of mysticism or ‘unknown’ about the SQL Server backups that they get ‘roped’ into managing. Moreover, for the uninitiated, there are a number of things about how SQL Server works that can seem counter-intuitive. Such as auto-closing and auto-shrinking databases – things that may seem logical to Systems Admins who are tasked with ‘optimizing’ resources but which, in reality, are easily some of the worst things you can do to your SQL Server databases. SQL Server and Disaster Recovery Just as counter-intuitively: while the data in a SQL Server database may be all sorts of mission critical, more backups of that data may actually JEOPORDIZE your ability to recover a database after a disaster – because MORE backups of SQL Server databases may, in fact, result in less coverage and protection. For example, one thing that I see fairly regularly is that reluctant DBAs who are handed ownership of SQL Server databases typically just don’t get the kinds of training or background needed to help them demystify SQL Server Backups. So, what commonly happens is they’ll spend a bit of time with their ‘new responsibility’ and commonly use SQL Server Management Studio to create a Maintenance Plan that (among other things) executes FULL backups at let’s say… 1AM every night. Then, what commonly happens is plenty of time passes with no problems, incidents, or issues. And the sense of urgency to figure out how these backups work gets pushed into the background – by more immediate problems and day-to-day needs. Then, commonly, these same reluctant DBAs commonly make a critical mistake a bit later on which is while reviewing their normal system-level backups for some Well-Known-3rdParty-Backup-Solution(TM) they decide “Hey, this 3rd party solution that has saved my bacon with file-server or AD or Exchange backups in the past actually supports native SQL Server backups; I’m going to go ahead and provide EXTRA coverage and protection ...Read the rest of entry >>
Posted @ 10/31/2011 12:22 PM By Michael K. Campbell