• subscribe
November 11, 2009 12:00 AM

Administer WSS 3.0 Content Databases from SQL Server

Use these T-SQL queries to gather useful data regarding your SharePoint environment
SQL Server Pro
InstantDoc ID #102848

When it comes to Microsoft network service applications, like SharePoint, a pretty package and simple interface mask the complexity of what the software is really doing. Many SQL Server DBAs have been challenged and confounded by their responsibilities to SharePoint databases that seem to magically appear overnight. This article will take some of the sting out of administering SharePoint content databases in a SQL Server instance. First we'll take a quick peek at the schema of the mysterious Windows SharePoint Services (WSS) content database and identify some objects ripe for querying. Then we'll explore specific T-SQL queries that can be used to garner useful details about a SharePoint environment. Finally, we'll take a look at the potential dangers of altering the content database via SQL Server.

It’s important to note that everything you're about to read goes against Microsoft best practices, and for good reason. Messing around with SharePoint databases directly in SQL Server can cause stability and security problems in SharePoint, as well as prevent successful troubleshooting and support when you need it from Microsoft. Carefully consider these risks before employing any of the outlined procedures in a production SharePoint environment. Remember, it's all fun and games until something (the server) gets hurt.

SharePoint's Content Database Schema
Of all the SharePoint databases in a WSS 3.0 or Microsoft Office SharePoint Server (MOSS) 2007 farm, the content database is by far the most volatile. In fact, the content database sees so much action its default recovery model (Full) makes its transaction log file a prime suspect when it comes to storage depletion. You would think a database that’s so popular among routine SharePoint transactions would have a wealth of documentation written about it. However, that’s not the case. In fact, because Microsoft recommends all interaction with SharePoint databases be conducted either through the SharePoint GUI or via programming against the SharePoint object model, there’s little explanation of the content database's structure.

Identifying SharePoint's content databases in a SQL Server instance is fairly easy if the default database name of WSS_Content({GUID}) was generated by SharePoint Central Administration. However, administrators can also create custom names for content databases, so you might need to peek inside a database to determine if it is, in fact, a SharePoint content database. So let's break things down a bit by starting with some user-defined tables that can be found in a SharePoint content database. Each table serves a particular purpose and several of them would seem to be related, yet few referential integrity connections exist. For example, the three most important tables concerning document libraries all have primary keys and indexes but no foreign key relationships with one another (see Figure 1).

Suffice it to say that the WSS content database doesn’t use normalization to its advantage, which makes writing direct T-SQL queries into it a distinct challenge. Furthermore, most of the row data values are identification numbers (some object GUIDs, others internally generated), and although the columns containing these numbers make excellent reference choices for JOIN statements, the numbers themselves aren’t easily recognizable to users.

So how do you determine which tables hold the information you need to get out of SharePoint? Relying on table names isn't the best method because some of the names are misleading. For instance, the AllDocVersions table appears to contain different versions of documents held in a library that has versioning enabled. However, the actual documents are binary large objects of the image data type held in the AllDocStreams table's Content column with a reference to the version ID number from the AllDocVersions table that corresponds to the current version of the document. Talk about confusing! Table 1 outlines some of the more useful tables in SharePoint's content database that might be ripe for querying.

It would seem from Table 1 that SharePoint fails to follow a tried and true Microsoft best practice in SQL Server: Composite primary keys should be created sparingly and only when absolutely necessary. This SQL Server best practice stems from the theory that composite primary keys lengthen the key values of the corresponding index, resulting in less efficient query optimization and disk utilization. However, since it would be inadvisable to alter the table structure of SharePoint's content database, we'll just have to assume the SharePoint product team had their reasons for violating SQL Server best practices and leave the indexes as they lay. Furthermore, the tables listed with an asterisk next to their name in Table 1 have dependent View objects of the same name without the "All" prefix. For example, the Docs view retrieves rows from the AllDocs table. These views retrieve the entire column set of all rows not marked for deletion. So querying the view instead of the underlying table would retrieve a smaller result set but you could be missing desired rows during a salvage operation. These views support the relational data integrity of rows marked for deletion while also making it possible to recover previously deleted items such as by using the Recycle Bin.

None of the table objects listed in Table 1 contain any foreign key constraints, and as you can see from the Primary Key columns, almost all of them contain duplicate data. The content database is highly denormalized, a condition that enhances OLAP processing. Performance indicators will show more efficiency during read operations than during write operations. Furthermore, the dependencies of these tables list a bevy of stored procedures and functions that SharePoint employs for error control and row manipulation. Any direct queries into these tables should be written so as to eliminate duplicate or unrelated row data in the result set.

Querying the Content Database
Now that we've seen a few tables that contain useful information about our SharePoint environment, let's take a look at writing T-SQL queries directly into them. Keep in mind that the same results could be obtained programmatically by writing .NET code against the SharePoint object model, but for SharePoint administrators or SQL Server DBAs a new Query window in SQL Server Management Studio (SSMS) is sometimes quicker than tasking the developers in IT with a new project. Here are a just a few common scenarios that can be easily resolved by simple T-SQL queries.



ARTICLE TOOLS

Comments
    There are no comments to display. Be the first one!
You must log on before posting a comment.

Are you a new visitor? Register Here
  • SP1?
    I know there is a SP1 for SQL 2008 R2 available....and there is a "feature pack" as well... ...
  • SQL database mirroring
    I have SQL Server 2008 R2 Enterprise 64bit on Windows 2008 R2 Enterprise 64bit.  Each SQL Server has...
  • Dell Compellent Disk Drive
    Does anybody has experience with Dell Compellent Disk Drive? Basically, this system manages all disk...
  • Sql server performance tuning
    I need to find a tool that help me to optimize sql server,queries,improve the performance and solve ...