I first shared my DBA Repository solution back in 2007. Since then, I've received many email messages with feedback and requests, which has left me both honored and chagrinned. I'm honored that many DBAs have incorporated this solution into their own environments, often tweaking, tuning, and ultimately making it their own. I'm chagrinned because I've been remiss in updating the DBA Repository lately. I've now remedied that situation by making the following enhancements:
- Resiliency for offline servers
- SQL Server 2008 support
- Performance improvements through the introduction of parallel processing
In case you're unfamiliar with the DBA Repository, at its heart lies a SQL Server Integration Services (SSIS) package that gathers information DBAs can use to assess and report on the SQL Server machines they manage. For example, information about server names, SQL Server editions and service pack levels, database names and sizes, user and login permissions, and SQL Server Agent jobs is gathered. This data is stored in a central database named DBA_Rep. You can find more information about the original solution and its initial updates in the articles listed in the Learning Path box.
Resiliency for Offline Servers
When I first built the DBA Repository, I was under the gun to get a solution working quickly. My main goal was to use the repository as a documentation source for myself and the other DBAs I worked with. Although I was concerned with error handling within the SSIS package, I didn't spend as much time as I could have on it. When I executed the package with a scheduled SQL Server Agent job, I soon found out that a server was offline when the job ran, raising an error. Because the package's processing was synchronous (i.e., completed one step at a time), all the information that would have been gathered after the error occurred was lost. I had to resolve this problem before a successful run of the repository package could be performed.
There's a property named MaximumErrorCount whose value you can modify to accommodate a higher error threshold for an SSIS package. By default, this property's value is set to 1, which means that one error is the maximum number allowed. When this threshold is reached, all processing stops at that point. As a short-term solution, I changed this property's value to a higher number.
Most objects in SSIS packages include the MaximumErrorCount property. As the package grew larger, I found myself adjusting this property for each object, which quickly became tiresome. At the time, I also configured a simple SMTP object in the Event Handler that would email me the name of the offline server, but this didn't prevent the entire package from failing.
I've now devised a solution that provides resiliency for offline servers. The solution is fairly simple but effective. Instead of expecting that all the servers will be available, I added a ForEach Loop container that attempts a connection with each server before the data collection begins.
The list of target servers is created with a T-SQL query that pulls data from a table named ServerList_SSIS table. In the original solution, this table contained five fields: one varchar field to store the server name and four smallint fields (Connect, Version, DMZ, and LocationID) to control the flow of processing within the SSIS package. To gain the resiliency I needed, I added another field named Online to this table. The Online field is used to check each server in the server list and report whether it's online or offline to the package and ultimately to you. It also prevents the package from accumulating connection errors.