A histogram is a statistical report that shows you the frequency of values within steps or ranges of values that fall between a certain minimum and maximum. Let's take student exam results as an example. Suppose that of 30 students taking an exam, the lowest score is 51 and the highest is 100. You want to generate steps ranging between the lowest and highest scores and count the number of results within each step to get a sense of score distribution in the class. If you wanted to generate five consecutive steps of similar range sizes, the steps and ranges would be 50 to 60, 60 to 70, 70 to 80, 80 to 90, and 90 to 100, lower bound excluded and upper bound included. The histogram would contain the steps and the number of results that fall within each step.
Similarly, you might use histograms to analyze a sampling of values from performance counters (e.g., CPU utilization, memory) set on servers in your network. For example, suppose you record CPU utilization of a certain network server every minute and the measured values for a particular day ranged from 21 percent to 100 percent. You could generate a histogram with four steps20 to 40, 40 to 60, 60 to 80, and 80 to 100to find the number of samples that fell within each step. If your server was overloaded during that day, most of the samples would fall in the fourth step.
Generating a Performance-Counter Histogram
Here's a problem involving histograms; see if you can solve it before looking at my solutions. A scheduled SQL Server Agent job periodically records a sampling of performance counters for a certain network server in a table called Samples. Run the code that Listing 1 shows to create the Samples table and populate it with data. Each row contains the measurement ID (measid), when the sample was taken (dt), the measured value (value), and a filler column of 100 bytes that represents other columns in the table. (For example, you'd usually have a serverid column so that you could record measurement samples of multiple servers. For the sake of this problem, this table contains only one server's data.)
Suppose that measid 1 is CPU utilization in percent and measid 2 is memory usage in megabytes. Your users need a histogram to help them analyze the performance data of a certain measurement over a certain period of time. Your users give you the following as arguments: the number of steps (@numsteps), the measurement ID (@measid), and a date range (@fromdtinclusive and @todtexclusive). Your task is to generate a histogram for the given arguments. Note that you don't have to include steps with 0 occurrences in the result. For example, say a user gives you the following arguments:
DECLARE @numsteps int, @measid
int, @fromdt datetime, @todt
datetime
SELECT @numsteps=5, @measid=1,
@fromdt='20030101', @todt='20030102'
In the Samples table, the minimum measured value for measid 1 in the given period is 26 and the maximum is 50. The number of requested steps in the @numsteps argument is 5. First, you need to calculate the lower and upper bounds of the ranges within the five steps. Because the lower bounds of the ranges aren't inclusive, your calculations should result in the following ranges: 25 to 30, 30 to 35, 35 to 40, 40 to 45, and 45 to 50. Figure 1 shows the steps and ranges for the given arguments.
You need to devise code that will tell you how many measures are in each stepin this case, your code should result in one measure matching Step 1 (26), two matching Step 2 (33, 35), and two matching Step 5 (47, 50). Figure 2 shows how your code's output should look with this specific set of arguments. Following are some solutions I came up with for this problem.