How to Leverage Event Tracing in Real Life
The goal of event tracing is to obtain actionable data about your environment. For example, suppose you have to add a large number of users to AD and you're worried about how the current hardware supporting your Windows 2003 DC will handle the additional authentication traffic. You can use event tracing to get a good picture of the current authentication workload on the server and how many resources each logon event requires. You can then use that information to determine how much more load the additional users will create.
You can set up the trace a couple of ways. The first approach is to schedule an AD trace session to start during the peak logon time in a production environment and collect data long enough to determine each logon event's impact on the DC. The second approach is to enable tracing within a controlled lab environment to determine the average overhead per logon transaction, then extrapolate the average overhead to the number of users you expect to accommodate on the system. Running an AD trace session in a production environment will have a performance impact on the DC. Although the session will likely cause less than 5 percent CPU overhead, some administrators might not be able to run a trace session in their production environments. Thus, let's take a closer look at the lab approach instead.
To run the AD trace session in a lab environment, let's use the Performance Logs and Alerts snap-in to set up two logging sessions on a Windows 2003 DC. One logging session will use the nonsystem providers to capture events that represent AD workload on a DC during a logon. The second session will use the system provider to capture kernel statistics. This additional kernel data ensures that you'll generate an accurate and complete workload report.
To set up the first session, choose the Nonsystem providers option in the snap-in and add all the AD-related providers (i.e., Active Directory: Kerberos, Active Directory: Netlogon, Active Directory: SAM, Local Security Authority, NTLM Security Protocol, and Windows NT Active Directory Service). To set up the second session, choose the Events logged by system provider option, and select, at a minimum, the Process creations/deletions, Thread creations/deletions, Disk input/output, and Network TCP/IP check boxes.
After you set up the two trace logging sessions, manually start each session, then log on to AD on the Windows 2003 machine. After the logon process finishes, stop the two logging sessions. Use tracerpt.exe to combine the two resulting .etl files and generate the CSV, summary, and workload reports.
If you examine a typical .csv file, you'll notice a lot of different events. These events represent transactions that Microsoft has coded into its providers, and the events correspond to a set of actions that are being performed. As the sample .csv entry in Figure 3 shows, such events aren't easily understandable.
The first line in a .csv file contains header information that identifies each data field. For example, the sample entry in Figure 3 contains seven data fields:
Event Namespecifies the name of the event. In this case, the event's name is GetASTicket, which is the client's request for the Kerberos Authentication Service.
Typespecifies the type of event. Typically, you'll see Start or End in this field, which marks the beginning and the end of an event, respectively.
TIDlists the thread ID (in hexadecimal notation) under which the current event is performing.
Clock-Timespecifies the time that the event occurred. The time appears in Integer8 format, which is a 64-bit value that holds the number of 100-nanosecond intervals that have occurred since 12:00 a.m., January 1, 1601. Although the Integer8 format is obscure, Microsoft uses it frequently for services, such as WMI and AD.
Kernel(ms)specifies the time, in clock ticks, used for this event, broken out by kernel-mode CPU time. Clock ticks translate into real time based on the clock cycle of a typical PC, which is 1 clock tick every 10ms. For example, 100 clock ticks would equal 1 second (100 * 10) in real time. Note that the CPU time represented here and for the user time is somewhat misleading because the value listed is the time from the point at which the current thread is started to the time when the event tracer fires the event. Thus, you'll have to rely on the workload report to generate CPU times.
User(ms)specifies the time, in clock ticks, used for this event, broken out by user-mode CPU time.
User Datacontains any associated data that Microsoft has included in the transaction. In this case, Microsoft has provided four additional pieces of information. The first value, 0x00000025, is an instance ID flag that isn't relevant to this discussion. The second value, "administrator", specifies the user ID requesting the authentication service, whereas the third value, "krbtgt/DOMAINA", is the name of the Kerberos ticket-granting service running on the domain called DOMAINA. The final value, "DOMAINA", is the name of the destination domain.
If you don't have a username & password, please
register now.
Reader Comments
Really enjoyed this article..great to have monitoring techniques that can provide so much more information and for what seems to be less of a system overhead as well.
John- January 05, 2004
Very interesting. For some reason local and even domain admin account didn't give me enough rights to start a log session with system providers.
israel- June 01, 2004
Good article but lacks in detail. Wwhat if I want to write code to analyze the ETL file myself? And how can registry be manipulated to enable tracing?