Web Abstract:
- Troubleshoot Microsoft Active Directory (AD) replication.
- Verify that the target AD domain controller's (DC's) operating system (OS) and directory service are working properly.
- Verify that the Domain Name System (DNS) service is working properly on both the target and source DCs.
- Verify that the target DC can resolve the source DC.
- Check the Kerberos authentication protocol and the services Kerberos depends on.
- Check firewall configurations, because some firewall configuration changes might block replication.
Active Directory (AD) replication is the method by which directory changes
made at one domain controller (DC) pass to other DCs. AD is a very robust and
fault-tolerant service. Because AD is distributed across many DCs, losing parts
of the whole doesn't cripple the overall directory service. In an AD forest
you must monitor not only the DCs' basic health but also replication between
the DCs. In my experience, replication in an unmonitored forest tends to fall
apart over time, even if you configured the DCs carefully. Monitoring and repairing
replication problems when they occur is much easier than fixing a forest with
accumulated problems. But whether or not you monitor your AD replication,
you'll inevitably need to troubleshoot it. In this article I explain some basic
replication principles, and I present a straightforward methodology for troubleshooting
AD replication problems. AD replication troubleshooting can be confusing; following
my steps will help remove the "black art" feel from this task.
The Basics
Determining the best approach for troubleshooting replication can be difficult.
Certainly you want to use the method that resolves the problem as quickly as
possible; however, you don't want to skip a step that ultimately drags out the
troubleshooting process. I've always found that a logical, inside-out approach
works best. Wanting to immediately dive into the fancy troubleshooting tools
is only natural, but you should first use a logical approach to verify that
the basics are working correctly. As you get more comfortable troubleshooting
replication problems, you might breeze through many of the steps. But first
you need the experience of carefully performing each step, to ensure that you
don't jump to the wrong conclusion or have to go back to a previous step. Start
by checking the health of the OS on the DC itself, then check the health of
the directory service. Next, check the DC's basic communications with its fellow
DCs. Finally, verify the protocol that the directory service uses and determine
whether the DCs are authenticating correctly with one another. Following these
steps will help you resolve 90 percent of your replication problems.
PROBLEM: Troubleshooting Active Directory (AD) replication problems
SOLUTION: Employ a basic approach and use Windows Server Support
Tools to monitor and repair AD replication
WHAT YOU NEED: Windows Server 2003 or Windows 2000, Windows Server
Support Tools
DIFFICULTY: 3 out of 5
Check the Foundation
Your first step is to verify that the DC's OS is working correctly. Replication
errors can be caused by various local errors on a DC, so you need to ensure
that the server's foundation is sound. If you haven't already done so, install
the latest Windows Server Support Tools on all your production systems. (All
the utilities I describe in this article are Windows Server Support Tools.)
You can download these tools from the Microsoft Download Center (http://www.microsoft.com/downloads).
Search for "support tools" for your particular OS version and service pack level.
For a list of useful tools, see the Web-exclusive sidebar
"Replication Troubleshooting Toolkit," http://www.windowsitpro.com, InstantDoc
ID 95634. Resolving all the issues in a large environment might be impractical;
however, you can use an Internet search engine and the Microsoft Knowledge Base
(http://support.microsoft.com/search/?adv=1)
to weed out your most significant problems.
After you install the Windows Server Support
Tools, look at the event logs. First, check the system
log for warnings and errors. If you encounter errors
in the system log, try running NetDiag. Even without
using any of its command-line options, NetDiag runs
23 tests related to the system's network configuration.
Some of the useful tests it runs are domain membership, DNS, client configuration, trust relationships,
Kerberos, and LDAP functionality. If you find a
problem area, rerun NetDiag with the /test:testname
switch and the /v option to get a detailed test analysis
of the area. An important NetDiag option that I refer
to later in the article is the /fix switch, which reregisters the server's DNS entries.
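The NetDiag invocations described above can be sketched as a small Python helper that assembles the command line. This is only an illustration: the helper names are hypothetical, the test name DNS is an assumed example of a valid testname, and the run function assumes netdiag is on the PATH of a server where the Support Tools are installed.

```python
import subprocess

def netdiag_command(test=None, verbose=False, fix=False):
    """Build a NetDiag argument list from the switches named in the article."""
    args = ["netdiag"]
    if test:
        args.append(f"/test:{test}")   # rerun a single problem area
    if verbose:
        args.append("/v")              # detailed analysis of that area
    if fix:
        args.append("/fix")            # reregister the server's DNS entries
    return args

def run(args):
    """Run the command and return its text output.

    Assumption: executed on a Windows server with the Support Tools installed.
    """
    return subprocess.run(args, capture_output=True, text=True).stdout

# Print the command lines rather than running them, so the sketch works anywhere.
print(" ".join(netdiag_command()))                     # default test pass
print(" ".join(netdiag_command("DNS", verbose=True)))  # drill into one area
print(" ".join(netdiag_command(fix=True)))             # reregister DNS entries
```

Building the arguments as a list keeps the switches separate from the code that executes them, which makes it easy to log exactly what was run on each DC.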
If the directory service log has errors, run DCDiag. DCDiag is a comprehensive
test utility for DCs. Even without using any of its command-line options, DCDiag
runs 27 DC-related tests. As with NetDiag, if you find a problem area, rerun
DCDiag with the /test:testname switch and the /v option to get a detailed
test analysis of the area. Don't get too hung up over errors in the system log
test; any recent errors in the system log will cause this test to fail.
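The rerun-the-failed-test routine can be sketched as a short Python script that scans saved DCDiag output for failed tests and suggests the follow-up commands. The sample text and the "passed test"/"failed test" line layout are assumptions about what DCDiag prints, not output captured for this article, and the function names are hypothetical. The sketch also assumes the system log test is reported under the name systemlog and sets it aside, following the advice above.

```python
import re

# Assumed layout of a DCDiag summary line:
#   ......................... GODAN failed test Replications
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")

# Illustrative sample only; real output contains many more lines.
SAMPLE = """\
      Starting test: Connectivity
         ......................... GODAN passed test Connectivity
      Starting test: Replications
         ......................... GODAN failed test Replications
      Starting test: systemlog
         ......................... GODAN failed test systemlog
"""

def failed_tests(output):
    """Return the names of the tests that DCDiag reported as failed."""
    return [m.group(3) for m in RESULT_LINE.finditer(output)
            if m.group(2) == "failed"]

def rerun_commands(output, defer=("systemlog",)):
    """Suggest a verbose rerun for each failed test, skipping deferred ones."""
    deferred = {name.lower() for name in defer}
    return [f"dcdiag /test:{name} /v" for name in failed_tests(output)
            if name.lower() not in deferred]

for command in rerun_commands(SAMPLE):
    print(command)
```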
If everything looks good on the home front—that
is, if NetDiag and DCDiag didn't reveal any OS or
directory service–related errors—it's time to start
looking at replication. The best place to start is to
check your DCDiag test results, because DCDiag runs
extensive replication tests.
Let's use the domain that Figure 1 shows
to see how to troubleshoot a common error. This domain, called Deuby.net, has
three DCs. The DCs named Godan and Kohai are in the Hub site. The DC named Sandan
is in the Branch site, connected to the Hub site by a site link with a replication
interval of 15 minutes. Suppose that updates aren't replicating from Kohai to
Godan. Replication is always an inbound operation. Thus, even though replication
in a site is triggered by change notifications from a DC that has been updated,
you need to think of updates as being pulled in by the target DC from
another DC. Start your troubleshooting efforts with the DC that should be receiving
the updates. In my example, this DC is Godan.
Figure 2 shows the partial output from running
DCDiag on Godan. Notice that the Replications test failed because of the error
"[KOHAI] DsBindWithSpnEx() failed with error 1722, The RPC server is unavailable."
Although this error message is dense, we can work through the message to get
a good idea of the problem. The problem obviously has something to do with the
DC Kohai. But what does "DsBindWithSpnEx()" mean? "BindWithSpn" tells us that
the error occurred when Godan attempted to bind (i.e., connect and authenticate)
to Kohai. Therefore the problem appears to be related to Godan unsuccessfully
communicating with Kohai.
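To show how the pieces of this error message break apart, here is a minimal Python sketch that splits the quoted line into the partner DC, the failing call, the error number, and the error text. The regular expression is an assumption fitted to this one message, and the function name is hypothetical.

```python
import re

# Pattern fitted to the message quoted above: [PARTNER] Call() failed with error N, text
ERROR_LINE = re.compile(
    r"\[(?P<partner>[^\]]+)\]\s+(?P<call>\w+)\(\)\s+"
    r"failed with error (?P<code>\d+),\s*(?P<text>.+)"
)

def parse_bind_error(line):
    """Split a DCDiag bind error into its parts, or return None if it doesn't match."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    return {
        "partner": match["partner"],   # the DC the target tried to reach
        "call": match["call"],         # the operation that failed
        "code": int(match["code"]),    # the numeric error code
        "text": match["text"].strip(), # the human-readable description
    }

message = "[KOHAI] DsBindWithSpnEx() failed with error 1722, The RPC server is unavailable."
info = parse_bind_error(message)
print(info["partner"], info["call"], info["code"])
```

Pulling out the partner name and error code this way is handy when you collect DCDiag results from several DCs and want to see which source DC keeps appearing in the failures.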
We need to determine whether Godan can even locate its replication partner
Kohai. One of the first tests is to ping Kohai's IP address to check basic network
connectivity. If this test works, you could ping Kohai by name (i.e., by its
DNS A record). However, this method isn't a conclusive test for replication
because a DC finds its replication partners not by resolving their A
records (e.g., dc1.mycompany.com), but by resolving a special DNS Canonical
Name (CNAME—i.e., alias) guaranteed to be unique in the forest.
Each DC in the forest must register its CNAME record for the name DsaGuid._msdcs.ForestName;
this CNAME identifies the DC to the replication system as a DC. The CNAME record
maps this string to the DC's A record, which contains its IP address. For example,
the DNS CNAME of dc1.mycompany.com might be d40c01da-23fa-46e6-8bf3-798503e2590f._msdcs.mycompany.com.
The CNAME record would be d40c01da-23fa-46e6-8bf3-798503e2590f._msdcs.mycompany.com
CNAME dc1.mycompany.com. Note that the directory service agent (DSA) globally
unique identifier (GUID) that forms the first part of the DC's CNAME isn't
the GUID (specifically, the objectGUID attribute) of the DC's computer object,
as you might expect. Instead, it's the GUID of the NTDS Settings object under
the DC in the Sites container. For example, if DC1 were in the Hub site, the
distinguished name (DN) of its NTDS Settings object would be
CN=NTDS Settings,CN=DC1,CN=Servers,CN=Hub,CN=Sites,CN=Configuration,DC=mycompany,DC=com.
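The naming rules in this paragraph can be sketched in Python as two small string builders, one for the DsaGuid._msdcs.ForestName alias and one for the DN of the NTDS Settings object. The function names are hypothetical, the GUID is the article's example value written in the standard 8-4-4-4-12 grouping, and the resolve function simply asks the local resolver for the alias's address, which assumes the machine running it uses the forest's DNS servers.

```python
import socket
import uuid

def dsa_cname(dsa_guid, forest_name):
    """Build the alias a DC registers: DsaGuid._msdcs.ForestName."""
    guid = str(uuid.UUID(dsa_guid))  # normalizes to lowercase 8-4-4-4-12 form
    return f"{guid}._msdcs.{forest_name}"

def ntds_settings_dn(dc_name, site_name, forest_name):
    """Build the DN of the NTDS Settings object whose GUID is the DSA GUID."""
    domain = ",".join(f"DC={label}" for label in forest_name.split("."))
    return (f"CN=NTDS Settings,CN={dc_name},CN=Servers,"
            f"CN={site_name},CN=Sites,CN=Configuration,{domain}")

def resolve(alias):
    """Return the IP address the alias resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(alias)
    except OSError:
        return None

alias = dsa_cname("d40c01da-23fa-46e6-8bf3-798503e2590f", "mycompany.com")
print(alias)
print(ntds_settings_dn("DC1", "Hub", "mycompany.com"))
```

On the target DC, calling resolve(alias) for the source DC's alias is a closer match to what replication does than resolving the source DC's A record by name.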