I recently helped a client migrate an application’s back-end databases from Sybase to SQL Server. One of the databases used by this application, which I’ll call MyApp, was a shared database. The other applications that used this database weren’t being migrated from Sybase to SQL Server, so the client needed a data synchronization mechanism that would transfer over changes made in the Sybase tables to the SQL Server tables.
The SQL Server users of MyApp only read data from 19 tables of 100+ tables in the shared database. The data in those tables was modified by several other applications that were still utilizing Sybase. Because the users of the Sybase applications were spread across multiple time zones around the globe, data changes occurred practically every minute, 24 ´ 7. For this reason, the client wanted no more than a 30-minute delay in delivering data changes to SQL Server.
Creating the Data Synchronization Mechanism
Typically, configuring replication is the simplest way to synchronize selected tables in two environments. However, SQL Server doesn’t support Sybase publishers. And using Sybase Replication Server or another third-party tool wasn’t an option for the client for budgetary reasons.
Another synchronization approach is modifying all the applications using the shared database so that updates are made to both the Sybase and SQL Server databases in one transaction. Unfortunately, that approach wasn’t possible because the client wanted to minimize the impact of MyApp’s SQL Server migration on the other applications.
So, using SQL Server Integration Services (SSIS) was my only option. I needed to create an SSIS package that would synchronize the 19 tables and run every 30 minutes. The data synchronization process needed to complete in 12 minutes or less so that at least 18 minutes would be left for other processes to execute during the 30-minute cycle.
After examining the 19 tables, I found that they could be split into two groups. One group consisted of 16 relatively small tables, each of which had 150,000 or fewer rows. The other group consisted of three large tables. Two of these tables had about 1 million rows, and the third had more than 20 million rows.
I also found that all of the tables allowed INSERT, UPDATE, and DELETE operations. However, they didn’t have any timestamps to indicate when a row had been inserted or updated last. More important, I discovered that the deleted rows weren’t “archived” or marked as deleted—the rows were physically removed from the tables when the DELETE command was used. As a result, the biggest obstacle for data synchronization was determining which rows were deleted. The only way to find them was by comparing the data in the Sybase and SQL Server tables.
Initially, I decided to implement a four-phase process for all the tables:
- Create stage tables in SQL Server (one for each Sybase table).
- Copy all the data in the Sybase tables to the stage tables.
- Merge the data in the stage tables with the data in their corresponding core tables (i.e., the production SQL Server tables).
- Use the TRUNCATE TABLE command to remove all the rows from the stage tables in preparation for the next run.
The merge phase turned out to be the trickiest. I knew that running the MERGE command against even a small table (150,000 rows) would hurt server responsiveness, so I devised an update operation that would run in batches of 50,000 records. The update operation consisted of four steps:
- Delete the rows in the core table that don’t exist in the stage table.
- Update the core table with the corresponding rows from the stage table.
- Delete all the rows that were used in the update in step 2 from the stage table.
- Insert the stage table’s remaining rows into the core table.
Figure 1 shows the results of these steps in a simple scenario.

Figure 1: Results of each step in a sample merge operation for small tables
Initially, I created an SSIS package that used the four-phase process to synchronize data in both small and large tables. I discovered that the process worked fine for the small tables, but it took significantly longer to complete—45 minutes—for the three large tables. The vast majority of that time was spent copying and merging the data. Just copying the data from the large Sybase tables to the stage tables took about 15 minutes on the hardware provided (a dedicated virtual box with 2 CPUs, 3.5GB of RAM, and a 32-bit version of SQL Server 2008). I needed a different approach to deal with the large tables to meet requirement of having the entire data synchronization process complete in 12 minutes or less.