DOWNLOAD THE CODE:
Download the Code 9086.zip

Learn how to deploy the technology behind Microsoft's scale-out strategy

Editor's Note: This is the first article in a three-part series about distributed partitioned views in SQL Server 2000.

Online transaction processing (OLTP) environments and large Web site databases usually consist of many individual queries requesting or manipulating relatively small amounts of data. As the system grows, and users issue more queries against the database, DBAs usually try to improve response time by scaling up, which means increasing the server's power. You can add CPUs, replace CPUs with faster ones, add more memory, expand the network, or add faster disk drives with smarter controllers. But at some point, you exhaust the available resources for scaling up because you reach your machine's—or your budget's—limit. SQL Server 2000 introduces a solution for the growing need for more processing power—scaling out.

When you scale out, you split huge tables into smaller tables, each of which is a subset, or partition, of the original table. You place each partition on a separate server. You manage each server independently, but together the servers form a federation. To access the data on any of the partitions, you define a view with the same name on each of the servers, making transparent the fact that the data is distributed among several nodes. A user or an application connecting to any of the servers can issue all Data Manipulation Language (DML) statements (i.e., SELECT, INSERT, UPDATE, and DELETE) against the view as if it were the original table. SQL Server 2000 intercepts the statements and routes them to the relevant servers. This configuration distributes the processing load among the federation members.

SQL Server 7.0 lets you create local partitioned views (see Claude Seidman, "Creating Horizontally Partitioned Views," April 2000, and Tony Rogerson's response to Seidman's article in Reader to Reader, "Accessing Appropriate Table View," June 2000). In SQL Server 7.0, you can also create partitioned views across multiple servers, but they aren't updateable, which greatly reduces their usefulness. In SQL Server 7.0 and earlier versions, any view based on a UNION operation is not updateable, whether it's on a single server or distributed across multiple servers. SQL Server 2000 removes this general restriction (letting users update a certain class of views based on a UNION statement) and introduces new optimization techniques for processing partitioned views. In this article, we discuss the new optimization techniques and show you how to set up and modify distributed partitioned views.

Setting Up Distributed Partitioned Views
Setting up distributed partitioned views involves three steps: setting up the linked servers, creating the partitioned tables, and creating the partitioned views.

Setting up the linked servers. Setting up distributed partitioned views requires a preliminary step that isn't necessary when you set up local partitioned views. Because partitioned tables are distributed across multiple servers, each server needs access to every other server. So, you need to configure all the servers as linked servers. Suppose you want to partition the tables Customers and Orders across three servers called Shire, Hobbiton, and Rivendell. Shire needs to point to Hobbiton and Rivendell; Hobbiton needs to point to Shire and Rivendell; and Rivendell needs to point to Shire and Hobbiton. Figure 1, page 34, shows the linked servers setup. Each arrow stands for a linked server configuration. Listing 1, page 34, contains the script to set up the linked servers on Shire, or Node1.

The @server parameter in the sp_addlinkedserver stored procedure specifies the logical name that we use to refer to the linked server, and the @datasrc parameter specifies the name of the server we want to access. Notice that in this script, remote server names consist of two parts separated with a backslash, as in server\instance. SQL Server 2000 supports multiple instances of SQL Server installations on the same server. So if you want to run these code examples on a single machine, you can install three instances of SQL Server 2000 on your machine and simply replace the server names used in the scripts with your server\instance names. For example, if you have a server called Server1, you can install three instances of SQL Server 2000 called Instance1, Instance2, and Instance3, and refer to them as Server1\Instance1, Server1\Instance2, and Server1 \Instance3, respectively. (For more information about multiple instances in SQL Server 2000, see Kalen Delaney, Inside SQL Server, "Multiple Instances," July 2000, and "Managing Multiple Instances," page 29 in this issue.

After you define the connection to each of the other two servers, use the procedure sp_ serveroption to set lazy schema validation for each linked server. This setting optimizes performance by ensuring that the query processor won't request meta data for any of the linked tables until users request data from the remote member table. Microsoft introduced the lazy schema validation option in SQL Server 2000.

When a user executes a distributed query on the linked server, the local server logs on to the linked server on behalf of the current user. If the following conditions are met, you don't need to configure linked server security:

  • Users connect to SQL Server by using Windows Authentication.
  • Security account delegation is available on the client machine (where the user interface is running) and on the local server. Security delegation is available with Windows 2000.

The configuration in this example meets these requirements, so you don't need to configure linked server security. The three servers are member servers in a Win2K domain called MIDDLE _EARTH, and the users connecting to the servers use Windows Authentication. If your configuration doesn't meet these security requirements, you need to configure linked server security by using the stored procedure sp_addlinkedsrvlogin.

Next, run the script in Listing 2 to set up the linked servers on Hobbiton, or Node 2. Last, run the script in Listing 3 to set up the linked servers on Rivendell, or Node 3.

Creating the partitioned tables. After you configure the linked servers, and each server can communicate with the other servers, you're ready for the most important part of distributed partitioned views—creating the partitioned tables. On each of the servers, create a table that has the same structure as the original table that you're splitting. Each table will hold a horizontal slice of the original table. The key point in creating the partitioned tables is that each table contains a column called the partitioning column, and the value in this column determines which of the partitioned tables any row can be inserted into. The partitioning criteria, which you implement as a CHECK constraint, define the subset of the rows that each partition will host. The CHECK constraints on each table must be mutually exclusive. Each row that you want to insert must satisfy only one of the CHECK constraints, as we'll demonstrate.

We use a UNION statement to combine these tables into an updateable view. As we mentioned, views based on UNION statements are updateable in certain situations. If you want to maximize query efficiency, exploit the new optimization capabilities introduced with SQL Server 2000, and allow updates directly to the view, then you need to ensure that the following conditions are met in each of the tables:

  • The partitioning column needs to be part of the primary key of the underlying tables, must not allow NULLs, and can't be a computed column.
  • The CHECK constraint defined on the partitioning column that defines the partitioning criteria can use only the following operators: BETWEEN, AND, OR, <, <=, >, >=, =.
  • The tables can't have identity or timestamp columns, and none of the columns in any of the tables can have a DEFAULT constraint.

You need to answer two important questions: Which column will you use as the partitioning column, and which range of values does each partition need to host? The most efficient partitioning criterion is one that defines a range of values in each partitioned table that users are most likely to access locally and that results in rows distributed as evenly as possible. This criterion could be a range of customers, a range of part numbers, or a geographical region. This setup lets the local server process most of the queries, minimizing the need for distributed processing.

   Prev. page   [1] 2     next page
 
 

ADS BY GOOGLE