• subscribe
April 01, 2009 12:00 AM

LINQ to SQL in Action: Joining and Grouping

Learn ways to code data-access functionality in your .NET applications
SQL Server Pro
InstantDoc ID #101721
Downloads
101721.zip

Microsoft built the Language-Integrated Query (LINQ) to SQL component in .NET Framework on top of LINQ to achieve one main purpose: Extend the general-purpose query syntax of LINQ to the world of SQL Server databases. As you may know, accessing in-memory data collections, databases, and XML documents today requires a familiarity with a variety of languages and syntax—one for each technology. With LINQ, you can use the same syntax to query different types of data.

The LINQ query engine is extensible enough to support any collection of objects that exposes a made-to-measure interface—the IQueryable interface. This means that any team of developers could, in theory, expose to LINQ their own data and query it through the common syntax—which is what Microsoft did with the LINQ to SQL and LINQ to XML interfaces. LINQ to SQL, in particular, wraps the content of a SQL Server database and makes it queryable through the LINQ syntax. We'll explore LINQ to SQL in depth here, examining the T-SQL basis for LINQ to SQL queries and walking through examples of using LINQ to SQL to code data-access functionality in .NET applications.

LINQ to SQL and T-SQL
LINQ to SQL doesn't work with databases other than SQL Server. So, unlike ADO.NET or industry-standard object-relational mapping tools, you can't use LINQ to SQL to work with, say, Oracle databases. (You can instead load content stored in Oracle databases through DataSets and query over that using the LINQ to DataSet model.) LINQ to SQL still uses T-SQL under the hood to run its queries. LINQ to SQL doesn't attempt to push an alternate route and doesn't aim to replace T-SQL; it simply offers a higher-level set of query tools for developers to leverage. LINQ to SQL is essentially a more modern tool to generate T-SQL dynamically based on the current configuration of some business-specific objects. T-SQL operates on a set of tables, whereas LINQ to SQL operates on an object model created after the original set of tables.

LINQ to SQL opens up SQL Server querying to the larger universe of .NET developers. However, with LINQ to SQL as well as with raw T-SQL, only T-SQL and SQL Server database experts can fine-tune the queries and remove any bottlenecks and performance hits.

Preparing for LINQ to SQL
LINQ to SQL works by exposing an object-based query language to developers and producing T-SQL statements in response to their input. With LINQ to SQL, you don't explicitly use a connection string, nor do you open or close a connection explicitly. All you do is interact with a central console called the data context.

The data context is a class that's ordinarily created by the Microsoft Visual Studio 2008 wizard when you choose to add a new LINQ to SQL class item to the current project, as Figure 1 shows. Adding a LINQ to SQL class to a project displays the Object Relational Designer, through which the developer selects tables, stored procedures, and functions to add to the LINQ to SQL data context.

The completed data context class incorporates references to collections that represent all selected tables. The data context source file also includes the definition of a class that represents an individual row on each selected table. In other words, if you select the Customers and Orders tables from the Northwind database, you'll have in the data context class properties, which the C# code sample in Listing 1 shows.

You'll work against these collections using the LINQ syntax. Any query you perform against the collections will then be translated to T-SQL statements and executed against the database. The data context class takes care of all this for you.

The data context—usually a class derived from the base class DataContext—has a Log property, as the following example shows, which you use to register a stream to capture all T-SQL statements actually generated for each LINQ query to the database.

NorthwindDataContext nwind = new NorthwindDataContext();
nwind.Log = Console.Out;

Through the Log property, you can track what really happens under the hood—that is, snoop on generated T-SQL statements—and decide to intervene with changes if you think the T-SQL code could be improved.

Running a Query through LINQ
The LINQ query language works on the LINQ to SQL object model in much the same way as T-SQL works on table rows. To express a simple query that selects customers from a given country and returns them ordered by name, in LINQ you use the following query:

var data = from c in nwind.Customers
           where c.Country

"Spain"
           orderby c.CompanyName
           select c;

As you can see, in this query there's no explicit reference to T-SQL; furthermore, the only point of contact between LINQ and the outside world is the data collection you work with. That collection is responsible for taking the input data (filter, order) and selecting the results. Being a LINQ to SQL collection, it will do so by running a dynamically generated T-SQL query. As a result, LINQ to SQL offers a platform for you to express entities in the domain space as objects. You can then query entities and update them, using an object-oriented syntax.

^^^^ ^^^^

The bold elements in the code fragment are context keywords added to C# that map to the underlying LINQ query engine. Their role is analogous to the role played by similar T-SQL keywords. You might wonder why select—the projection operator—appears at the end of the statement and from is instead at the beginning of it. The only reason for having this is the need to enable Visual Studio 2008 IntelliSense on LINQ expressions. Because the projection is expressed as a list of properties on any of the objects referenced in the query, IntelliSense would not in fact be possible if the projection is declared before the actual objects. Consider the following:

var data = from c in nwind.Customers
           where c.Country
"Spain"
           orderby c.CompanyName
           select new { c.CompanyName,
           c.ContactName };

With such a select expression at the beginning of the statement, no IntelliSense would be possible as c is not defined yet.

It's interesting to note that LINQ to SQL statements are not run immediately. This is why you need to use a new C# keyword—the var keyword—in the statement. The var keyword (the Dim keyword in Visual Basic) indicates that the developer might not know the data's type at writing time. By using var, the developer delegates the compiler to resolve the type at compile time. The var keyword isn't a sign of weak typing; no weak types are ever used. It is, instead, a sign that some dynamically generated code is going to run and the actual type of results may be anonymous.

Data doesn't flow in the assigned variable right away as the LINQ to SQL statement executes. The execution of the previous statement doesn't fill the variable with any data. Instead, it simply prepares the ground for the execution of the T-SQL statement. The T-SQL statement is generated and configured but isn't executed until the main code stream needs to process resulting data. In other words, as soon as the code consumes the content of the previously mentioned data variable, the T-SQL statement is sent to the database. For example, if you bind the content of the data variable to a control, triggering the data binding process will actually trigger the following T-SQL statement:

GridView1.DataSource = data;
GridView1.DataBind();

This lazy loading feature can have some undesired effects if you don't master it properly. For example, consider the code in Listing 2, a code fragment from an ASP.NET page. You might want to run SQL profiler to trace its execution. First, the code attempts to get all orders shipped through the same shipping company. When the execution flow reaches the foreach statement, the T-SQL statement highlighted in Figure 2 runs.

The following is a simplified version of the statement:

SELECT * FROM orders WHERE shipvia=3

You may think that after running the first query to grab all matching records in the Orders table, you're fine and can work safely and effectively with any selected object. Well, not exactly. As Figure 2 shows, many additional queries actually hit the SQL Server database. Why is that? Let's expand on one of them, the exec sp_executesql statement, in Listing 3. All additional queries are similar to the preceding query, and all they do is retrieve information about the customer who placed the order being processed in the foreach loop in Listing 2.

As a result, you run one query to retrieve all the orders, plus one additional query for each selected order to access customer information. This is the default behavior, which you might not have expected.

By changing the parameters for lazy loading and specifying your fetch plan, you can force LINQ to SQL to run a totally different query and, more importantly, to get all order and customer data in a single shot. Here's how you need to modify your code to do so:

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Order>(o => o.Customer);
dataContext.LoadOptions = options;

By setting the LoadOptions property, you specify a static fetch plan that tells the runtime to load all customer information at the same time order information is loaded. Based on this, the LINQ to SQL engine can optimize the query and retrieve all data in a single statement. Listing 4 shows the SQL Server Profiler detail information for the exec sp_executesql query after the code modification. The query now includes a LEFT OUTER JOIN that loads orders and related customers in a single step.

What does this mean to you? LINQ to SQL queries have their own logic, and the underlying behavior might not be exactly what you expect. LINQ to SQL produces T-SQL statements, but it's a totally different engine—so don't make assumptions about LINQ to SQL behavior before you have verified how it works in practice. LINQ to SQL is a brand-new API for operating on a SQL Server database; it's not just another way of writing T-SQL code. Let's see now how to perform more advanced query that involve joining and grouping operations on data.



ARTICLE TOOLS

Comments
  • Manuel
    3 years ago
    May 20, 2009

    one problem is that linqtosql (L2S) is now legacy -- linq to entities (L2E, part of the entity framework, EF) is the preferred choice for future compatibility.

  • Anne
    3 years ago
    May 12, 2009

    Marco, thanks for your comment. Here's Dino Esposito's reply:

    "The reader is right when he says that most of the time T-SQL statements written by programmers are the main cause of troubles. However, the challenge (and complexity) of modern software require that developers work with object models thus raising the abstraction level from physical to a more conceptual view. O/RM tools (LINQ to SQL is just one of the simplest) serve this purpose and may generate T-SQL code. Or they can be programmed to invoke stored procedures written by DBAs.

    I also suggest that DBAs learn more about modeling techniques and O/RM tools. O/RM tools (including LINQ to SQL) offer a mapping layer where bindings between object properties and tables/columns are made. This mapping must be the realm of DBAs; the model must be the realm of developers/architects."

  • Marcos
    3 years ago
    May 04, 2009

    It could be a stupid thought, but isn't it easier let a DBA write the T-SQL statements? Just keep the presentation's programmers doing what they do best (Good and useful interfaces). During my DBA carrer I noticed that 99% of the problems are caused by the poor design, most of it done by programmers.

You must log on before posting a comment.

Are you a new visitor? Register Here