The ADO.NET Entity Framework has been widely adopted by developers because it allows them to interact with SQL Server databases through abstractions,
which provides two primary advantages. First, the abstractions let developers avoid the tedium of having to wire up database calls. Second, the
abstractions shield developers from needing knowledge about the size, cardinality, distribution, and indexing of the data with which they're working.
However, shielding developers from this complexity can produce problems. Some operations are nontrivial from a performance perspective. In many cases,
these nontrivial operations can't be hidden without incurring some stiff performance penalties. Stated differently, a cardinal rule of performance
tuning is that bad code typically equates to bad performance. Because the Entity Framework generates large amounts of frequently complex code, it would
be naive to assume that it can properly optimize the T-SQL code it generates for every operation.
When it comes to addressing Entity Framework performance problems, there are a few key items that DBAs must recognize. First and foremost, DBAs need to
understand that the Entity Framework's core concept of entities differs fundamentally from SQL Server projections. DBAs also need to understand what
techniques are available to address Entity Framework performance problems. These techniques include:
-
Preemptively sizing and testing new Entity Framework applications
-
Using plan guides to tune queries in existing applications
-
Using stored procedures to tackle performance problems in existing applications
-
Using the hourglass approach within stored procedures to work with problematic queries that have a lot of aggregations and a substantial amount
of filtering
- Using stored procedures to address concurrency problems
Understanding Entities
Architecturally, the Entity Framework was designed to help developers overcome a phenomenon commonly referred to as impedance mismatch. In other words,
it helps developers overcome the high degree of friction that results when translating set-based data to and from programming objects (i.e., entities)
with complex serialization graphs. Although SQL Server tables can cleanly store data that can be efficiently mapped to programming objects through the
Entity Framework's intrinsic mapping capabilities, there's a fundamental difference between the two systems: the Entity Framework thinks in terms of
entities, whereas SQL Server thinks in terms of projections. The best way to demonstrate this difference is with an example.
One of SQL Server's greatest strengths is projection, which is the ability to return only the columns (or portions of columns) needed to fulfill a
particular query. Figure 1 shows a simple example of projection, in which a query returns the top 10 users based on the number of comments they've
made. Even though this query is joining a table with six columns and a table with eight columns, the query returns only two columns-the bare amount of
data needed to answer the question "Which users have the most comments?" Just in terms of network chatter alone, projection provides huge performance
benefits.

Figure 1: Example of SQL Server projection
The Entity Framework, however, commonly takes a much different approach to answering the same question because it isn't interested in merely answering
the question. Instead, it's designed to return full details about the entities (i.e., entire rows from tables) that satisfy the question. So, for
example, the query generated by the Entity Framework would likely retrieve all 14 columns (i.e., all the columns from both tables) so that developers
could work with the results in terms of entities rather than projections. Although this approach might initially seem inefficient, developers commonly
need to iterate over returned results for additional processing, outputting information to a screen, and so on. Therefore, it's arguable that getting
everything in one pass avoids multiple roundtrips.
The key takeaway for DBAs is that the Entity Framework thinks in terms of entire rows rather than in terms of projections. As such, it's essential they
realize that a covering index will lose much of its power and efficiency when dealing with Entity Framework applications and databases. Granted,
covering indexes can still address commonly executed Entity Framework queries with multiple predicates, but a lookup is almost always going to be
required to retrieve entire rows instead of allowing mere projections from the columns covered in a particular covering index.
In this way, the Entity Framework effectively shunts impedance mismatch onto SQL Server and the underlying hardware in order to streamline developer
productivity. This means that the Entity Framework effectively offloads the "pain" associated with impedance mismatch to DBAs. Thus, as Entity
Framework applications deal with more and more data and more and more users, they're likely to exhibit performance penalties-just as other
non-optimized and untuned workloads typically do as they get larger and more heavily used.
In Entity Framework solutions, scalability isn't about size or growth. Instead, scalability represents a solution's ability to proportionately match
response times with growth rates. Think of a scale model, for example. Size isn't as important of a concern as proportion. In this sense, Entity
Framework databases can grow without problem. However, when the ratio of how long a query takes to execute doesn't scale with the degree to which the
underlying data has grown, there's a scalability problem.