Wednesday, December 29, 2010

Entity Framework Best Practices

Dear Friends,

This post is regarding some of points to be considered while writing the LINQ Queries which may affect the performance alot.

For Beginners in Entity Framework, Please review: Entity Framework Basics

Kindly, find all the below points which can increase the performance:

Compiled queries
When you have an application that executes structurally similar queries many times in the Entity Framework, you can frequently increase performance by compiling the query one time and executing it several times with different parameters. For example, an application might have to retrieve all the customers in a particular city; the city is specified at runtime by the user in a form. LINQ to Entities supports using compiled queries for this purpose.

use Compiled queries to amortize the overhead inherent in SQL generation.
The CompiledQuery class provides compilation and caching of queries for reuse.
For Example:
// Compiled Query
public static Func> _statesByCountry =
CompiledQuery.Compile((Entities db, int CountryID) => db.States.Where(s => s.CountryID == CountryID));

//Using Compiled Query
public virtual IEnumerable GetStatesByCountryID(int countryID)
return _ statesByCountry (Context, countryID).ToList();

Select N+1
Select N + 1 is a data access anti-pattern where the database is accessed in a suboptimal way. Suppose that want to show all Sites Information from all schools. The native implementation would be something like:
var productQuery = from product in _ctx. Product
select Product;

foreach (Product product in productQuery)
         //lazy loading of comments list causes:
        // SELECT * FROM Sites where SchoolId = @SchoolId
       foreach (ProdutDetail productDetail in product.ProductDetails)
       //print comment...

In this example, we can see that we are loading a list of Product (the first select) and then traversing the object graph. However, we access the collection in a lazy fashion, causing Entity Framework to go to the database and bring the results back one row at a time. This is incredibly inefficient.
The solution for this example is simple. Force an eager load of the collection using the Include method to specify what pieces of the object model we want to include in the initial query.
// SELECT * FROM Product JOIN ProductDetail ..
var productQuery = (from product in _ctx.Product.Include("ProductDetail")
                               select product);

foreach (Product product in productQuery)
             // no lazy loading of comments list causes
            foreach (ProdutDetail productDetail in product.ProductDetails)
             //print comment...
In this case, we will get a join and only a single query to the database.
Note: this is the classical appearance of the problem. It can also surface in other scenarios, such as calling the database in a loop, or more complex object graph traversals. In those cases, it is generally much harder to see what is causing the issue.

Avoid too many joins
Queries with too many joins might be a performance problem.
Each join requires the database to perform additional work, and the complexity and cost of the query grows rapidly with each additional join. While relational database are optimized for handling joins, it is often more efficient to perform several separate queries instead of a single query with several joins in it.
For OLTP systems, you should consider simplifying your queries or simplifying the data model. While we do not recommend avoiding joins completely, we strongly discourage queries with large numbers of joins. Another issue to pay attention to is possible Cartesian products in queries contains joins, it is very easy to create such a thing and not notice it during development.

Avoid Too Many Database Calls Per Session
One of the most expensive operations that we can do in our applications is to make a remote call. Going beyond our own process is an extremely expensive operation. Going beyond the local machine is more expensive still.
Calling the database, whether to query or to write, is a remote call, and we want to reduce the number of remote calls as much as possible.
There are several reasons why this can be:

  1. A large number of queries as a result of a Select N + 1
  2. Calling the database in a loop
  3. Updating (or inserting / deleting) a large number of entities
  4. A large number of (different) queries that we execute to perform our task
For the first reason, you can see the suggestions for Select N + 1.
Calling the database in a loop is generally a bug, and should be avoided. Usually you can restructure the code in such a way that you are not required to call the database in that way. All of the above points must taken into consideration while writing the Complex LINQ Queries. 


Paras Sanghani