ADO.NET 2.0 Performance Guidelines - Design Considerations


- J.D. Meier, Srinath Vasireddy, Ashish Babbar, and Alex Mackman



Design Your Data Access Layer Based on How the Data Is Used

If you choose to access tables directly from your application without an intermediate data access layer, you may improve the performance of your application at the expense of maintainability. The data access logic layer provides a level of abstraction from the underlying data store. A well-designed data access layer exposes data and functionality based on how the data is used and abstracts the underlying data store complexity.

Do not arbitrarily map objects to tables and columns, and avoid deep object hierarchies. For example, if you want to display a subset of data, and your design retrieves an entire object graph instead of the necessary portions, there is unnecessary object creation overhead. Evaluate the data you need and how you want to use the data against your underlying data store.


Cache Data to Avoid Unnecessary Work

Caching data can substantially reduce the load on your database server. By caching data, you avoid the overhead of connecting to the database, searching for, processing, and transmitting data from the database server, and you directly improve the performance and scalability of your application.

When you define your caching strategy, consider the following:

  • Is the data used application-wide and shared by all users, or is the data specific to each user? Data that is used across the application, such as a list of products, is a better caching candidate than data that is specific to each user.
  • How frequently do you need to update the cache? Even though the source data may change frequently, your application may not need to update the cache as often. If your data changes too frequently, it may not be a good caching candidate. You need to evaluate the expense of updating the cache compared to the cost of fetching the data as needed.
  • Where should you cache data? You can cache data throughout the application layers. By caching data as close as possible to the consumer of the data, you can reduce the impact of network latency.
  • What form of the data should you cache? The best form of data to cache is usually determined by the form that your clients require the data to be in. Try to reduce the number of times that you need to transform data.
  • How do you expire items in the cache? Consider the mechanism that you will use to expire old items in the cache and the best expiration time for your application.
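
For example, the following is a minimal sketch of caching an application-wide product list with an absolute expiration. It assumes an ASP.NET application where the System.Web.Caching cache is available through HttpContext.Current; the "Products" key, the five-minute expiration, and the GetProductsFromDatabase helper are illustrative placeholders.

// Minimal sketch: cache an application-wide product list in the ASP.NET cache.
// HttpContext.Current.Cache is assumed to be available (for example, in a page
// or HTTP handler); GetProductsFromDatabase is a hypothetical data access call.
DataTable products = HttpContext.Current.Cache["Products"] as DataTable;

if (products == null)
{
    // Cache miss: fetch the data once and cache it with a five-minute absolute expiration.
    products = GetProductsFromDatabase();
    HttpContext.Current.Cache.Insert("Products", products, null,
        DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
}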


Connect by Using Service Accounts

There are several ways to authenticate and to open a connection to the database. You can use SQL authentication that has an identity specified in the connection string. Or, you can use Windows authentication by using the process identity, by using a specific service identity, or by impersonating the original caller's identity.

From a security perspective, you should use Windows authentication. From a performance perspective, you should use a fixed service account and avoid impersonation. The fixed service account is typically the process account of the application. By using a fixed service account and a consistent connection string, you help ensure that database connections are pooled efficiently. You also help ensure that the database connections are shared by multiple clients. Using a fixed service account and a consistent connection string is a major factor in helping application scalability.
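
For example, a connection string that uses Windows authentication with the application's process identity might look like the following minimal sketch. It assumes the System.Data.SqlClient namespace; the server and database names are placeholders.

// Minimal sketch: a consistent connection string that uses Windows authentication
// with the application's process identity (a fixed service account).
// "MyServer" and "Northwind" are placeholder names.
string connectionString =
    "Data Source=MyServer;Initial Catalog=Northwind;Integrated Security=SSPI;";

using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    // Every request runs under the same identity with the same connection string,
    // so connections are served from a single, efficiently shared pool.
}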


References

  • For more information, see "The Trusted Subsystem Model" in the "Authentication and Authorization" chapter of "Building Secure ASP.NET Applications: Authentication, Authorization and Secure Communication" on MSDN at http://msdn.microsoft.com/library/en-us/dnnetsec/html/SecNetch03.asp. This chapter explains how to use service accounts or process identity to connect to a database. You can also use the chapter to learn about the advantages and disadvantages of Windows and SQL authentication.


Acquire Late, Release Early

Your application should share expensive resources efficiently by acquiring the resources late, and then releasing them as early as possible. To do so:

  • Open database connections right when you need them. Close the database connections as soon as you are finished. Do not open them early, and do not hold them open across calls.
  • Acquire locks late, and release them early.
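
The following minimal sketch shows this pattern for a database connection; the connection string, query, and ProcessRow method are placeholders. The using statements also satisfy the guideline in the next section, because Dispose (which closes the connection) is called even if an exception occurs.

// Acquire the connection immediately before it is needed and release it as soon
// as the work is done. The connection string, query text, and ProcessRow method
// are hypothetical placeholders.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("SELECT ProductID, ProductName FROM Products", connection))
{
    connection.Open();                     // acquire late
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            ProcessRow(reader);            // hypothetical row-processing method
        }
    }
}                                          // release early: the connection returns to the pool here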


Close Disposable Resources

Usually, disposable resources are represented by objects that provide a Dispose method or a Close method. Make sure that you call one of these methods as soon as you are finished with the resource.


Reduce Round Trips

Network round trips are expensive and affect performance. Minimize round trips by using the following techniques:

  • If possible, batch SQL statements together. Failure to batch work creates additional and often unnecessary trips to the database. You can batch text SQL statements by separating them with a semicolon or by using a stored procedure. If you need to read multiple result sets, use the NextResult method of the DataReader object to access subsequent result sets (see the sketch after this list).
  • Use connection pooling to help avoid extra round trips. By reusing connections from a connection pool, you avoid the round trips that are associated with connection establishment and authentication. For more information, see "Connections" later in this chapter.
  • Do not return results if you do not need them. If you only need to retrieve a single value, use the ExecuteScalar method to avoid the operations that are required to create a result set. You can also use the ExecuteNonQuery method when you perform data definition language (DDL) operations such as the create table operation. This also avoids the expense of creating a result set.
  • Use caching to bring nearly static data close to the consumer instead of performing round trips for each request.
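
For example, the following minimal sketch applies the batching technique from the first bullet: two SELECT statements are sent in a single round trip and both result sets are read with NextResult. The table and column names and the connectionString variable are placeholders.

// Batch two SELECT statements into one round trip and read both result sets.
// The table and column names and connectionString are placeholders.
string batch = "SELECT CustomerID, CompanyName FROM Customers;" +
               "SELECT OrderID, OrderDate FROM Orders";

using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(batch, connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Process the customer rows from the first result set.
        }
        reader.NextResult();
        while (reader.Read())
        {
            // Process the order rows from the second result set.
        }
    }
}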


Implicit Round Trips

Be aware that certain operations can cause implicit round trips. Typically, any operation that extracts metadata from the database causes an implicit round trip. For example, avoid calling DeriveParameters if you know the parameter information in advance. It is more efficient to fill the parameters collection by setting the information explicitly. The following code sample illustrates a call that causes an implicit round trip.

// This causes an implicit round trip to the database
SqlCommandBuilder.DeriveParameters(cmd);
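
By contrast, the following minimal sketch sets the parameter information explicitly and avoids the extra round trip; the stored procedure name, the parameter, and the categoryId and connection variables are placeholders.

// Setting the parameter information explicitly avoids the implicit round trip.
// The stored procedure name, parameter, and categoryId value are placeholders.
SqlCommand cmd = new SqlCommand("GetProductsByCategory", connection);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@CategoryID", SqlDbType.Int).Value = categoryId;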



Return Only the Data You Need

Evaluate the data that your application actually requires. Minimize the data that is sent over the network to minimize bandwidth consumption. The following approaches help reduce data over the network:

  • Return only the columns and rows that you need.
  • Cache data where possible.
  • Provide data paging for large results.
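
For example, server-side paging ensures that only one page of rows crosses the network. The following minimal sketch uses ROW_NUMBER, which assumes SQL Server 2005 or later; the table, column, and parameter names, the page boundaries, and the connection variable are placeholders.

// Return only one page of rows by paging on the server.
// ROW_NUMBER assumes SQL Server 2005 or later; names and values are placeholders.
string pagedQuery =
    "SELECT ProductID, ProductName FROM " +
    "  (SELECT ProductID, ProductName, " +
    "          ROW_NUMBER() OVER (ORDER BY ProductID) AS RowNum " +
    "   FROM Products) AS Paged " +
    "WHERE RowNum BETWEEN @Start AND @End";

using (SqlCommand command = new SqlCommand(pagedQuery, connection))
{
    command.Parameters.Add("@Start", SqlDbType.Int).Value = 1;
    command.Parameters.Add("@End", SqlDbType.Int).Value = 25;
    // Execute the command and bind only this page of rows.
}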


Use Windows Authentication

From a security perspective, you should use Windows authentication to connect to Microsoft SQL Server. There are several advantages to using Windows authentication. For example, credentials are not passed over the network, database connection strings do not contain credentials, and you can apply standard Windows security policies to accounts. For example, you can enforce use of strong passwords and apply password expiration periods.

From a performance perspective, SQL authentication is slightly faster than Windows authentication, but connection pooling helps minimize the difference. You also need to help protect the credentials in the connection string and in transit between your application and the database. Helping to protect the credentials adds to the overhead and minimizes the performance difference.

Note: Generally, local accounts are faster than domain accounts when you use Windows authentication. However, the performance saving needs to be balanced against the administration benefits of using domain accounts.


Choose the Appropriate Transaction Type

Proper transaction management minimizes locking and contention, and provides data integrity. The three transaction types and their usage considerations include the following:

  • Native database support. Native database support for transactions permits you to control the transaction from a stored procedure. In SQL Server, use BEGIN TRAN, COMMIT TRAN, and ROLLBACK to control the transaction's outcome. This type of transaction is limited to a single call from your code to the database, although the SQL query or stored procedure can make use of multiple stored procedures.
  • ADO.NET transactions. Manual transactions in ADO.NET enable you to span a transaction across multiple calls to a single data store. Both the SQL Server .NET Data Provider and the OLE DB .NET Data Provider implement the IDbTransaction interface and expose BeginTransaction on their respective connection objects. This permits you to begin a transaction, run multiple SQL calls on that connection instance, and control the transaction outcome from your data access code (see the sketch at the end of this section).
  • Enterprise Services distributed transactions. Use declarative, distributed transactions when you need transactions to span multiple data stores or resource managers or where you need to flow transaction context between components. Also consider Enterprise Services transaction support for compensating transactions that permit you to enlist nontransactional resources in a transaction. For example, you can use the Compensating Resource Manager to combine a file system update and a database update into a single atomic transaction.

Enterprise Services distributed transactions use the services of the Microsoft Distributed Transaction Coordinator (DTC). The DTC introduces additional performance overhead. The DTC requires several round trips to the server and performs complex interactions to complete a transaction.
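
To illustrate the second option, the following minimal sketch shows an ADO.NET manual transaction that spans two commands on a single connection and controls the outcome from the data access code; the SQL text, table names, and connectionString variable are placeholders.

// Minimal sketch of an ADO.NET manual transaction that spans two commands on
// one connection. The SQL text, table names, and connectionString are placeholders.
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    SqlTransaction transaction = connection.BeginTransaction();
    try
    {
        SqlCommand debit = new SqlCommand(
            "UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1",
            connection, transaction);
        SqlCommand credit = new SqlCommand(
            "UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2",
            connection, transaction);

        debit.ExecuteNonQuery();
        credit.ExecuteNonQuery();
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();   // undo both updates if either command fails
        throw;
    }
}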


Use Stored Procedures

Avoid embedded SQL statements. Generally, well-designed stored procedures outperform embedded SQL statements. However, performance is not the only consideration. When you choose whether to store your SQL commands on the server by using stored procedures or to embed commands in your application by using embedded SQL statements, consider the following issues:

  • Logic separation. When you design your data access strategy, separate business logic from data manipulation logic for performance, maintainability, and flexibility benefits. Validate business rules before you send the data to the database to help reduce network round trips. Separate your business logic from data manipulation logic to isolate the impact of database changes or business rule changes. Use stored procedures to clarify the separation by moving the data manipulation logic away from the business logic so that the two do not become intertwined.
  • SQL optimizations. Some databases provide optimizations for stored procedures that do not apply to dynamic SQL. For example, Microsoft SQL Server™ versions prior to SQL Server 2000 kept a cached execution plan for stored procedures, which reduced the need to compile each stored procedure request. SQL Server 2000 is optimized to cache query plans for both stored procedures and dynamic SQL.
  • Tuning/deployment. Stored procedure code is stored in the database and permits database administrators to review data access code. Database administrators can tune both the stored procedures and the database, independent of the deployed application. The application does not always need to be redeployed when stored procedures change.

Embedded SQL is deployed as part of the application code and requires database administrators to profile the application to identify the SQL actually used. Profiling the application complicates tuning, because the application must be redeployed if any changes are made.

  • Network traffic sent to the server. Source code for stored procedures is stored on the server. Only the name and parameters are sent across the network to the server. Conversely, when you use embedded SQL, the full source of the commands must be transmitted each time the commands are run. When you use stored procedures, you can reduce the amount of data that is sent to the server when large SQL operations are frequently run.
  • Simplified batching of commands. Stored procedures make it easy to batch work and provide simpler maintenance.
  • Data security and integrity. With stored procedures, administrators can secure tables against direct access or manipulation, and they can only permit the execution of selected stored procedures. Both users and applications are granted access to the stored procedures that enforce data integrity rules. Embedded SQL usually requires advanced permissions on tables. Using advanced permissions on tables is a more complex security model to maintain.
  • SQL Injection. Avoid using dynamically generated SQL with user input. SQL injection occurs when input from a malicious user is used to perform unauthorized actions, such as retrieving too much data or destructively modifying data. Parameterized stored procedures and parameterized SQL statements can help reduce the likelihood of SQL injection. Parameter collections force parameters to be treated as literal values so that the parameters are not treated as executable code. You should also constrain all user input to reduce the likelihood that a malicious user could use SQL injection to perform unauthorized actions.
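
The following minimal sketch calls a parameterized stored procedure: only the procedure name and parameters cross the network, and the parameter value is treated as data rather than as executable SQL. The procedure and parameter names, the customerId variable, and connectionString are placeholders.

// Call a parameterized stored procedure. Only the procedure name and parameters
// are sent to the server, and the parameter value cannot be executed as SQL.
// The procedure name, parameter, customerId, and connectionString are placeholders.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("GetOrdersByCustomer", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@CustomerID", SqlDbType.NChar, 5).Value = customerId;
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Process each order row.
        }
    }
}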


Prioritize Performance, Maintainability, and Productivity when You Choose How to Pass Data Across Layers

You should consider several factors when you choose an approach for passing data across layers:

  • Maintainability. Consider how hard it is to build and keep up with changes.
  • Productivity. Consider how hard it is to implement the solution.
  • Programmability. Consider how hard it is to code.
  • Performance. Consider how efficient it is for collections, browsing, and serializing.

This section summarizes the main approaches for passing data across application layers and the relative tradeoffs that exist:

  • DataSets. With this approach, you use a generic DataSet object. This approach offers great flexibility because of the extensive functionality of the DataSet. This includes serialization, XML support, ability to handle complex relationships, support for optimistic concurrency, and others. However, DataSet objects are expensive to create because of their internal object hierarchy, and clients must access them through collections.

The DataSet contains collections of many subobjects, such as the DataTable, DataRow, DataColumn, DataRelation and Constraint objects. Most of these objects are passed with the DataSet between the layers. This is a lot of objects and a lot of data to be passed between the layers. It also takes time to fill a DataSet, because there are many objects that need to be instantiated and populated. All of this affects performance. Generally, the DataSet is most useful for caching when you want to create an in-memory representation of your database, when you want to work with relations between tables, and when you want to perform sorting and filtering operations.

  • Typed DataSets. Instantiation and marshaling performance of the typed DataSet is roughly equivalent to the DataSet. The main performance advantage of the typed DataSet is that clients can access methods and properties directly, without having to use collections.
  • DataReaders. This approach offers the optimum performance when you need to render data as quickly as possible. You should close DataReader objects as soon as possible and make sure that client applications cannot affect the amount of time the DataReader and, hence, the database connection is held open.

The DataReader is very fast compared to a DataSet, but you should avoid passing DataReader objects between layers, because they require an open connection.

  • XML. This is a loosely coupled approach that natively supports serialization and collections of data. For example, an XML document can contain data for multiple business entities. It also supports a wide range of client types. Performance issues to consider include the fact that XML strings can require substantial parsing effort, and large and verbose strings can consume large amounts of memory.
  • Custom Classes. With this approach, you use private data members to maintain the object's state and provide public accessor methods. For simple types, you can use structures instead of classes, which means you avoid having to implement your own serialization. The main performance benefit of custom classes is that they enable you to create your own optimized serialization. You should avoid complex object hierarchies and optimize your class design to minimize memory consumption and reduce the amount of data that needs to be serialized when the object is passed between layers.
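
For example, a minimal sketch of a custom class used to pass data between layers might look like the following; the type name and members are illustrative, and the Serializable attribute is only needed if instances cross an application domain or machine boundary.

// Minimal sketch of a custom class for passing data between layers.
// The type name and members are illustrative placeholders.
[Serializable]
public class ProductSummary
{
    private int productId;
    private string productName;

    public int ProductId
    {
        get { return productId; }
        set { productId = value; }
    }

    public string ProductName
    {
        get { return productName; }
        set { productName = value; }
    }
}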


Consider How to Handle Exceptions

In data access code in particular, you can use try/finally blocks to ensure that connections and other resources are closed, regardless of whether exceptions are generated. However, be aware of the following considerations:

  • Exceptions are expensive. Do not catch exceptions and then throw them again if your data access logic cannot add any value. A less costly approach is to permit the exception to propagate from the database to the caller. Similarly, do not wrap transaction attempts with try/catch blocks unless you plan to implement retry mechanisms.
  • If you want to completely abstract your caller from the data-specific details, catch the exception, log detailed information to a log store, and then return an enumerated value from a list of application-specific error codes. The log store could be a file or the Windows event log.
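
The following minimal sketch shows the basic try/catch/finally pattern so that the connection is closed whether or not an exception is raised; the stored procedure name, connectionString, and LogError helper are placeholders.

// Use try/finally to guarantee that the connection is closed even if an
// exception is thrown. The stored procedure name, connectionString, and
// LogError helper are hypothetical placeholders.
SqlConnection connection = new SqlConnection(connectionString);
try
{
    connection.Open();
    SqlCommand command = new SqlCommand("UpdateInventory", connection);
    command.CommandType = CommandType.StoredProcedure;
    command.ExecuteNonQuery();
}
catch (SqlException ex)
{
    LogError(ex);   // hypothetical logging helper; alternatively, let the exception propagate
    throw;
}
finally
{
    connection.Close();   // runs whether or not an exception occurred
}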



Use Appropriate Normalization

Overnormalization of a database schema can affect performance and scalability. For example, if you program against a fully normalized database, you are often forced to use cross-table joins, subqueries, and data views as data sources. Obtaining the right degree of normalization involves tradeoffs.

On one hand, you want a normalized database to minimize data duplication, to ensure that data is logically organized, and to help maintain data integrity. On the other hand, it may be harder to program against fully normalized databases, and performance can suffer. Consider the following techniques:

  • Start with a normalized model. Design a normalized model first, and then denormalize later where needed.
  • Reduce the cost of joins by repeating certain columns. Deep joins may result in the creation of temporary tables and table scans. To reduce the cost of joining across multiple tables, consider repeating certain columns.
  • Store precomputed results. Consider storing precomputed results, such as subtotals, instead of computing them dynamically for each request.