Web Application Performance Design Inspection Questions - Data Access

From Guidance Share

Jump to: navigation, search

- J.D. Meier, Srinath Vasireddy, Ashish Babbar, Rico Mariani, and Alex Mackman


Data Access Issues



Poor schema design

Increased database server processing; reduced throughput.

Failure to page large result sets

Increased network bandwidth consumption; delayed response times; increased client and server load.

Exposing inefficient object hierarchies when simpler would do

Increased garbage collection overhead; increased processing effort required.

Inefficient queries or fetching all the data

Inefficient queries or fetching all the data to display a portion is an unnecessary cost, in terms of server resources and performance.

Poor indexes or stale index statistics

Creates unnecessary load on the database server.

Failure to evaluate the processing cost on your database server and your application

Failure to meet performance objectives and exceeding budget allocations

Consider the following:

How do you pass data between layers? Do you use stored procedures? Do you process only the required data? Do you need to page through data? Do your transactions span multiple data stores? Do you manipulate BLOBs? Are you consolidating repeated data access code?

How Do You Pass Data Between Layers?

Review your approach for passing data between the layers of your application. In addition to raw performance, the main considerations are usability, maintainability, and programmability. Consider the following:

Have you considered client requirements?

Focus on the client requirements and avoid transmitting data in one form and forcing the client to convert it to another. If the client requires the data just for display purposes, simple collections, such as arrays or an Arraylist object, are suitable because they support data binding.

Do you transform the data?

If you need to transform data, avoid multiple transformations as the data flows through your application.

Can you logically group data?

For logical groupings, such as the attributes that describe an employee, consider using a custom class or struct type, which are efficient to serialize. Use the NonSerializable attribute on any field you do not need to serialize.

Is cross-platform interoperability a design goal?

If so, you should use XML, although you need to consider performance issues including memory requirements and the significant parsing effort required to process large XML strings.

Do you use DataSet objects?

If your client needs to be able to view the data in multiple ways, update data on the server using optimistic concurrency, and handle complex relationships between various sets of data, a DataSet is well suited to these requirements. DataSets are expensive to create and serialize, and they have large memory footprints. If you do need a disconnected cache and the rich functionality supported by the DataSet object, have you considered a strongly typed DataSet, which offers marginally quicker field access?

Do You Use Stored Procedures?

Using stored procedures is preferable in most scenarios. They generally provide improved performance in comparison to dynamic SQL statements. From a security standpoint, you need to consider the potential for SQL injection and authorization. Both approaches, if poorly written, are susceptible to SQL injection. Database authorization is often easier to manage with stored procedures because you can restrict your application's service accounts to executing specific stored procedures and prevent them from accessing tables directly.

If you use stored procedures, consider the following:

  • Try to avoid recompiles. For more information about how recompiles are caused, see Microsoft Knowledge Base article 243586, "INF: Troubleshooting Stored Procedure Recompilation," at http://support.microsoft.com/default.aspx?scid=kb;en-us;243586.
  • Use the Parameters collection; otherwise you are still susceptible to SQL injection.
  • Avoid building dynamic SQL within the stored procedure.
  • Avoid mixing business logic in your stored procedures.

If you use dynamic SQL, consider the following:

  • Use the Parameters collection to help prevent SQL injection.
  • Batch statements if possible.
  • Consider maintainability (for example, updating resource files versus statements in code).

When using stored procedures, consider the following guidelines to maximize their performance:

  • Analyze your schema to see if it is well suited to perform the updates needed or the searches. Does your schema support your unit of work? Do you have the appropriate indexes? Do your queries take advantage of your schema design?
  • Look at your execution plans and costs. Logical I/O is often an excellent indicator of the overall query cost on a loaded server.
  • Where possible, use output parameters instead of returning a result set that contains single rows. This avoids the performance overhead associated with creating the result set on the server.
  • Evaluate your stored procedure to ensure that there are no frequent recompilations for multiple code paths. Instead of having multiple if else statements for your stored procedure, consider splitting it into multiple small stored procedures and calling them from a single stored procedure.

Do You Process Only the Required Data?

Review your design to ensure you do not retrieve more data (columns or rows) than is required. Identify opportunities for paging records to reduce network traffic and server loading. When you update records, make sure you update only the changes instead of the entire set of data.

Do You Need to Page Through Data?

Paging through data requires transmitting data from database to the presentation layer and displaying it to the user. Paging through a large number of records may be costly if you send more than the required data over the wire, which may add to the network, memory, and processing costs on presentation and database tiers. Consider the following guidelines to develop a solution for paging through records:

  • If the data is not very large and needs to be served to multiple clients, consider sending the data in a single iteration and caching it on the client side. You can page through the data without making round trips to the server. Make sure you use an appropriate data expiration policy.
  • If the data to be served is based on user input and can potentially be large, consider sending only the most relevant rows to the client for each page size. Use the SELECT TOP statement and the TABLE data type in your SQL queries to develop this type of solution.
  • If the data to be served consists of a large result set and is the same for all users, consider using global temporary tables to create and cache the data once, and then send the relevant rows to each client as they need it. This approach is most useful if you need to execute long-running queries spanning multiple tables to build the result set. If you need to fetch data only from a single table, the advantages of a temporary table are minimized.

More Information

For more information, see How To Page Records in .NET Applications.

Do Your Transactions Span Multiple Data Stores?

If you have transactions spanning multiple data stores, you should consider using distributed transactions provided by the Enterprise Services. Enterprise Services uses the DTC to enforce transactions.

The DTC performs the inter-data source communication, and ensures that either all of the data is committed or none of the data is committed. This comes at an operational cost. If you do not have transactions that span multiple data sources, Transact-SQL (T-SQL) or ADO.NET manual transactions offer better performance. However, you need to trade the performance benefits against ease of programming. Declarative Enterprise Services transactions offer a simple component-based programming model.

Do You Manipulate BLOBs?

If you need to read or write BLOB data such as images, you should first consider the options of storing them directly on a hard disk and storing the physical path or the URL in the database. This reduces load on the database. If you do read or write to BLOBs, one of the most inefficient ways is to perform the operation in a single call. This results in the whole of the BLOB being transferred over the wire and stored in memory. This can cause network congestion and memory pressure, particularly when there is a considerable load of concurrent users.

If you do need to store BLOB data in the database, consider the following options to reduce the performance cost:

  • Use chunking to reduce the amount of data transferred over the wire. Chunking involves more round trips, but it places comparatively less load on the server and consumes less network bandwidth. You can use the DataReader.GetBytes to read the data in chunks or use SQL Server-specific commands, such as READTEXT and UPDATEDTEXT, to perform such chunking operations.
  • Avoid moving the BLOB repeatedly because the cost of moving them around can be significant in terms of server and network resources. Consider caching the BLOB on the client side after a read operation.

Are You Consolidating Repeated Data Access Code?

If you have many classes that perform data access, you should think about consolidating repeated functionality into helper classes. Developers with varying levels of expertise and data access knowledge may unexpectedly take inconsistent approaches to data access, and inadvertently introduce performance and scalability issues.

By consolidating critical data access code, you can focus your tuning efforts and have a single consistent approach to database connection management and data access.

Personal tools