SQL Server 2000 Performance Guidelines - Queries

From Guidance Share

Jump to: navigation, search

- J.D. Meier, Srinath Vasireddy, Ashish Babbar, Sharon Bjeletich and Alex Mackman


Contents

Know the Performance and Scalability Characteristics of Queries

The best way to achieve performance and scalability is to know the characteristics of your queries. Although it is not realistic to monitor every query, you should measure and understand your most commonly used queries. Do not wait until you have a problem to perform this exercise. Measure the performance of your application throughout the life cycle of your application.

Good performance and scalability also requires the cooperation of both developers and database administrators. The process depends on both query development and index development. These areas of development typically are found in two different job roles. Each organization has to find a process that allows developers and database administrators to cooperate and to exchange information with each other. Some organizations require developers to write appropriate indexes for each query and to submit an execution plan to the database architect. The architect is responsible for evaluating the system as a whole, for removing redundancies, for finding efficiencies of scale, and for acting as the liaison between the developer and the database administrator The database administrator can then get information on what indexes might be needed and how queries might be used. The database administrator can then implement optimal indexes.

In addition, the database administrator should regularly monitor the SQL query that consumes the most resources and submit that information to the architect and developers. This allows the development team to stay ahead of performance issues.


Write Correctly Formed Queries

Ensure that your queries are correctly formed. Ensure that your joins are correct, that all parts of the keys are included in the ON clause, and that there is a predicate for all queries. Pay extra attention to ensure that no cross products result from missing ON or WHERE clauses for joined tables. Cross products are also known as Cartesian products.

Do not automatically add a DISTINCT clause to SELECT statements. There is no need to include a DISTINCT clause by default. If you find that you need it because duplicate data is returned, the duplicate data may be the result of an incorrect data model or an incorrect join. For example, a join of a table with a composite primary key against a table with a foreign key that is referencing only part of the primary key results in duplicate values. You should investigate queries that return redundant data for these problems.


Return Only the Rows and Columns Needed

One of the most common performance and scalability problems are queries that return too many columns or too many rows. One query in particular that returns too many columns is the often-abused SELECT * FROM construct. Columns in the SELECT clause are also considered by the optimizer when it identifies indexes for execution plans. Using a SELECT * query not only returns unnecessary data, but it also can force clustered index scans for the query plan, regardless of the WHERE clause restrictions. This happens because the cost of going back to the clustered index to return the remaining data from the row after using a non-clustered index to limit the result set is actually more resource-intensive than scanning the clustered index.

The query shown in Figure 14.6 shows the difference in query cost for a SELECT * compared to selecting a column. The first query uses a clustered index scan to resolve the query because it has to retrieve all the data from the clustered index, even though there is an index on the OrderDate column. The second query uses the OrderDate index to perform an index seek operation. Because the query returns only the OrderID column, and because the OrderID column is the clustering key, the query is resolved by using only that index. This is much more efficient; the query cost relative to the batch is 33.61 percent rather than 66.39 percent. These numbers may be different on your computers.


image: SQLSelectComparison.gif


Figure 14.6: Difference in query cost for a SELECT * query compared to selecting a column


Often, too many rows are returned because the application design allows a user to select large result sets from search forms. Returning hundreds or even thousands of results to a user stresses the server, the network, and the client. A large amount of data is generally not what the end user requires. Use a design pattern that supports paging, and return only a page or two of the requested data at a time.

Queries that call other queries that return too many columns and rows to the calling query are another often-overlooked consideration. This includes queries that are written as views or table-valued functions or views. Although views are useful for many reasons, they may return more columns than you need, or they may return all the rows in the underlying table to the calling query.


References

For more information about data paging, see How To Page Records in .NET Applications.


Avoid Expensive Operators Such as NOT LIKE

Some operators in joins or predicates tend to produce resource-intensive operations. The LIKE operator with a value enclosed in wildcards ("%a value%") almost always causes a table scan. This type of table scan is a very expensive operation because of the preceding wildcard. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.

Negative operations, such as <> or NOT LIKE, are also very difficult to resolve efficiently. Try to rewrite them in another way if you can. If you are only checking for existence, use the IF EXISTS or the IF NOT EXISTS construct instead. You can use an index. If you use a scan, you can stop the scan at the first occurrence.


Avoid Explicit or Implicit Functions in WHERE Clauses

The optimizer cannot always select an index by using columns in a WHERE clause that are inside functions. Columns in a WHERE clause are seen as an expression rather than a column. Therefore, the columns are not used in the execution plan optimization. A common problem is date functions around datetime columns. If you have a datetime column in a WHERE clause, and you need to convert it or use a data function, try to push the function to the literal expression.

The following query with a function on the datetime column causes a table scan in the NorthWind database, even though there is an index on the OrderDate column:


  SELECT OrderID FROM NorthWind.dbo.Orders WHERE DATEADD(day, 15, 
  OrderDate) = '07/23/1996'


However, by moving the function to the other side of the WHERE equation, an index can be used on the datetime column. This is shown in the following example:


  SELECT OrderID FROM NorthWind.dbo.Orders WHERE OrderDate = DATEADD(day, 
  -15, '07/23/1996')


The graphical execution plan for both of these queries is shown in Figure 14.7, which shows the difference in plans. Note the Scan icon for the first query and the Seek icon for the second query. Figure 14.7 also shows the comparative difference in query costs between the two queries; the first query has an 85.98 percent cost compared to the 14.02 percent cost for the second query. The costs on your computer may be different.


image: SQLQueryComparison.gif


Figure 14.7: Query comparison


Implicit conversions also cause table and index scans, often because of data type mismatches. Be especially wary of nvarchar and varchar data type mismatches and nchar and char data type mismatches that cause an implicit conversion. You can see these in the following execution plan. The following example uses a local variable of type char against an nchar column in the Customers table. The type mismatch causes an implicit conversion and a scan in this example:


  DECLARE @CustID CHAR(5)
  SET @CustID = 'FOLKO'
  SELECT CompanyName FROM NorthWind.dbo.Customers WHERE CustomerID = @CustID


Figure 14.8 shows the results of the type mismatch.


Image:SQLTypeMismatch.gif


Figure 14.8: Output showing an implicit conversion


Use Locking and Isolation Level Hints to Minimize Locking

Locking has a huge impact on performance and scalability. Locking also affects perceived performance because of the wait for the locked object. All applications experience a certain level of locking. The key is to understand the type of locking that is occurring, the objects that are being locked, and most importantly, the duration of each locking occurrence.

There are three basic types of locks in SQL Server:

  • Shared
  • Update
  • Exclusive

Note There are also intent, schema, and bulk update locks, but these locks are less significant and are not addressed in this chapter.


Shared locks are compatible with other shared locks, but they are not compatible with exclusive locks. Update locks are compatible with shared locks, but they are not compatible with exclusive locks or with other update locks. Exclusive locks are not compatible with shared locks, update locks, or other exclusive locks. Different types of locks are held for different amounts of time to obtain the requested isolation level.

There are four ANSI isolation levels that can be specified for transactions in SQL Server:

  • Read uncommitted
  • Read committed
  • Repeatable read
  • Serializable

Each of these isolation levels allow zero or more of the isolation level phenomena to occur:

  • Dirty reads. Dirty reads are transactions that see the effects of other transactions that were never committed.
  • Nonrepeatable reads. Nonrepeatable reads are transactions that see only committed data from other transactions. In a nonrepeatable read, data changes when it is referenced multiple times in the transaction.
  • Phantoms. Phantoms are transactions that see or that do not see rows that are inserted or deleted from another transaction that is not committed yet.


The default isolation level in SQL Server 2000 is read committed. Figure 14.9 shows the phenomena that are allowed at each isolation level.


image:SQLIsolationLevels.gif


Figure 14.9: ANSI isolation levels


Instead of accepting the default SQL Server isolation level of read committed, you can explicitly select the appropriate isolation level for code. You can do this by using isolation levels or locking hints.

WITH NOLOCK and WITH READUNCOMMITTED

If you designed your application to use locking hints, use the WITH (NOLOCK) or WITH (READUNCOMMITTED) table hint in SELECT statements to avoid generating read locks that may not be required. This can provide a significant increase in scalability, especially where SELECT statements are run at a serializable isolation level because the SELECT statement is called within an explicit transaction that starts in a middle-tier object using Microsoft Transaction Server (MTS), COM+, or Enterprise Services. Another approach is to determine if the transaction as a whole can run at a lower isolation level. You can use the SET TRANSACTION ISOLATION LEVEL command to change the isolation level for all transactions in a SQL Server session.


UPDLOCK

A common technique for handling deadlocks is to use an UPDLOCK table hint on SELECT statements that are commonly involved in transactions that deadlock. The UPDLOCK issues update locks, and it holds the locks until the end of the transaction. The typical shared lock that is issued by a SELECT statement is only held until the row has been read. By holding the update lock until the end of the transaction, other users can still read the data but cannot acquire a lock that you may need later. This is a common deadlock scenario.


TABLOCK =

You can use the TABLOCK table hint to improve performance when you use the bulk insert command. When there are large amounts of inserts, requesting a lock on the entire table helps by relieving the lock manager of the overhead of managing dynamic locking. However, requesting a lock on the entire table blocks all other users on the table. It therefore is not something you should do when other users need to use the system.

To use locking and isolation level locking hints effectively, you have to understand locking behavior in SQL Server and the specific needs of your application. You can then select the best mechanism for key queries. In general, the default isolation level and locking of SQL Server is best, but you can increase scalability by using other locking hints when you need to.


Use Stored Procedures or Parameterized Queries

Significant work has been done in SQL Server 2000 to optimize dynamic code, especially with the addition of the sp_executesql system stored procedure and the ability to reuse execution plans for parameterized queries. However, stored procedures still generally provide improved performance and scalability.

Consider the following issues when you decide whether to store your SQL commands on the server by using stored procedures or to embed commands in your application by using embedded SQL statements:

  • Logic separation. When you design your data access strategy, separate business logic from data manipulation logic for performance, maintainability and flexibility benefits. Validate business rules before you send the data to the database to help reduce network trips. Separate your business logic from data manipulation logic to isolate the impact of database changes or business rule changes. Use stored procedures to clarify the separation by moving the data manipulation logic away from the business logic so the two do not become intertwined. Establish standards that identify the proper coding standards to avoid intermingling of logic.
  • Tuning and deployment. Stored procedure code is stored in the database and allows database administrators to review data access code and to tune both the stored procedures and the database independent of the deployed application. You do not always need to redeploy your application when stored procedures change. Embedded SQL is deployed as part of the application code and requires database administrators to profile the application to identify the SQL that is actually used. This complicates tuning, and you must redeploy the application if any changes are made.
  • Network bandwidth. Source code for stored procedures is stored on the server, and you only send the name and parameters across the network to the server. However, when you use embedded SQL, the full source of the commands must be transmitted each time the commands are run. By using stored procedures, you can reduce the amount of data sent to the server, particularly when large SQL operations are frequently run.
  • Simplified batching of commands. Stored procedures offer simplified and more maintainable batching of work.
  • Improved data security and integrity. Stored procedures are strongly recommended to ensure data security, to promote data integrity, and to support performance and scalability. Administrators can secure the tables against direct access or manipulation. Users and applications are granted access to the stored procedures that enforce data integrity rules. Using embedded SQL typically requires advanced permissions on tables and may allow unauthorized modification of data.
  • SQL injection. Avoid using dynamically generated SQL with user input. SQL injection may occur when malicious user input is used to perform unauthorized actions such as retrieving too much data or destructively modifying data. Parameterized stored procedures and parameterized SQL statements can both help reduce the likelihood of SQL injection. By using the parameters collections, you force parameters to be treated as literal values rather than executable code. You should also constrain all user input to reduce the likelihood of a SQL injection attack.


Minimize Cursor Use

Cursors force the database engine to repeatedly fetch rows, negotiate blocking, manage locks, and transmit results. Use forward-only and read-only cursors unless you need to update tables. More locks may be used than are needed, and there is an impact on the tempdb database. The impact varies according to the type of cursor used.

The forward-only, read-only cursor is the fastest and least resource-intensive way to get data from the server. This type of cursor is also known as a firehose cursor or a local fast-forward cursor. If you feel that you really need to use a cursor, learn more about the different types of cursors, their locking, and their impact on the tempdb database.

Often, cursors are used to perform a function row by row. If there is a primary key on a table, you can usually write a WHILE loop to do the same work without incurring the overhead of a cursor. The following example is very simple but demonstrates this approach:


  declare @currid int
  select @currid = min(OrderID)
  from Orders where OrderDate < '7/10/1996'
  while @currid is not null
  begin
    print @currid
    select @currid = min(OrderID)
    from Orders 
    where OrderDate < '7/10/1996'
    and OrderID > @currid
  end


References

For more information about cursors, see "Transact-SQL Cursors" at http://msdn.microsoft.com/library/en-us/acdata/ac_8_con_07_9bzn.asp.


Avoid Long Actions in Triggers

Trigger code is often overlooked when developers evaluate systems for performance and scalability problems. Because triggers are always part of the INSERT, UPDATE, or DELETE calling transactions, a long-running action in a trigger can cause locks to be held longer than intended, resulting in blocking of other queries. Keep your trigger code as small and as efficient as possible. If you need to perform a long-running or resource-intensive task, consider using message queuing to accomplish the task asynchronously.


Use Temporary Tables and Table Variables Appropriately

If your application frequently creates temporary tables, consider using the table variable or a permanent table. You can use the table data type to store a row set in memory. Table variables are cleaned up automatically at the end of the function, stored procedure, or batch that they are defined in. Many requests to create temporary tables may cause contention in both the tempdb database and in the system tables. Very large temporary tables are also problematic. If you find that you are creating many large temporary tables, you may want to consider a permanent table that can be truncated between uses.

Table variables use the tempdb database in a manner that is similar to how table variables use temporary tables, so avoid large table variables. Also, table variables are not considered by the optimizer when the optimizer generates execution plans and parallel queries. Therefore, table variables may cause decreased performance. Finally, table variables cannot be indexed as flexibly as temporary tables.

You have to test temporary table and table variable usage for performance. Test with many users for scalability to determine the approach that is best for each situation. Also, be aware that there may be concurrency issues when there are many temporary tables and variables that are requesting resources in the tempdb database.


Limit Query and Index Hints Use

Although the previous section discusses how to use table hints to limit locking, you should use query and index hints only if necessary. Query hints include the MERGE, HASH, LOOP, and FORCE ORDER hints that direct the optimizer to select a specific join algorithm. Index hints are table hints where a certain index is specified for the optimizer to use. Generally the optimizer chooses the most efficient execution plan. Forcing an execution plan by specifying an index or a join algorithm should be a last resort. Also, remember that SQL Server uses a cost-based optimizer; costs change over time as data changes. Hints may no longer work for a query, and the hint may never be reevaluated.

If you find that the optimizer is not choosing an optimal plan, try breaking your query into smaller pieces. Or, try another approach to the query to obtain a better plan before you decide to use hard-coded query hints.


Fully Qualify Database Objects

By fully qualifying all database objects with the owner, you minimize overhead for name resolution, and you avoid potential schema locks and execution plan recompiles. For example, the SELECT * FROM dbo.Authors statement or the EXEC dbo.CustOrdersHist statement performs better than the SELECT * FROM Authors or the EXEC CustOrderHist statements. In systems that have many stored procedures, the amount of time that is spent to resolve a non-qualified stored procedure name adds up.

Personal tools