Thursday, February 5, 2009

SQL Server Performance Impact of Modifying Columns through Alter Table

Introduction
Recently we had an issue where a value of null was being handled incorrectly by our front end code. After populating the null values, the question was, should we make this column “not null”, to avoid this issue in the future. That would’ve been most appropriate during design, but if we made the change now, it would affect the live database during the DDL execution. Running an “Alter Table” statement to change this field to “Not Null” would’ve taken three minutes. Before moving ahead with this change we needed to fully understand what the effect on our users would be during this modification. The following is the first in a two part article which discusses user effects of DDL changes to data structures. All examples use the stock AdventureWorks database in 2005.

Schema Locks VS Table Locks
When a DDL statement is executed against a table, a schema lock is issued against the entire table. There are two types of schema locks; Modification (Sch-M) and stability (Sch-S). Stability locks are more relevant to referencing objects such as Stored Procedures, Views, Indexes, etc, while modification locks are for data structure changes. Being that we’re talking about “Alter Table” statements the latter is of concern. Most people including myself will often compare a schema modification lock to an exclusive table lock. In most conversations this description is fairly accurate in that data modifications to any part of the table wait until the lock is release, as do select statements in read committed isolation mode. However one key difference is table locks allow for read uncommitted selects, which are denied under a schema modification lock. Although scolded by many DBA’s dirty reads are highly beneficial in high transactional systems. It allows users the ability to retrieve data during peak modification times, without waiting, for DML statements to complete. This is not the case when a schema lock is held, a dirty read needs to wait until the lock is released. If you try to sidestep the concurrency issue using snapshot isolation, try again this doesn’t work either.

If want to look at how this works, open two tabs in SSMS. In the first session (tab) execute the below DDL statement against the Title column in the Employees table of the Adventureworks database.

Begin Tran alter table HumanResources.Employee Alter column Title Char(50)

In your other session ( 2nd tab) attempt to read with a dirty read. Notice no data is returned instead your session is waiting on a schema lock.

select top 10 * from HumanResources.Employee with (nolock)

Had the first session simply been a DML statement using a table lock such as the one below, you would have been able to access the data with a dirty read. To prove this out rollback the first statement and run a table lock dml statement.

rollback tran
Begin Tran Update HumanResources.Employee with (tablock) Set Title = 'Test'

Now rerun your select. Notice you can retrieve data now.

select top 10 * from HumanResources.Employee with (nolock)

Schema Locks and Child Relationships

A significant point with schema modification locks is that they cascade on relationships. Any child table will experience a schema lock if its parent is having a DDL statement run against it. For example the Employee table is the parent to the JobCandidate table on the Employeeid field. If a schema modification is made against the Employee table the JobCanidate is also locked. In fact all the child tables of Employee will not allow access even if the query doesn’t reference the employee table. One would expect that this is limited to changes against the field in which the two entities are related such as the Employeeid field however it’s any change against the parent table.

To demonstrate reuse the previous example. In the first session rollback your transaction if it was open, and issue the first command. Then look at your locks by quering the DMV dm_tran_locks in the same session.

Begin Tran alter table HumanResources.Employee Alter column Title Char(50)
go
select o.name, l.* from sys.dm_tran_locks l with (nolock) join sys.sysobjects o with (nolock) on l.resource_associated_entity_id = o.id where resource_database_id = db_id() and request_mode = 'Sch-M'

Notice all child tables and the relation objects are locked. If you try running any selects against these objects while the DDL transaction is active you won’t be able to access the data. The query below is one that a customer service might use. It has nothing to do with the employee table but the alter statement above would block it. Make sure this is executing in a different session.

select PH.status,ph.ShipDate, ph.TotalDue,pd.Productid, Orderqty from Purchasing.PurchaseOrderHeader ph
inner join Purchasing.PurchaseOrderDetail pd on pd.purchaseorderid = ph.purchaseorderid
where ph.purchaseorderid = 5

Plan Recompilation
Locking is an easy target to attack since it has significant impact on a system. However we need to also consider the cost of recompiling. Any SP or SQL statement in cache that references the object where the DDL statement is performed will be recompiled, with statement completion. Anytime a recompilation occurs there is a wait to rebuild the plan, and the chance you might get bit with “parameter sniffing”. However unlike the schema lock that cascades from parent to child, plans that depend on child tables are not recompiled. So the above customer service query would be free from recompilation. In the large picture recompilations are more of an annoyance then a credible concern to running a DDL statement.

Conclusion
The best solution is to design your table correct the first time, but we should save that verbiage for consultants who don’t have to hear businesses ever changing requirements. The reality is these situations will always come up, and you need to understand the effects of your change so that you can plan out the proper execution and timing. One key to remember is all DDL statements are not created equally some can take quite a bit longer then others. In Part II I’ll discuss different types of DDL statements against tables and the affects they have on your system.

No comments: