Saturday, February 7, 2009

SQL Server Performance Impact of Modifying Columns through Alter Table Part II

Introduction
In part one I talked about the effects a DDL or ALTER TABLE statement has on your live system. However, not all ALTER TABLE statements are created equal. When a DDL statement is run, there are two primary ways SQL Server handles the change. For lack of better terminology, I refer to these two as “Immediate DDL Update” and “Delayed DDL Update”. Understanding each type is critical for a DBA or developer in deciding how and when to run an ALTER TABLE statement. The primary purpose of this article is to serve as a cheat sheet, to help you decide on timing, coding, and design.


Immediate DDL Update

This is the most straightforward type of modification. An ALTER TABLE statement is performed and the underlying data structure is changed immediately to accommodate the change. This operation is significantly disruptive to the concurrency of an OLTP system, because it physically changes and reorganizes data on every row in your table. As you'll remember from part I, a schema lock is issued during these DDL statements, so the entire table and all its dependencies are inaccessible for the duration of the update. In line with that logic, the longer the update, the lower your concurrency.

If you are running this type of command, it's best to ensure there is free space in your data pages to accommodate the additional data or reorganization. This might mean reindexing with an appropriate fill factor during off hours, then running the statement. Doing so will alleviate much of the physical page IO, which might otherwise entail page splits and allocations, resulting in IO latching and fragmentation.
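For example, a rebuild with a lower fill factor can be run off hours to leave free space in each page before the modification. This is a sketch only; the index and table names are hypothetical:

```sql
-- Rebuild the clustered index leaving 20% free space per page, so a later
-- size-of-data ALTER TABLE has room to grow rows without page splits.
ALTER INDEX PK_Orders ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);
```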

Delayed DDL Update
This type of update works with metadata only, and avoids changing the underlying data until needed. Because it doesn't touch the data, the DDL command executes instantaneously, which improves concurrency within your system. I use the term delayed because eventually the underlying data will change through data reorganization, such as a reindex. As I go through the different examples below, you'll see that there are two flavors here: one that simply makes the metadata change, and one that first needs to validate the underlying data.

Modification Types

Adding Null Columns Without Defaults

This is a “Delayed DDL Update”. Here a new column is added, but since it allows NULL values and the data going in is NULL, only metadata changes are made. If this change is made on a highly active OLTP system, users will not notice any effect. The real work comes when an UPDATE or INSERT statement is executed against this column. When an update occurs, data is added to the row and the data structure is modified to make room for it. It's here that you'll see the IO issues mentioned above. As long as there are no immediate bulk updates that follow, this can be performed anytime. If you use a default value, refer to the next example for guidance.
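A minimal sketch, assuming a hypothetical dbo.Orders table:

```sql
-- Metadata-only change: completes instantly regardless of table size,
-- since no row data is written until the column is actually populated.
ALTER TABLE dbo.Orders
ADD Notes varchar(255) NULL;
```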


Adding Not Null Columns
Adding a column that's NOT NULL requires you to supply a default value as part of the ALTER TABLE statement. If an int column is added, then four bytes of data are added to each row. Since the underlying data is being changed, this is an “Immediate DDL Update”, which translates into schema locking and IO processing. Perform this statement during off hours if the table is particularly large.
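A sketch of the statement, again against a hypothetical table; the constraint name is an assumption:

```sql
-- Size-of-data operation: every row is touched to write the default value,
-- all under a schema modification lock.
ALTER TABLE dbo.Orders
ADD StatusCode int NOT NULL
CONSTRAINT DF_Orders_StatusCode DEFAULT (0);
```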

Increasing Size on a Varchar Data Type
This is a “Delayed DDL Update”, making changes only in metadata. Variable-length data is, by nature, metadata driven: it is tracked through offsets in the variable-length section at the end of the data row. When moving from a varchar(10) to a varchar(12), no data is changed. This is a perfectly safe statement to run during high OLTP times.
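A sketch with hypothetical names. One caution worth noting: if you omit the nullability clause on ALTER COLUMN, SQL Server applies the session's NULL default (typically making the column nullable), so restate it explicitly:

```sql
-- Metadata-only: widening a varchar touches no row data.
-- Restate NOT NULL, or the column's nullability may silently change.
ALTER TABLE dbo.Orders
ALTER COLUMN CustomerName varchar(12) NOT NULL;
```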

Decreasing Size on Varchar Data Types
This is also a “Delayed DDL Update”, but unlike the previous example there is overhead. SQL Server must first check that existing values will fit under the new, smaller cap. For example, when going from varchar(12) to varchar(10), SQL Server must make sure there are no strings longer than 10 characters. This check is performed using a table scan. Whether this can be done during high transactional times depends on two factors. The first is the size of the table: how long does a table scan of the modified table take? The second is indexes. Unlike increasing the size, where indexes are not affected, this modification cannot be performed while the altered column has an index on it. The index must first be dropped, the statement performed, and the index recreated.
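The drop/alter/recreate sequence might look like this (index, table, and column names are hypothetical):

```sql
-- Any index on the column must be dropped before shrinking it.
DROP INDEX IX_Orders_CustomerName ON dbo.Orders;

-- SQL Server scans the table to verify no value exceeds 10 characters.
ALTER TABLE dbo.Orders
ALTER COLUMN CustomerName varchar(10) NOT NULL;

-- Recreate the index afterward.
CREATE INDEX IX_Orders_CustomerName
ON dbo.Orders (CustomerName);
```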

Changing Size on Char Data

Whether you're increasing or decreasing the size of a char datatype, it's an “Immediate DDL Update”. Data is copied within the data row to a new location at the end of the fixed-length portion of the row. The old data is only soft deleted, meaning it still exists within the row but is not accessible to the user. If you go from char(10) to char(20) you will increase the data row by 20 bytes, while going from char(20) to char(10) increases the row by 10 bytes. As with the example above, all indexes referencing this column must be dropped prior to the action. If you plan on doing this on a hot database, I'd recommend doing it during off hours, as there is heavy IO here.
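A sketch of the statement itself, with hypothetical names; any index on the column would need to be dropped first, as above:

```sql
-- Immediate update: each row is rewritten, and the old fixed-length
-- value is soft deleted in place. Schedule off hours on large tables.
ALTER TABLE dbo.Orders
ALTER COLUMN RegionCode char(20) NOT NULL;
```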

Changing a Not Null Column to Null
This is a “Delayed DDL Update” that is a pure metadata change. Within a data row, null values are tracked in a NULL bitmap that resides in every row whether there is a null value or not (this is new with SQL Server 2005). Changing a column to allow NULLs simply tells SQL Server to make the appropriate changes to this bitmap if a null value is inserted or updated in the modified field. In addition, you need not worry about referencing indexes; the statement can run without dropping them. It's perfectly acceptable to run this during hot times.
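A sketch with hypothetical names:

```sql
-- Pure metadata change: only the NULL bitmap handling is affected,
-- and referencing indexes can stay in place.
ALTER TABLE dbo.Orders
ALTER COLUMN ShipDate datetime NULL;
```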

Changing Null to Not Null
Like the example above, this too is a “Delayed DDL Update”. However, unlike its sibling it's not instantaneous, because a check must first be issued to ensure none of the current values are null. SQL Server performs this check using a table scan. It also differs from “Not Null to Null” in that any indexes referencing the column must be dropped. Running this statement during peak hours shouldn't cause much disruption to your users, provided there are no referencing indexes.
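One way to avoid surprises is to check for NULLs yourself before issuing the statement (names are hypothetical):

```sql
-- Any existing NULLs will make the ALTER fail, so inspect first.
SELECT COUNT(*) AS NullRows
FROM dbo.Orders
WHERE ShipDate IS NULL;

-- SQL Server still performs its own table-scan validation here.
ALTER TABLE dbo.Orders
ALTER COLUMN ShipDate datetime NOT NULL;
```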

Increasing Integer Base Values
Increasing size requires an “Immediate DDL Update”. As with changing the size of a char value, a soft delete occurs and a new value is inserted at the end of the fixed portion of the data row. Going from an int to a bigint, for example, adds an additional 8 bytes of data to each row. When determining whether to execute this on a hot system, take into account the size of the row and how full the page is. If your row is 80 bytes long and you make the change above, it increases the size by 10%. If you have a fill factor of 80% and a reindex has recently occurred, then you're probably all right to run it on a hot system. If your row size is 12 bytes, adding an additional 8 bytes increases the data size by 40% and will result in page splitting and allocation.
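A sketch with hypothetical names:

```sql
-- Immediate update: the 4-byte int is soft deleted and an 8-byte bigint
-- is written at the end of the fixed portion of each row.
ALTER TABLE dbo.Orders
ALTER COLUMN Quantity bigint NOT NULL;
```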

Decreasing Integer Values

Of all the DDL changes this is the most interesting. It is a “Delayed DDL Update” that occurs in metadata. When the statement is executed, SQL Server first checks, through a table scan, that the existing data will fit within the new datatype. After the validation the old data is used as-is; that is, there is no moving of data as seen in other statements. During the next reindex the extra space will be recovered. As with many of these changes, any indexes referencing this column must be dropped prior to running the command. If you don't have any referencing indexes, then it's a safe command to run.
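A sketch with hypothetical names, shrinking an int column to smallint:

```sql
-- Validation table scan, then a metadata-only change; existing data stays
-- where it is, and the extra bytes are reclaimed at the next reindex.
ALTER TABLE dbo.Orders
ALTER COLUMN LineNumber smallint NOT NULL;  -- previously int
```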



Converting Datatypes with a different Base Type
This is an “Immediate DDL Update” and follows the IO-intensive process of many of the above examples, where a soft delete occurs on the original field and a new one is created. If, for example, you change an int column to decimal(10,5), an additional 9 bytes (the storage size of a decimal with precision 10) is allocated to each row. This type of command should be executed during off hours.
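A sketch with hypothetical names:

```sql
-- Immediate update: the 4-byte int is soft deleted and a 9-byte
-- decimal(10,5) value is written into each row.
ALTER TABLE dbo.Orders
ALTER COLUMN UnitPrice decimal(10,5) NOT NULL;
```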

Conclusion
This is a simple list of possible changes you might come across, and it should be used as a cheat sheet. There are several more scenarios, which could fill many more pages. The key with any change is to test prior to execution.
