Saturday, December 27, 2008

Base Type Changes Within the SQL Server Data Row

I’m currently writing an article on how DDL statements affect users on an active OLTP system. As usual an article is taking me deep into the internals of SQL Server. What I’m trying to determine is how a data row is modified when you increase the size of fixed data under the same base type. For example moving from a tinyint to a int or a Char(10) to a Char(20). After consulting with a trusted book written by an expert of SQL Server the explanation I got was when this type of change occurs the existing data is ignored, and only meta data is changed. Only when new data is added, is space allocated to adhere to the new data type, but only for this particular row. So if you have a tinyint field that’s altered to an int the existing data is not change since the one byte allocation of the tinyint is guaranteed to fit within the four bytes of an int field. It’s only when a new row is added is the four bytes reserved for the int field in the new row.

The results I received are contrary to this explanation at least under SQL Server 2005 Standard edition, in which I ran my tests. This is actually unfortunate, because the method SQL Server is performing on this procedure causes several issues. When a datatype is increase or decreased with the same base type the data is moved to a second location while only soft deleting the original value. This makes for longer schema locking, heavy data page operations, and fragmentation. To make more sense of this below is an example to see what’s really happening.

First in the Adventureworks database run the following code to create some sample data.

--Create Table
create table t1(pkint int identity(1,1) primary key, charfield char(100), holder char(5))

--Populate Table Seed Data
Insert into t1(charfield, holder) values('Hello', 'hold')

--Loop through to populate with 130K
declare @x int
Set @x = 0
while @x < 18
begin
Insert into t1(charfield, holder) Select charfield, holder from t1
set @x = @x + 1
end

Now that the data is in place, let’s look at the physical structure by using some well known undocumented DBCC Commands. The DBCC IND command gives us view of what data pages make up the table and how their linked.

dbcc ind('AdventureWorks', t1, 0)

Make a note on how many pages (Number of Rows returned) in my case 3857 and the pagePid of any page with an IndexLevel of 0. The indexLevel of 0 represents leaf level page where the actual data lives. I’ll be using pagepid 87137

Now look inside the details of the page by using dbcc page


DBCC TRACEON (3604)
dbcc page ( 'AdventureWorks', 1, 87137,3)

Either scroll through the results until you hit “Slot 0 Column 0” or just do a find for the value “Slot 0 Column 0”. Either method will take you to the first data row in the data page. Your page should look similar to that below which is actual data in your table, with each column mapping to the actual column and it’s legnth.

Slot 0 Column 0 Offset 0x4 Length 4

pkint = 1

Slot 0 Column 1 Offset 0x8 Length 100

charfield = Hello

Slot 0 Column 2 Offset 0x6c Length 5

holder = hold


Now lets double the size of the charfield column, by running an Alter table statement

Alter table t1 alter column Charfield char(200)

The first key that existing data is changed is the amount of time it takes to execute the statement above. If it was a pure meta-data statement the change would have been instatenous. Now run the dbcc ind command from above. Notice your how many rows were returned. My results returned almost double the amount of data pages 7712.


dbcc ind('AdventureWorks', t1, 0)

Now the interesting part. Rerun the dbcc page command and look inside the row. Notice it contains 4 column offsets, with column 1 showing a length of 100 and the column name equates to “Dropped”. While the new Column 3 is identical to the original data of column 1 except it has the new length of 101. So the obvious is that the data is copied from the old destination into the new, without a hard delete.

Slot 0 Column 0 Offset 0x4 Length 4

pkint = 1

Slot 0 Column 1 Offset 0x8 Length 100

DROPPED = Hello

Slot 0 Column 2 Offset 0x6c Length 5

holder = hold

Slot 0 Column 3 Offset 0x71 Length 101

charfield = Hello



As the article I’m writing will go more into the detail effects of this, it’s easy to see that this is a heavy operation.

No comments: