In a recent blog I talked about covering indexes as a way to optimize memory use within the buffer pool. I used a fictitious comments table to make my point about how one large field can ruin querying for particular values. After posting I got to thinking about the example I used, and whether in the real world I would actually design it that way. The waffling answer is maybe. The reason I say this is that there is another solution that works with the clustered index, without the need to create a second covering index. Rather than using a char or varchar to hold the comments, you could use a text column. Text fields by default reside in separate data pages from the other fields in the row. Rather than living in the same b-tree as the clustered index, the text data lives on its own pages, and the data row holds only a pointer to it. The effect this has on memory consumption is that if the text field is not referenced in your query, it is never brought into memory. Let's go through some code examples similar to my last blog to see the actual impact.
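You can see this split directly in the catalog views. A quick sketch, assuming the CommentsText table built in the next section: a table with a text column carries a separate LOB_DATA allocation unit alongside its IN_ROW_DATA unit, and the LOB pages are only touched when the column is referenced.

```sql
-- List the allocation units behind a table; a text column shows up as a
-- separate LOB_DATA unit whose pages stay on disk unless the column is queried.
SELECT p.index_id, au.type_desc, au.total_pages
FROM sys.partitions AS p
INNER JOIN sys.allocation_units AS au
    ON (au.container_id = p.hobt_id AND (au.type = 1 OR au.type = 3))
    OR (au.container_id = p.partition_id AND au.type = 2)
WHERE p.object_id = object_id('CommentsText')
```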
The following code sets up two tables: one using a char field for comments and one using a text field.
Create table CommentsText
(PkComment int identity(1,1),
CommentType int,
CommentDate datetime,
CommentText text,
CONSTRAINT Pk_intText PRIMARY KEY CLUSTERED (PkComment))

Create table CommentsChar
(PkComment int identity(1,1),
CommentType int,
CommentDate datetime,
CommentText char(2000),
CONSTRAINT Pk_intChar PRIMARY KEY CLUSTERED (PkComment))
Declare @int int
set @int = 1
While @int <= 50000 -- row count is assumed; the original condition was garbled
Begin
insert into CommentsText(CommentType, CommentDate, CommentText) select datepart(ms, getdate()), Getdate(), replicate('A', 2000)
set @int = @int + 1
end
Insert into CommentsChar (CommentType, CommentDate, CommentText)
Select CommentType, CommentDate, CommentText from CommentsText
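As a sanity check on the setup, sp_spaceused shows how differently the two tables land on disk:

```sql
-- Rough size check: both tables hold the same logical data, but the
-- comment bytes live on LOB pages for CommentsText and on the data
-- pages themselves for CommentsChar.
EXEC sp_spaceused 'CommentsText'
EXEC sp_spaceused 'CommentsChar'
```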
Once set up, I'll free the memory and run the same aggregation query from my last blog, where we aggregate comments by type for each day.
checkpoint
dbcc Dropcleanbuffers
select count(*), CommentType, CommentDate from CommentsText
group by CommentType, CommentDate
select count(*), CommentType, CommentDate from CommentsChar
group by CommentType, CommentDate
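If you'd rather not clear the buffer pool, SET STATISTICS IO gives a per-query view of the same effect; the logical reads against CommentsChar should dwarf those against CommentsText, since every page read from the char table drags the 2000-byte comment along with it.

```sql
-- Compare logical reads instead of cached pages; the Messages tab will
-- show far fewer reads for the text table than for the char table.
SET STATISTICS IO ON

SELECT count(*), CommentType, CommentDate FROM CommentsText
GROUP BY CommentType, CommentDate

SELECT count(*), CommentType, CommentDate FROM CommentsChar
GROUP BY CommentType, CommentDate

SET STATISTICS IO OFF
```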
Once complete, we run a memory allocation script very similar to the one found in BOL under the sys.dm_os_buffer_descriptors section.
--Memory Consumption Script
SELECT count(*) AS cached_pages_count,
    (count(*) * 8) / 1024 AS SizeInMb, name, index_id
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
    SELECT object_name(object_id) AS name,
        index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.hobt_id
        AND (au.type = 1 OR au.type = 3)
    UNION ALL
    SELECT object_name(object_id) AS name,
        index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.partition_id
        AND au.type = 2
) AS obj
ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = db_id()
GROUP BY name, index_id
ORDER BY cached_pages_count DESC
The results show roughly a 10 to 1 difference in memory usage favoring the table using a text field. As explained in the last blog, this is because the char comments field lives with the rest of the row data, so every page read to satisfy the query carries the 2000-byte comment along with it. However, when we add a covering index, this changes for the char table by eliminating the comments field from the data structure the query reads.
Create index commentschartest on commentschar(CommentType, CommentDate)
checkpoint
dbcc Dropcleanbuffers
select count(*), CommentType, CommentDate from CommentsText
group by CommentType, CommentDate
select count(*), CommentType, CommentDate from CommentsChar
group by CommentType, CommentDate
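To confirm the optimizer is actually reading the new covering index rather than the clustered index, sys.dm_db_index_usage_stats (or simply the query plan) will show where the scans landed; index_id 1 is the clustered index and anything higher is a nonclustered index.

```sql
-- After the queries above, the scan count should accrue against the
-- nonclustered covering index (index_id > 1), not the clustered index.
SELECT i.name, s.index_id, s.user_scans, s.user_seeks
FROM sys.dm_db_index_usage_stats AS s
INNER JOIN sys.indexes AS i
    ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.database_id = db_id()
    AND s.object_id = object_id('CommentsChar')
```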
--Memory Consumption Script
SELECT count(*) AS cached_pages_count,
    (count(*) * 8) / 1024 AS SizeInMb, name, index_id
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
    SELECT object_name(object_id) AS name,
        index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.hobt_id
        AND (au.type = 1 OR au.type = 3)
    UNION ALL
    SELECT object_name(object_id) AS name,
        index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.partition_id
        AND au.type = 2
) AS obj
ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = db_id()
GROUP BY name, index_id
ORDER BY cached_pages_count DESC
The results now show both tables consuming a reasonable amount of memory, with a slight edge going to the table using the covering index. The remaining difference in size can be attributed to the overhead of maintaining pointers to the text pages in the data rows, a 16-byte pointer per row by default. The question you need to answer as a DBA in your environment is: is it more efficient to use a covering index, or to simply put the data in a text field? In most situations the covering index allows for more flexibility and better overall performance.
Before leaving the subject I'd like to touch on varchar(max) in comparison to a text field. Varchar(max) tries to provide the best of both worlds. The rule of thumb is that a varchar(max) value small enough to fit on the data page with the rest of the row is stored in-row, side by side with the other columns; once the value pushes the row past the 8KB page limit, it is moved to overflow (LOB) pages. From a memory consumption point of view, for the queries we've been looking at, the text field will be the more efficient of the two.
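One knob worth knowing here: if you want varchar(max) to behave like text for memory purposes regardless of value size, sp_tableoption can force large value types out of the data row entirely. A sketch, using a hypothetical CommentsVarcharMax table:

```sql
-- Force all varchar(max)/nvarchar(max)/varbinary(max) values onto LOB
-- pages, leaving only a pointer in the data row, just like a text column.
-- 'CommentsVarcharMax' is a hypothetical table for illustration.
EXEC sp_tableoption 'CommentsVarcharMax', 'large value types out of row', 1
```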