The two most hated things a DBA hates is a request for
a backup from three years ago, and coming across a query with an OR clause in
it. Actually that’s not true as DBA’s
we’re just a grumpy group, but I don’t have a month to list out all my
dislikes, so let me focus on the “OR” clause within a query. Unfortunately many devs think that covering
your predicates with a single index works, and on small data sets in test it
does, but as the index grows the duration times grow linear since a scan is
used in an OR clause if not indexed correctly.
This article will talk about how to index for an OR clause. Rather than trying to verbally explain this
through writing, I’ll go through with example.
Below I’ve created a table, index, and populated
enough data to make my point.
create table testtable
(FOne int, ftwo int, FillerField char(1000))
create index aaa on testtable (fone, ftwo)
Declare @interval int
set @interval = 1
while @interval < 10000
Begin
insert into testtable
values (@interval, @interval + 50000, 'Hello')
set @interval = @interval + 1
End
As you can see the table has a
covering index on Fields Fone and Ftwo. I created FillerField so there is enough data
to make my point. The covering index
works great if there is an “AND” clause.
Note when I say covering, I’m talking about the predicates and not the
entire query.
set statistics io on
select * from testtable where
fone = 100 and
ftwo = 50100
If we look at the IO’s in the
messages (Why we use Set Statistics IO on) you’ll see it did 3 reads, using a
seek with a Rid Lookup.
The Seek is shown in the Query Plan
above. The Seek is used because FOne is
unique and the predicate requires both Fone and FTwo meet the value so SQL
simply needs to find all the values on the far left of the index its using, in
this case aaa(fone, ftwo). So it will
look for where fOne = 100, once all the records with this value are found
(acutally as they’re searched) it will examine the second value in FTwo, to see
which ones match in this case 50100.
However using “AND” differs from
the “OR” clause. it doesn’t work this way, it needs to find all
the data from both fields that match fone or data matching ftwo. So if we change the query to an “OR” clause
it will move from a seek to a scan.
select * from testtable where
fone = 100 or
ftwo = 50100
The reads have gone from 3 to
31. Here the predicates are covered but
since it can be either or it can’t do a seek it has to examine each row and
column. In fact in many situations SQL
might find it cheaper to scan the underlying table in this case a RID.
In order to get this efficient, and remove the scan, we need to do a seek against each field in the OR clause, and
leverage an intersection index lookup. By adding the below index, in addition to the
one already created you’ll see each field now has a seek, and the results are concatenated
into temporary storage where dupes thrown out.
Once this distinct intermediate data is determined, it is used to lookup
against the RID.
create index bbb on testtable(ftwo)
select * from testtable where
fone = 100 or
ftwo = 50100
The following example mimics
something I recently saw in a production, the question is how something like
this gets out into the wild. Usually this
is a result of two circumstances. First most testing is functional testing, and the difference
in few milliseconds isn’t noticeable.
The second is even if you’re testing performance the index scan against a
small set doesn’t stand out, and because it is covered the scan itself is
small, its only noticeable after the dataset grows, or the volume of calls increase that it becomes noticeable.
No comments:
Post a Comment