One of the key aspects of being a DBA is security, and what
used to be just protecting your data from unwelcomed intruders through object
protection has morphed into the data inside those objects. For example protecting PII (Personal Identifiable
Information). I know what you’re
thinking, if the tables and the interfaces (Sprocs, triggers, views, etc) are
protected then that should suffice, and in a perfect world it would be. However we don’t live in a perfect world, so
for example what if you need to move data into a test or investigation environment
where access to data is much more liberal, this data needs to be
protected. If you’re lucky enough to
have every column documented in your DB on what is PII I have great admiration
for your companies standards, unfortunately this is a small percentage of
companies. With an army of developers
and 100s of DB’s we needed a way to audit our data before shipping the data
back to our developers in test. One of
the key components of PII data are email data, so I wrote the following script
to ensure there are no public emails in the data we ship. Note we scrub the data before moving it, this
is just an audit process. If you need to
scrub your data the script below can be modified. One note I use the sample option in the statement,
because we have tables with billions of records and rather than scanning the
entire table I only look at a subset of the data.
declare @stmt varchar(1000)
Declare @TName varchar(50)
Declare @ColName varchar(50)
if object_id('tempdb..#PII') is not null Drop table #PII
create table #PII
(Tablename varchar(100), ColumnName varchar(100))
declare DiscoverCursor cursor
for
select c.name, t.name from sys.syscolumns c
inner join sys.tables t
on object_id = id
where xtype in(35, 99, 167, 175, 231, 239)
Open DiscoverCursor
Fetch Next from DiscoverCursor
into @colname, @tname
While @@Fetch_status = 0
Begin
set @stmt = 'if exists(select top 5 * from '+ @tname + ' tablesample (10000 rows)
where ' + @colname + ' like ' + '''' + '%@yahoo.com%' + '''' +
' or ' + @colname + ' like ' + '''' + '%@gmail.com%' + '''' +
' or ' + @colname + ' like ' + '''' + '%@hotmail.com%' + '''' + ')' +
'Begin
Insert into #PII values(' +
'''' + @colname + '''' + ', ' + '''' + @TName + '''' + ')
end'
exec
(@stmt)
print
@stmt
Fetch
Next from
DiscoverCursor
into
@colname, @tname
End
Close DiscoverCursor
Deallocate DiscoverCursor
select * from #pii
No comments:
Post a Comment