Thursday, April 14, 2016

SQL Server Auditing PII Data, Looking for Email Data


One of the key aspects of being a DBA is security, and what used to be just protecting your data from unwelcomed intruders through object protection has morphed into the data inside those objects.  For example protecting PII (Personal Identifiable Information).  I know what you’re thinking, if the tables and the interfaces (Sprocs, triggers, views, etc) are protected then that should suffice, and in a perfect world it would be.  However we don’t live in a perfect world, so for example what if you need to move data into a test or investigation environment where access to data is much more liberal, this data needs to be protected.  If you’re lucky enough to have every column documented in your DB on what is PII I have great admiration for your companies standards, unfortunately this is a small percentage of companies.  With an army of developers and 100s of DB’s we needed a way to audit our data before shipping the data back to our developers in test.  One of the key components of PII data are email data, so I wrote the following script to ensure there are no public emails in the data we ship.  Note we scrub the data before moving it, this is just an audit process.  If you need to scrub your data the script below can be modified.  One note I use the sample option in the statement, because we have tables with billions of records and rather than scanning the entire table I only look at a subset of the data.




declare @stmt varchar(1000)
Declare @TName varchar(50)
Declare @ColName varchar(50)


if object_id('tempdb..#PII') is not null Drop table #PII

create table #PII
(Tablename varchar(100), ColumnName varchar(100))



declare DiscoverCursor cursor for
select c.name, t.name from sys.syscolumns  c
inner join sys.tables t
on object_id = id
where xtype in(35, 99, 167, 175, 231, 239)


Open DiscoverCursor
Fetch Next from DiscoverCursor
into @colname, @tname
While @@Fetch_status = 0
Begin

              set @stmt = 'if exists(select top 5 * from '+  @tname + ' tablesample (10000 rows) where ' + @colname + ' like ' + '''' + '%@yahoo.com%' + ''''  +
' or  ' + @colname + ' like '   + '''' + '%@gmail.com%' + '''' +
' or  ' + @colname + ' like '   + '''' + '%@hotmail.com%' + ''''  + ')' +
'Begin
Insert into #PII values(' + '''' +   @colname +  ''''  +   ', ' + '''' +  @TName + '''' + ')
 end'
             
      
       exec (@stmt)
       print @stmt

       Fetch Next from DiscoverCursor
       into @colname, @tname
End
Close DiscoverCursor
Deallocate DiscoverCursor


select * from #pii







No comments: