The other day at work, I had some free time and decided to do a little housecleaning. The project that inspired me involved converting an Access database (which had started with Ingres, then moved to two other RDBMSs I can't recall) to Sybase. It was a very, how do you say, "icky" project.
The ickiness came not from the unrelational, completely nonsensical nature of the data, although it was bad. (Seriously: do you ever need to have a whole separate table called "us_resident" containing only two rows with 'yes' and 'no' in it? No. You never need that in a modern database. You use a SET or ENUM or some other list column type in your table.) The ickiness came primarily from the fact that some tables were keyed by student ID number, and some were keyed by the student's social security number. And of the other 40+ tables, only 1 (one!) had both. I called it the "rosetta table", and I had to do some fairly fancy footwork with its data in order to get at the rest of the database. I just sort of ignored all the thousands of orphaned records I sometimes found. The admin folks weren't even really sure what all they needed, and I wasn't about to spend my life digging through everything. Student data-entry labor is cheap, and so I made lots of printouts. Some poor undergrad will be unknowingly cursing my name this summer.
This ickiness was further exacerbated by the fact that California passed a law last year called SB 25, and it means that anyone who has SSNs anywhere on a computer had better think long and hard about the delete key. The upshot of the law is that if the machine on which the personal data is stored gets compromised, you have to let everyone who might be affected know that they could, maybe, possibly be a victim of identity theft at some future point. Basically, that's what it said, and everyone on campus has been dumping data with SSNs left and right. Old backup tapes are "going away", email spools are being cleaned, etc.
So this info I was dealing with on my workstation had SSNs all through it, and now that the project was done, I wanted the data gone but permanent like. But the rub was that I needed a copy of all the data to have around in case the admin dudes wanted something else from it all at some later point. So I burned a CD of the data to give to the admin group. I figured they could keep it on the shelf (which would satisfy SB 25) and pop it in whenever they needed to find something. To further this aim, I made a little navigable index.html page to all the student info, and I even put in an autorun file which would launch (under Windows) the browser with that index page on it so that the admin people could see all their old data without having to dig too much. That's just the kind of guy I am. Well, that plus I wanted it to be easy for them to find stuff on their own instead of calling me. And finally, because everyone feels better holding dead trees, I also made some relevant printouts for those same admin folks, and then I had a shredding party with all my work docs. Clearly, my work was done.
While I was shredding the physical media, I got to thinking about the digital media: shredding data on my workstation's hard drive. Simply deleting a file doesn't really delete shit, and my knowing that means that any lawyer in the world could easily prove I should have tried harder to get rid of the copious amounts of SSN-based data I had. That's if I wind up in court, being sued by one or more persons who had their identity stolen based on my negligence in getting rid of their data. Not a likely prospect, but why take chances that don't lead to a payoff?
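A loose illustration of why deleting isn't erasing, assuming a Unix-like filesystem: unlink() removes a file's name, not its bytes. The sketch below (the sub name read_after_unlink and the sample text are made up for the demo) writes a scratch file, unlinks it, and then reads the contents back through the still-open handle. Once the handle is closed the blocks are merely marked free, and nothing scrubs them, which is the gap recovery tools exploit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Returns (name_is_gone, data_still_readable) for a scratch file that has
# been unlinked while a handle to it is still open.
# Assumption: a Unix-like system, where unlink() only drops the directory entry.
sub read_after_unlink {
    my ($text) = @_;
    my ($fh, $name) = tempfile();        # scratch file, opened read-write
    print {$fh} $text;
    $fh->flush;
    unlink($name) or die "unlink failed: $!";
    my $name_gone = !-e $name;           # the name no longer exists...
    seek($fh, 0, 0) or die "seek failed: $!";
    local $/;                            # slurp mode
    my $still_there = <$fh>;             # ...but the bytes are still there
    close($fh);
    return ($name_gone, $still_there);
}

my ($gone, $data) = read_after_unlink("SSN 000-00-0000 (made-up test data)\n");
print "name gone: ", ($gone ? "yes" : "no"), "\n";
print "data still readable: $data";
```

This is not forensic recovery, just the smallest demonstration that the delete step and the data are two separate things.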
In talking to my boss about the issue, he mentioned that he recently discovered that there's an app called srm installed by default on his Mac OS X laptop. Being similarly concerned about SB 25-ish things, he went and found the aforelinked SourceForge page, and sent me the URL. I was more than happy to use it, but I couldn't get it built on my Fedora Core 1 system. And I tried, believe me -- but only for a half hour or so.
I only tried for 30 minutes because I realized that I could get the very same thing working in like 15 minutes if I wrote it in something portable, like Perl. So I did. Here's my code:
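#!/usr/bin/perl -w
#
# sremove.pl - Removes files pseudo-securely by overwriting the file contents
#              with zeroes a user-definable number of times, then truncating
#              and then unlinking the file. Probably not DOD secure, but
#              seems to work.
#
# Free for non-commercial use, with no warranty of fitness expressed or implied.
#
# Wm. Rhodes 4/2004
#
use strict;
use File::Find;
use Getopt::Std;
use IO::Handle;

$|++;    # unbuffer STDOUT so the progress messages show up right away

# How many times to overwrite the file with zeroes. A default of 7 seems to be
# an OK number. Override it with -p <number>.
our ($opt_p);
getopt('p');
my $num_passes = $opt_p ? $opt_p : 7;

foreach my $file (@ARGV) {
    if (-e $file) {
        find(\&RemoveFile, $file);
    } else {
        print "File not found: $file\n";
    }
}

# This overwrites our file with zeroes $num_passes number of times, then
# truncates it to zero length, then unlinks it.
sub RemoveFile {
    # find() also hands us directories and symlinks; only plain files can be
    # overwritten, so skip everything else.
    return if -l $_ or not -f $_;

    my $length = -s $_;
    print "Deleting file '$_' (",
        commify($length),
        " bytes) with $num_passes passes... ";
    for (my $i = 1; $i <= $num_passes; $i++) {
        ReWrite($_, $length);
    }
    truncate($_, 0) || die "Can't truncate '$_': $!";
    unlink($_)      || die "Can't unlink '$_': $!";
    print "Done.\n";
}

# Overwrites the first $length bytes of $file in place. The file is opened
# read-write ('+<') rather than with '>', because '>' truncates the file
# first, so the new zeroes may land in different blocks than the old data.
sub ReWrite {
    my ($file, $length) = @_;
    my $chunk = "\0" x 65536;    # write in 64K pieces instead of one huge string
    open(my $fh, '+<', $file) || die "Can't open '$file': $!";
    binmode($fh);
    for (my $left = $length; $left > 0; $left -= length($chunk)) {
        my $n = $left < length($chunk) ? $left : length($chunk);
        print $fh substr($chunk, 0, $n);
    }
    # Push this pass out to the disk before the next one starts; otherwise
    # the OS may merge the passes in its cache and only write the last one.
    $fh->flush;
    $fh->sync;
    close($fh) || die "Can't close '$file': $!";
}

# Adds thousands separators to a number: 1234567 becomes 1,234,567.
sub commify {
    local $_ = shift;
    1 while s/^(-?\d+)(\d{3})/$1,$2/;
    return $_;
}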
Now, I'm no computer forensics expert by any stretch. But I did take the time to read through the source of srm, and I did take the time to do some empirical testing in order to make sure that my data was gone.
I installed the Coroner's Toolkit. It's a cool set of apps that lets you recover deleted files. And it will recover anything. I recovered a file from when my two-year-old Linux workstation had Windows XP on it (which was only for like two weeks, when I first got it). And, yes, installing Linux over a previous Windows XP installation means a format of the hard drive. And yet I still found WinXP data on the partition that I recovered. I didn't think that was possible, but I saw it with my own eyes. So if formatting a hard drive maybe doesn't really permanently erase data, I was very keen to see if my little Perl script would.
I wound up spending the bulk of a workday testing it. I created a file with a text string that I knew wouldn't be found on my machine. I created the file on a small partition and then deleted the file. Then I ran the Coroner's Toolkit over that partition and recovered all the files I could. I found my deleted file. I then made a new file with another distinctive string on another partition and did the same delete/try-to-recover thing. I found the file. I then made a distinctive string-containing file, erased it, then made a new one with the same name but different contents. I found both of the files. Admittedly, this was not a busy system, but that scared me.
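For the curious, the "look for my distinctive string" step can be roughly approximated without a forensics toolkit by scanning the raw partition (or an image of it) for the marker. The following is only a sketch under stated assumptions: it is not how the Coroner's Toolkit works, the sub name find_marker and the device path in the usage comment are hypothetical, and reading a raw device normally requires root.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan a file, disk image, or raw device for a marker string and return the
# byte offsets where it starts. Reads in chunks and keeps a small overlap so
# a marker that straddles two chunks is still found.
sub find_marker {
    my ($path, $marker, $chunk_size) = @_;
    die "marker must not be empty\n" unless length $marker;
    $chunk_size ||= 1024 * 1024;
    my $overlap = length($marker) - 1;

    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);

    my @hits;
    my $buf  = '';
    my $base = 0;    # offset within the file of the first byte of $buf
    while (1) {
        my $n = read($fh, my $chunk, $chunk_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;
        $buf .= $chunk;

        my $pos = 0;
        while (($pos = index($buf, $marker, $pos)) >= 0) {
            push @hits, $base + $pos;
            $pos++;
        }

        # Keep only the tail that could begin a match spanning the next chunk.
        if (length($buf) > $overlap) {
            $base += length($buf) - $overlap;
            $buf = $overlap ? substr($buf, -$overlap) : '';
        }
    }
    close($fh);
    return @hits;
}

# Usage (hypothetical): perl find_marker.pl /dev/hdb1 'my-distinctive-string'
if (@ARGV == 2) {
    my @hits = find_marker(@ARGV);
    print @hits ? "Found at byte offset(s): @hits\n" : "Marker not found.\n";
}
```

A hit only says the bytes are still somewhere on the partition; it says nothing about which file they belonged to, so it is a cruder check than actually recovering the file.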
Then I reproduced all those tests after deleting a file using my script above. I couldn't recover any of the file's contents. I even tried it a few more times in various places and got nothing. So I'm pretty sure that the above script works. It appears to work anyway, and I'm confident that all that SSN data is gone forever.
Having said that, if you use the above script you are on your own and I make no warranties about its fitness for any purpose. It almost certainly works as advertised, but until I get to see what the NSA can do, I ain't promising shit. Further, if you use it and something breaks, you get to keep both pieces. But I'm feeling good that I did something better than simply 'rm' all those people's personal info, and I think the script is relatively safe to use (although it's not terribly robust as far as error checking and whether files are directories and whatever; it worked for my purposes, so I was done with it).
Anyway, use it if you like.
Hi there wee, I was catching up on your news/blogs or whatever you want to call them when I came across this article.. I actually wrote something similar awhile back. My version overwrites the data multiple times with different bit orders..
use integer;

my @files = ('1.txt', '2.txt', '3.txt');

# pack('b8', ...) takes a string of eight bits and returns the matching byte
my $obyte1 = pack('b8', '01010101');
my $obyte2 = pack('b8', '00000000');
my $obyte3 = pack('b8', '11111111');
my @overwrite = ($obyte1 x 65536, $obyte2 x 65536, $obyte3 x 65536);

chdir('C:\whatever') or die('could not change the directory');

foreach my $file (@files)
{
    # '+<' opens the file for read/write without truncating it first
    open(FILE, "+<$file") or die("could not open $file");
    binmode(FILE) or die('could not change file mode');
    my $filesize = (-s $file);
    foreach my $obyte (@overwrite)
    {
        # start every pass at the top of the file with a fresh block count
        seek(FILE, 0, 0) or die('Could not rewind the file');
        my $loop = ($filesize / 65536) + 1;
        while($loop)
        {
            print FILE $obyte or die('Could not print to the file');
            $loop--;
        }
    }
    close(FILE) or die('Could not close the file');
}
unlink(@files) or die('Could not delete the file(s)');
This will assure your 'admins' and your clients that you did not just shred the documents, but that you cross-shredded your documents :)
Anyways, I hope this is of some interest to you. As a Perl lover myself, I like seeing code for problems similar to ones I have encountered.
best regards,
complx
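A small aside on the overwrite patterns in the comment above, offered as a sketch rather than anything either author wrote: pack's 'b' template wants a string of 0s and 1s plus a count, and a bare 01010101 in Perl source is an octal number literal, not a bit pattern. The helper name pattern_bytes is made up for the example; it builds the three bytes from quoted bit strings and prints what each one actually is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the three overwrite bytes from explicit bit strings.
# 'b8' reads the string lowest bit first; 'B8' reads it highest bit first.
sub pattern_bytes {
    return map { pack('b8', $_) } ('01010101', '00000000', '11111111');
}

# Show each byte as hex and as bits, highest bit first.
for my $byte (pattern_bytes()) {
    printf "0x%02X  %s\n", ord($byte), unpack('B8', $byte);
}
```

Because 'b8' counts from the lowest bit, the alternating pattern comes out as 0xAA rather than 0x55. For overwriting purposes either one does the job, since both alternate ones and zeroes.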