Wednesday, 1 July 2009

Just another day (or two) of torturing data. As I mentioned a couple of days ago, about a week back I decided to update a data set to include the last year or so of data (the data sources I use were recently refreshed). Like most "simple" jobs, it's turned out to be much more of a hairball than I expected. Although the program I used was fairly simple to rewrite, I realized that I had to update not one, not two, but THREE datasets in order to bring everything up to the present.
Caution: SAS Geekspeak ahead
One of the data sets is pretty large: it was about 70 gigabytes, and with the updates and indexing I've done, it's now almost 100. So adding the new data and checking it took quite a while (no matter how efficiently you code things, SAS simply takes a long time to read a 70 gigabyte file). I thought I had everything done except for the final step. Unfortunately, the program kept crashing with "insufficient resources" errors.
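(The update itself is garden-variety SAS: roughly the sketch below, with hypothetical names throughout, since I'm not naming the actual datasets or variables here.)

    /* Hypothetical names: mylib.master, work.new_rows, firm_id.            */

    /* PROC APPEND adds the new observations in place, so SAS doesn't have  */
    /* to rewrite the entire 70 GB base dataset the way a DATA step or an   */
    /* SQL insert-by-copy would.                                            */
    proc append base=mylib.master data=work.new_rows;
    run;

    /* Rebuild the index on the key variable so later lookups and BY-group  */
    /* processing against the updated dataset stay fast.                    */
    proc datasets library=mylib nolist;
       modify master;
       index create firm_id;
    quit;

The crashes, it turned out, had nothing to do with any of this.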
For the uninitiated: when SAS manipulates data (sorting, intermediate steps in SQL SELECT statements, and so on), it sets up temporary "scratch" files in its WORK library. They're supposed to be released when SAS terminates, but unfortunately, my system wasn't doing that. So I had over 180 gigabytes of orphaned temporary files clogging up my hard drive, which meant there wasn't enough space left on my 250 gigabyte drive for SAS to manipulate the large files I'm using.
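(If you hit the same wall, the housekeeping is simple. This is generic SAS cleanup, not the exact commands I ran:)

    /* Wipe everything in the WORK (scratch) library mid-session, so        */
    /* intermediate files from earlier steps don't pile up on disk.         */
    proc datasets library=work kill nolist;
    quit;

On Unix installs, SAS also ships a cleanwork utility that removes orphaned WORK directories left behind by dead sessions, and the -work invocation option lets you point the WORK library at a bigger disk in the first place.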
Of course, I only realized this when my program crashed AFTER EIGHT HOURS OF RUNNING! TWICE!
I've now manually deleted all the temporary files, and I'm running the program overnight to see if this fixes the problem.
Ah well - if it were easy, anyone could do it.
update (next morning): Phew! It ran - it seems the unreleased temporary files were the issue. On to the next problem.
Thursday, 28 May 2009
SAS is the Devil
I've spent 10 hours over the last two days debugging a SAS program I wrote about 2 months ago. It was written for a paper that's coming along nicely, but I haven't revisited the data (or the program) for a while.
Unfortunately, I didn't document the program very well.
I thought, "well, all I have to do is run this one test. That shouldn't take long."
Cue Jaws theme song and commence profanity
...
/profanity
When will I learn?
Now that I've gotten down to a part of the program that has to run for a while (merging two VERY large, 50+ gigabyte datasets), I can take a break.
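(The merge itself is nothing exotic: a standard sort-merge along the lines of the sketch below, with made-up dataset and key names since the real ones belong to the paper. At this size it's the two sorts, not the merge, that eat most of the hours.)

    /* Hypothetical names: big1 and big2, keyed on firm_id and date.        */
    proc sort data=big1;
       by firm_id date;
    run;

    proc sort data=big2;
       by firm_id date;
    run;

    /* The MERGE statement walks both sorted files once; the IN= flags      */
    /* let us keep only observations that appear in both inputs.            */
    data merged;
       merge big1 (in=a) big2 (in=b);
       by firm_id date;
       if a and b;
    run;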
update: It finally finished running. It "only" took 14 hours (yes, that's right, 14 hours). And that's after having used every trick I knew to make it more efficient.
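(For the record, "every trick I knew" mostly means the standard system options. A representative sample, not an exact transcript of my settings:)

    options compress=binary  /* store observations compressed on disk       */
            sortsize=2G      /* let PROC SORT use more memory before it     */
                             /* spills to temporary utility files           */
            fullstimer;      /* log detailed time/memory stats per step     */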