Testing your housekeeping code
In many many projects I have been involved in, the importance of housekeeping is often overlooked, in the midst of the rush of delivering core functionalities for the business. Housekeeping is often resurrected only when we either run out of space, or performance started to degrade due to the unplanned large volume of data.
The one thing I would like to talk about today, is that housekeeping code is normally not well tested, due to the excuse that developers have run out of data to test. Sounds familiar? A developer developed a housekeeping code, then tested it against a particular environment. He ran it once, and it did 80% of the tasks correctly, so he needed to do a bit more tweaks. When he tried to rerun the code after, he realised, that his first run already deleted most of the data he could housekeep from his test environment. What now? Asking the DBA to refresh a full historical dataset would mean waiting for a day or two, and to handcraft dummy data manually would take too long. At the end, the half baked code is fast-tracked to production environment.
My solution to this problem is actually quite simple.
- When creating a housekeeping package/procedure, always include a “run_mode” as an IN attribute, accepting the values of TEST run or LIVE run
- during a TEST run, rather than performing actual truncates/deletes against target tables, simply write all the commands that will be called into a log table, so these can be reviewed again and again until the code is satisfactory
- only during a LIVE run would the actual data be housekept for real.