The joys of Visual Studio 2008 regex-based find and replace

I was replacing the XML library we were using. Thankfully, we had the sense to put a wrapper on top of it (MSXML6) and always use the wrapper. Almost always. So, when the time came, I was raring to get it done within a day. Only, I hadn’t realized that code has a way of getting back at you.

PugiXML seemed the weapon of choice. And off I went, replacing the wrapper. In a jiffy. I was about to hit the build button when I checked the PugiXML documentation one last time, and froze at these very words: You cannot assign NULL to a pugi::xml_node.

CComPtr had been playing nasty with us, and we had developed a (very bad) habit of sprinkling NULL assignments throughout the code base. Which was bad (particularly since a CComPtr default-initializes to NULL anyway). What was worse, we kept assigning NULL everywhere else: in the body of an otherwise innocuous for-loop, in a defaulted parameter, ad infinitum (and ad nauseam).

I had to get rid of all occurrences of the following:
Node n = NULL; // don't even ask why!
NodeRoot r = NULL; // yuk
curr = NULL;
and so on….

A naïve search for NULL resulted in 9K+ hits. I left my seat and headed for the nearest vending machine. Caffeine handles NULLs better, or so I realized in a couple of minutes. I settled down and finally, finally, invoked that unutterable of daemons, the universally hated Swiss Army knife that always results in suicide: The Regex.

I started with all Node declarations with NULL assignments. I needed to look for a few spaces or tabs ([ \t]+) from the beginning of the line (^). Our code isn’t that badly formatted, and the chances of a declaration/definition starting at column 0 in C++ are nada, so the Kleene plus is better suited than the star. Next comes the string Node, followed by some more space ([ \t]+, at least one, the C++ grammar requires it), an identifier ([a-zA-Z_][a-zA-Z0-9_]*), the assignment operator (=) with optional space on either side, the string NULL (yes!) and then the end-of-line semicolon (;).

All of which looks like: ^[ \t]+Node[ \t]+[a-zA-Z_][a-zA-Z0-9_]*[ \t]*=[ \t]*NULL;
And the infinity boiled down to 117 instances. Yay!
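
For anyone without Visual Studio 2008 at hand, here is a rough sketch of the same find in C++ with std::regex. It is only an approximation of what the IDE does: the function name count_null_node_decls is made up for illustration, the identifier class accepts underscores, at least one space is required after Node, and the source is scanned line by line so that ^ always means "start of line".

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: counts the lines that contain an indented
// "Node <identifier> = NULL;" declaration.
std::size_t count_null_node_decls(const std::string& source) {
    // ECMAScript flavour of the find pattern described above.
    static const std::regex pattern(
        R"(^[ \t]+Node[ \t]+[a-zA-Z_][a-zA-Z0-9_]*[ \t]*=[ \t]*NULL;)");

    std::istringstream lines(source);
    std::string line;
    std::size_t hits = 0;
    while (std::getline(lines, line)) {
        // regex_search (not regex_match) so trailing comments are allowed.
        if (std::regex_search(line, pattern)) {
            ++hits;
        }
    }
    return hits;
}
```

Because the pattern insists on whitespace right after Node, a line such as "    NodeRoot r = NULL;" is not counted, and neither is a declaration starting at column 0.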

Replacing was fairly easy with capture blocks: I tagged the parts I wanted to keep using braces ({}), which is how Visual Studio 2008 marks groups, and then went for the kill:

The Find phrase: ^{[ \t]+Node[ \t]+}{[a-zA-Z_][a-zA-Z0-9_]*}[ \t]*=[ \t]*NULL{;}
The Replace phrase: \1\2\3
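
The same capture-and-replace idea, sketched in C++ with std::regex, in case the Visual Studio syntax looks alien. Parentheses play the role of the braces and $1$2$3 plays the role of the three back-references. The function name strip_null_init is hypothetical, and the pattern assumes underscores are allowed in identifiers and at least one space follows Node.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "    Node n = NULL;" into "    Node n;" on one
// line and returns every other line unchanged.
std::string strip_null_init(const std::string& line) {
    // Group 1: indentation, type name and the space after it.
    // Group 2: the variable name.
    // Group 3: the terminating semicolon.
    static const std::regex pattern(
        R"(^([ \t]+Node[ \t]+)([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*=[ \t]*NULL(;))");

    // Text outside the match (a trailing comment, say) is copied as-is.
    return std::regex_replace(line, pattern, "$1$2$3");
}
```

Only the " = NULL" part disappears; indentation, the declaration itself and anything after the semicolon survive untouched.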

The rest was fairly easy: tinkering with the first regex to find and replace the NodeRoot nasties, and then the standalone assignments. A good 500 replacements in under 15 minutes!
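
The "tinkering" for NodeRoot amounts to swapping the type name in the pattern. A minimal C++ sketch of that, assuming the type name is a plain identifier with no regex metacharacters in it; both function names here are made up for illustration. The standalone assignments (curr = NULL;) are left out on purpose, since what to put in their place depends on the surrounding code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds the find pattern for a given type name.
// Assumes type_name is a plain identifier (no regex metacharacters).
std::regex null_init_pattern(const std::string& type_name) {
    return std::regex("^([ \\t]+" + type_name +
                      "[ \\t]+)([a-zA-Z_][a-zA-Z0-9_]*)[ \\t]*=[ \\t]*NULL(;)");
}

// Hypothetical helper: drops the "= NULL" initializer from a declaration of
// the given type and leaves every other line unchanged.
std::string strip_null_init_for(const std::string& type_name,
                                const std::string& line) {
    return std::regex_replace(line, null_init_pattern(type_name), "$1$2$3");
}
```

Calling strip_null_init_for("NodeRoot", line) handles the NodeRoot declarations, while the Node pattern keeps ignoring them because it requires whitespace immediately after the type name.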

I realized while writing this up that I could’ve used the Regex Builder and saved time, but it never occurs to me when I start out that such a thing exists (and it’s for the kids, after all!).

