2020-06-08

Un-natural scripts

“I liken starting one’s computing career with Unix, say as an under-graduate, to being born in East Africa. It is intolerably hot, your body is covered with lice and flies, you are malnourished and you suffer from numerous curable diseases. But, as far as young East Africans can tell, this is simply the natural condition and they live within it. By the time they find out differently, it is too late. They already think that the writing of shell scripts is a natural act.”
— Ken Pier, Xerox PARC

As I've mentioned before, I'm basically a Unix guy. I believe I first encountered the OS in the mid-eighties and first used it seriously in 1989 (a C course taught out of the New Testament with exercises coded on Xenix systems). For all that, I'm not unaware of the tradeoffs involved, and I find The UNIX-HATERS Handbook (the source of the quote) both amusing and informative.

However, my current professional project is cross-platform by customer requirement: it needs to run on Windows as well as Linux. So this is the perfect opportunity to upgrade my skills in Python for all those tasks where I might have written a shell script. Right?

But the habit of thinking in terms of Unix utilities doesn't go away easily.

Last week I needed to prepend some text to a large number of source files. My project is nearing its first delivery to the customer and, being new to both contract programming and project leadership, I started the work without putting the usual legal boilerplate in all the source files.[1] Hundreds of them. Obviously some kind of scripted approach was in order.

The issue is that while appending is trivial in the Unix file model, there is no OS-level support for inserting anywhere before the end (the authors of TUHH are laughing up a storm at this point). You must either write the new content to a temporary file and swap it in, or read the whole file into memory before you start writing. Anything involving temporary files is hard to get right (especially if you are worried about security, which probably doesn't apply here, but it makes me cautious all the time), and reading an entire input into memory carries a different kind of risk (again, with known inputs it's not a huge one, but thinking about these things is part of the job, right?).
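For concreteness, here is a minimal sketch of the temporary-file route in plain sh. The function name prepend_file is made up for illustration, and mktemp is assumed to be available (it is not part of POSIX, though GNU and BSD systems ship it).

```shell
#!/bin/sh
# Sketch only: prepend a header file to a target via a temporary file.
# prepend_file is a hypothetical helper name, not from the original tool.
prepend_file() {
    header=$1
    target=$2
    # Create the temporary file next to the target so the final mv is a
    # rename within one filesystem rather than a copy across filesystems.
    tmp=$(mktemp "$(dirname "$target")/prepend.XXXXXX") || return 1
    # Write header followed by the original content, then swap it in.
    if cat "$header" "$target" > "$tmp"; then
        mv "$tmp" "$target"
    else
        # Something went wrong: leave the target untouched and clean up.
        rm -f "$tmp"
        return 1
    fi
}
```

Called as prepend_file license.txt input_file.cpp. One thing this sketch glosses over: the replaced file takes on the ownership and the restrictive permissions of the mktemp file, which is exactly the kind of detail that makes the temporary-file approach fiddly.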

This is the point at which my memory throws up a bit of useful Unix lore: sed, like several other old tools, has an "in place" editing mode, meaning that someone else has already dealt with the hard parts. As far as my searching could tell, Python doesn't provide a nice wrapper for this.

So I wrote this little tool in /bin/sh and sed. It's not part of our deliverables, so it doesn't have to be cross-platform. That leaves only two issues: a few of the files are UTF-8 with a BOM instead of plain ASCII, and there is the matter of actually getting sed to prepend.

UTF

Frankly, I agree with people who argue that every text tool should be UTF-aware. But that doesn't mean that those old Unix utilities are. In this case there were few enough files to simply handle them with manual cut-n-paste, but it leaves the mystery of how those dozen files got that way when most of the code base is plain ASCII.
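Finding the BOM-carrying files ahead of time is easy enough to script. The sketch below checks whether a file starts with the three bytes EF BB BF, which is the UTF-8 encoding of the byte order mark. The names has_bom and list_bom_files are invented for this example.

```shell
#!/bin/sh
# Sketch only: has_bom and list_bom_files are hypothetical helper names.

# Succeeds if the file named by $1 begins with the UTF-8 BOM (EF BB BF).
has_bom() {
    # od dumps the first three bytes as hex; tr strips the padding
    # so the result can be compared as a single string.
    [ "$(od -An -tx1 -N 3 "$1" | tr -d ' \t\n')" = "efbbbf" ]
}

# Prints the name of each regular file among the arguments that has a BOM.
list_bom_files() {
    for f in "$@"; do
        if [ -f "$f" ] && has_bom "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like list_bom_files *.cpp *.h then gives the list of files to treat by hand.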

Being clever with sed

If you just try the obvious sed '1r license.txt' input_file.cpp you get the license after the first line (which is what you want on anything using a shebang, so all is not lost). The next step is "OK, fine, let's make that '0r license.txt'", but sed doesn't support that address (or maybe some do, but the one I was using doesn't).

So you have to get clever. What I ended up with looks like this (the explanatory comments are collected above the command because sh and bash won't accept a comment after a line-continuation backslash):

# -i.old           Edit in place but keep a backup (input_file.cpp.old)
# -n               No automatic output
# 1h               Hold the first line
# 1r license.txt   Queue the license for output at the end of line 1's cycle
# 2x               On line 2, swap hold and pattern space ...
# 2G               ... then append hold to pattern (pattern space becomes Line1\nLine2)
# 1!p              Print every line but 1 (line 1 goes out with line 2, so the file needs at least two lines)
sed -i.old \
    -n \
    -e '1h' \
    -e '1r license.txt' \
    -e '2x' \
    -e '2G' \
    -e '1!p' \
    input_file.cpp
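Applying this to hundreds of files calls for a loop around the sed command above. The following is a sketch of what such a driver might look like, not the tool I actually ran. The function name add_license is made up, and two guards are my own assumptions: files that already contain the first line of the license are skipped (which assumes that line is non-empty and distinctive), and so are files with fewer than two lines, because the trick only prints line 1 when line 2 arrives and would otherwise drop the file's only line.

```shell
#!/bin/sh
# Sketch only: add_license is a hypothetical driver, not the original tool.
# Usage: add_license LICENSE_FILE SOURCE_FILE...
add_license() {
    license=$1
    shift
    # Assumption: the first line of the license is a usable marker.
    marker=$(head -n 1 "$license")
    for f in "$@"; do
        # Skip files that already carry the license.
        if grep -qF -- "$marker" "$f"; then
            continue
        fi
        # Skip files with fewer than two lines; the sed script below
        # would lose their only line.
        if [ "$(awk 'END { print NR }' "$f")" -lt 2 ]; then
            printf 'skipping short file: %s\n' "$f" >&2
            continue
        fi
        sed -i.old -n \
            -e '1h' \
            -e "1r $license" \
            -e '2x' \
            -e '2G' \
            -e '1!p' \
            "$f"
    done
}
```

It could be driven with something like add_license license.txt src/*.cpp src/*.h, leaving a .old backup beside every file it changes.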

[1] Qt Creator supports adding such blurbs automatically when it generates file skeletons for you, but not adding them at a later date.
