2020-09-16

I'll have the BeautifulSoup

We've delivered version 1.0 of my project at work; or perhaps we're in the process of delivering it.

We've handed over a complete copy of the repository, deployable binary bundles for three targets, and VM-based development environments for two.

We still have some time to complete and hand over the documentation. Better still, by a policy I set very early on, all the documentation is in the repository, so they already have it. And I tried to keep us writing and updating documentation often enough that it didn't get too far out of sync with the code, so the documentation they have is pretty good. But we're taking this time to make it better: more complete, easier to understand, and reflecting the code we handed over even better.

There is a twist: even though the project is entirely in-house, it's paid for by a contract to which we are a subcontractor. This means that the prime contractor has the responsibility to report on the work (and the authority to make demands about that). This particular prime contractor has a tradition of including the user manuals for software projects as appendices to the report.

But my main user documentation takes the form of a pile of individual HTML files that are displayed in an on-line help system. There is no linear structure imposed and no single document containing it all.

So I needed to flatten it.

There are only 20 files right now, so I could have done it by hand. But that would have taken a few hours or more to get right (all the internal links!), and I have reason to believe the project will be funded for further development, meaning I'll have to do it all again in a year or two. That's excuse enough to automate the task.

The tool I settled on was a Python library called BeautifulSoup, which I had heard of but never used before. I loaded an HTML/CSS scaffold file, then looped over a hand-tooled list/tuple structure that places my files into a few categories. In the loop I created a table of contents entry for the file; grabbed the file body, rewrote the header levels, munged the anchors and links, base64-encoded the images, and wrapped the result in a section for nice formatting; and then appended the new section to the body of the scaffold. When the loop ended, I prettified the result and wrote it out. It just works. I'm hooked.
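
For the curious, here is a minimal sketch of what a loop like that might look like. It is not the actual script: the function name `flatten`, the `SCAFFOLD` string, the `toc` element id, the file-name-to-slug scheme, and the idea of passing image bytes in a dictionary are all assumptions made up for illustration. It demotes every header one level, prefixes anchors with a per-file slug, rewrites links between known files into in-page links, and inlines images as base64 data URIs.

```python
import base64
import mimetypes

from bs4 import BeautifulSoup

# Hypothetical scaffold; a real one would also carry the CSS.
SCAFFOLD = """<html><head><title>Manual</title></head>
<body><ul id="toc"></ul></body></html>"""


def slug_for(filename):
    """Turn a file name into an id-safe prefix (assumed naming scheme)."""
    return filename.replace(".", "-")


def flatten(scaffold_html, pages, images):
    """Merge many HTML pages into one document.

    pages:  list of (category, [(filename, html_text), ...])
    images: dict mapping an <img> src value to the raw image bytes
    """
    soup = BeautifulSoup(scaffold_html, "html.parser")
    toc = soup.find(id="toc")
    known = {name for _, files in pages for name, _ in files}

    for category, files in pages:
        for name, html in files:
            page = BeautifulSoup(html, "html.parser")
            slug = slug_for(name)
            title = page.title.get_text() if page.title else name

            # Table of contents entry pointing at the new section.
            item = soup.new_tag("li")
            link = soup.new_tag("a", href="#" + slug)
            link.string = f"{category}: {title}"
            item.append(link)
            toc.append(item)

            body = page.body

            # Demote headers one level, deepest first so nothing moves twice.
            for level in range(5, 0, -1):
                for header in body.find_all(f"h{level}"):
                    header.name = f"h{level + 1}"

            # Prefix every anchor id so ids stay unique across files.
            for tag in body.find_all(id=True):
                tag["id"] = f"{slug}-{tag['id']}"

            # Point same-file and cross-file links at the prefixed anchors.
            for a in body.find_all("a", href=True):
                target, _, fragment = a["href"].partition("#")
                if not target and fragment:
                    a["href"] = f"#{slug}-{fragment}"
                elif target in known:
                    other = slug_for(target)
                    a["href"] = f"#{other}-{fragment}" if fragment else f"#{other}"

            # Inline images as data URIs so the result is a single file.
            for img in body.find_all("img", src=True):
                data = images.get(img["src"])
                if data is not None:
                    mime = mimetypes.guess_type(img["src"])[0] or "application/octet-stream"
                    encoded = base64.b64encode(data).decode("ascii")
                    img["src"] = f"data:{mime};base64,{encoded}"

            # Wrap the page body in a section and append it to the scaffold.
            section = soup.new_tag("section", id=slug)
            for child in list(body.contents):
                section.append(child.extract())
            soup.body.append(section)

    return soup.prettify()
```

Writing the returned string to a file gives the flattened manual. Links to anything outside the known file list are left alone in this sketch.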
