2020-03-31

I am humbled

Josh Heyer AKA Shog9 is a towering figure of reasonableness and a bastion of deliberate good will standing against a sea of easy emotional back-biting.

I am in awe. 

And all too aware that I have a long way to go before I can be that kind of person.

That is all.

Virial disruptions

As I noted before, my household went into a moderate social lock-down a few days before the Governor made the call for the whole state. And we've tightened up our practices a couple of times since then.

We're not completely locked down, first because the government order allows for moving around in outdoor public space (yay, jogging!) as long as you keep your distance, and second because one member of the household gets in-home healthcare.1

We've started getting groceries by delivery and less perishable stuff from Amazon.2

Internet


We get our internet from our cable TV provider,3 which means that we're on a shared segment. Of course, these providers always advertise the maximum bandwidth you could possibly get, and you always actually get less because you are using a shared resource. That's easy enough to understand, and you can buy with that in mind.

However, with everyone home (and presumably doing a lot more gaming and streaming than usual), our segment appears to be close to saturated, and the realized bandwidth is much lower than I'm used to (often with higher latency as well). At a time when I'm working from home.

What fun.

Time to call them and upgrade our plan, but it's not clear how much good that will do ... can they actually give me more just now?

It's not exactly cabin fever, but ...


I'm an introvert and not particularly social, so staying at home for days at a time is perfectly normal for me. But I'm not completely asocial, so staying at home for weeks at a time is not. Oftentimes, taking my books and papers or computer to some coffee shop and sitting quietly working by myself in a setting with some action and conversational buzz would be enough to get my fix, but I need something beyond work even in normal times.4

My wife is a bit less introverted and a bit more social than I am. She used to get out more, so it's hitting her harder. She frets. The more so because she's the one who manages the logistics of the household, which has settled down to just enough work every day to keep her thinking about it but not enough to keep her busy.

We've gotten to the point where going to the pharmacy drive-through is an exciting family outing. Our toddler really likes the local drive-through car wash.




1 And, yeah. That's a nice balancing act.

2 Tip these people. Seriously. Folks working these jobs generally need the money in the first place, and they are doing a huge service for the public good by letting many of us huddle down and minimize contact, while running more risk than the rest of us in the process.

3 The only other option is a heavily over-subscribed municipal-scale WiFi network operated by a commercial outfit. They must have good marketing because a lot of communities around here contracted with them. Alas, they seem either unwilling or unable to provide enough bandwidth to meet the promises they've made.

4 Being a programmer in a small shop is not a highly social activity, but you do get out from behind the keyboard to talk to your colleagues several times a day if you are doing it right. Teams need to coordinate, managers need to be kept up to date, and communication with the customer has to be kept up.

2020-03-26

Now I have two problems ...

… or why is parsing so @$#^ hard?

So, your design calls for you to take input in a non-trivial text format. What next?
There are lots of ways to proceed: you can hand tool a finite-state-machine or even a recursive descent parser, you can break out some compiler construction toolkit, or you could–just maybe–try to do it with regular expressions.
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Jamie Zawinski
Hand tooling a FSM works great for little problems, and big problems call for either a compiler-compiler or that hand-tooled recursive descent processor.1

But what should I do for medium-sized problems in C++?

Doing it by hand generates a lot of fiddly code to write, test, debug and maintain. The basic problem is that the hand-written code includes a lexing step which is at the wrong level of abstraction: you’re dealing with characters when you should be dealing with tokens.

The purpose of compiler construction tool kits is exactly to let you think at the higher level of abstraction. But they mostly require a separate processing step in your build2 and they all represent another build dependency. And some are hard to get or install on multiple target platforms (I have to support Linux and Windows at work, including both new and older versions of both).

At this point it is very tempting to use regular expressions to implement your lexer or even the whole lexing-parsing pass. Which is generally a bad idea.3

Remember that I’m assuming here you have a non-trivial input syntax. What that means on a practical level is that you have decision making to do as you parse and that means either a lot of little regular expressions wrapped up in code to do the decision making or a smaller number of very complex regular expressions with lots of alternatives and optional groups. Either way it’s going to be hard to understand, hard to debug, and hard to maintain.
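To make that concrete, here is a small sketch (my own illustration, not code from any particular project) of the “lots of little regular expressions wrapped up in decision-making code” style, using std::regex; the token classes and the name lex are purely for demonstration:

```cpp
#include <regex>
#include <string>
#include <vector>

// Scan the input by trying one small regex per token class at the current
// position; plain code does the decision making between them.
std::vector<std::string> lex(const std::string &input)
{
    static const std::vector<std::regex> patterns = {
        std::regex(R"(^\d+)"),           // number
        std::regex(R"(^[A-Za-z_]\w*)"),  // identifier
        std::regex(R"(^[+\-*/])")        // operator
    };

    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size())
    {
        if (input[pos] == ' ') { ++pos; continue; }  // skip blanks
        const std::string rest = input.substr(pos);
        std::smatch m;
        bool matched = false;
        for (const auto &re : patterns)
        {
            if (std::regex_search(rest, m, re))
            {
                tokens.push_back(m.str());
                pos += m.length();
                matched = true;
                break;
            }
        }
        if (!matched) ++pos;  // error handling would go here, and grow
    }
    return tokens;
}
```

Even this toy holds together only because the grammar is trivial; add string literals, comments, and escapes, and the decision-making code quickly swamps the patterns.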

How we got here, maybe?


Back in the day (and I really mean the misty depths of history, even before my time), computers were slow and expensive, and programmer time was relatively cheap. And lexically analyzing a file was notorious for being “slow”. In those days a lot of effort went into writing fast lexers. And the same structure was then used for parsers. Why not?

But these systems are complex, and they require a separate step in the build chain.

Even today you wouldn’t willingly use a slow version of a component that took up a lot of your run time if a fast version was available. So serious compiler construction kits all advertise their speed.

Why I don’t care


In this case I’m talking about processing mini-languages as a small part of a larger work flow, not traditional compiling where the processing of the input (through several steps) is the work. This kind of parsing is not generally a big part of my run-time. I don’t actually care if it’s a bit slow.

What I’d like


I’d like a class that accepts a grammar and an input and produces a stream of tokens. The grammar could be a run-time input or, in the template case, a compile-time input. But if it is compiled, it should not require a separate processing step (i.e. it should be implemented with template metaprogramming).
The grammar should include optional function-objects (lambdas, std::function, whatever …) to be run just before emitting the related token (and as an implementation detail, I’d pass a state object for those functors to act on).
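For concreteness, here is one guess (entirely my own, not a settled design) at what such a rule might carry: a pattern, the token to emit, and the optional pre-emission action acting on the user’s state object:

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical shape for one grammar rule; the names basic_rule, pattern,
// token, and action are illustrative only.
template <typename STATE>
struct basic_rule
{
    std::string pattern;                  // regex or literal to match
    std::uint16_t token;                  // token id to emit on a match
    std::function<void(STATE &)> action;  // optional; run just before emitting
};
```

A vector of these would be the grammar handed to the lexer’s constructor, with empty actions simply skipped.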

What I have in mind is an interface something like
struct rule;

template <typename STATE,
          typename STRING = std::string,
          typename TOKEN = std::uint16_t>
class lexer
{
public:
   explicit lexer(const std::vector<rule> &grammar);
   ~lexer() = default;

   TOKEN scan(typename STRING::const_iterator &it, STATE &state) const;
};
where the user-provided state-class collects all the information other than the token stream itself.

And I’d like a similar class for parsing.

All in the standard library or available as a header-only implementation, so that adding a lexing/parsing step to your project is easy if you can accept a speed penalty.



1 I notice several compiler-compilers producing recursive descent code these days. That’s a change from when I first studied this problem; back then everything seemed to be table driven.

2 Boost.Spirit is an exception here, but it imposes a significant compile-time penalty, and the need to build the grammar out of C++ operators and live with their precedence structure results in a weird DSL.

3 Don’t get me wrong. I’m with Jeff on this: regular expressions are a must-have tool, but you have to keep them in their place.
Written with StackEdit.

2020-03-20

Search and reinvention

The other day I reinvented the wheel.

I did search for existing solutions first, but all the descriptions of the problem I tried had too many common words and generated pages of search hits to other things described with the same words. Even when I restricted the search to stackoverflow.com.

It took me almost fifteen minutes to get bored with that, and a bit over five minutes to find a working solution for myself.

I felt rather proud of my solution, as it is quite neat. Almost elegant, really. Which made me absolutely sure it was well known,1 so I searched again. I was able to quickly find many existing examples by including ceil (the mathematical and programming name for the function returning the smallest integer not smaller than the argument, which is used in my solution) among the search terms.

But until I had the solution there was no way to know how to search successfully for it.2

Search is so much better but it still has weaknesses

I’m sure many (most?) of you have a story like this of your own. Maybe more than one.

I’m old enough to recall the internet before the web and the web before there was real search. I mean, I remember when Yahoo! was a big advance in finding things.3 When Google came along it was a breath of fresh air.

But they haven’t solved the context problem. There is no way to say to Google “search algorithms and programming advice for these terms”, so I got buried in irrelevant junk. But even that might not have helped, because I was getting swamped with irrelevant Stack Overflow links as well as irrelevant links from other domains.

The next big thing?


As I recall, the deficiencies of search were one of the reasons Jeff & Joel had for starting the project that became Stack Overflow.4 It is evident, both from the duplicate problems on Stack Exchange sites (including the trouble experienced users have finding duplicates that they know are there) and from this kind of anecdote, that the search issue is only partly solved.

How is the AI coming along in the natural language processing domain? Any chance of context sensitive search anytime soon?

That wheel I reinvented?


Put in the form I encountered it
How many rows and columns should be used to automatically place $N$ plots on a page?
The requirements for solutions being
  • The plots will be arranged in a grid
  • Each plot will be the same size
  • All the plots should be displayed
  • The number of empty cells should be no bigger than necessary
  • The aspect ratio of the plots should be reasonable
As my display mechanism puts the plots on a page in landscape orientation, and a typical page has a good aspect ratio for plots, the aspect-ratio requirement comes down to the grid needing roughly the same number of divisions horizontally and vertically; and when they differ, it should be more across than down.

Of course for $N$ a perfect square you’d just want the grid to be $\sqrt{N}$ on a side, and a little thinking about how sqrt() works with numbers that aren’t perfect squares and the need to never be too small can lead you to the algorithm. In pseudo-code (all two lines of it):
ncols := ceil(sqrt(N)) 
nrows := ceil(N/ncols)
The first use of ceil here ensures that if we have an uneven grid, the number of columns will be larger than the number of rows, and if the grid comes out even, it will be large enough. The second use of ceil ensures the size is large enough if the grid is uneven.
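The two-line pseudo-code translates directly into C++; this is a minimal sketch, and plot_grid is my name for it, not anything from an existing library:

```cpp
#include <cmath>
#include <utility>

// Compute a (columns, rows) grid that holds n same-sized plots,
// wider than tall when the grid is uneven.
std::pair<int, int> plot_grid(int n)
{
    const int ncols = static_cast<int>(std::ceil(std::sqrt(n)));
    const int nrows = static_cast<int>(
        std::ceil(static_cast<double>(n) / ncols));
    return {ncols, nrows};
}
```

For $N = 7$ this yields 3 columns and 3 rows; for $N = 10$ it yields 4 columns and 3 rows.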

The results are pretty good, but a human might choose differently. For instance, the code generates a 3x3 grid for $N$ equal to 7 or 8, where a human might try out 2x4 to get fewer empty spaces and see how it looks. Something similar happens for 13-15.




1 Very little that is as short and sweet as this little trick is new. The more so when it can be expressed in a domain with a lot of traffic.

2 Many times on Stack Exchange sites I’ve seen users describe a well-known idea or problem in a lot of words because they didn’t know the name or phrase usually used for it. Finding good existing answers and other good resources on the internet would generally be easy if they knew the nomenclature. In this case, though, there doesn’t seem to be a name batted about. Anyone know if the problem described in the last section has a conventional name?

3 Who else remembers WWWW? Don’t you wish you could forget?

4 In addition to the way existing models (forums, blogs, Wikipedia, Reddit, and so on…) were generally ill-suited to serving as a repository of good answers to good questions.


2020-03-15

Why I don't like telework

With the arrival in my patch of this year’s scary new disease,1 my manager has both drawn a line in the sand for when we go on mandatory telework and given us broad approval to make that call on our own.

Our mandatory conditions have not quite been met, but my household includes a person who easily qualifies for a compromised immune system and two more who are more vulnerable than average, so we’ve declared it time to button up.

Whee!

I’m a programmer these days, and a team leader at that, so I’m an ideal candidate for telework, but I spent the early part of the afternoon getting a useful workspace set up2 because I haven’t asked my employer for any work-from-home time until now. I’ve done it before, and I don’t actually care for it much.


Boundaries


There aren’t any natural boundaries when you work from home, which leads to two kinds of trouble.
  • To make it work you need the cooperation of anyone else present: they have to treat you as “at work” when you are working, or your productivity will be nibbled to death by ducks.
  • The flip side is, of course, that you have to be off work the rest of the time, and that’s not necessarily easy either. Especially when you get one of those flashes of insight that will take “just a few minutes” to get into code and test. Write a note, drop it on the keyboard, and come away.
I need a dedicated space with a door I can close. When I’m in there with the door closed I’m at work. When I’m not at work I don’t go in there. This worked pretty well the last time I did it, but that was when it was just me and my better half in the place. The toddler does not care for the idea that Daddy can be home and not available to her.


On the up side—maybe…


I’m planning on using my commute time to get a little less out-of-shape. At least as long as they don’t order a full time curfew.
And I’m going to grumble quite a lot if they do, because my trotting down the sidewalks (which are habitually empty in this little piece of suburban wasteland) is decidedly not an issue.



1 And it is more than a little scary and even more disruptive, but to my mind the run-in-circles-scream-and-shout responses on display tell you more about the level of either ignorance or willful denial that most people live with than they do about COVID-19. We’ve been lucky the last few times out of the gate, but you should have felt put on notice, people. It’s not like you weren’t given any hints.


But then maybe we should excuse the person on the street. After all, we’ve seen more than a little of both in high places, too. Haven’t we?

2 And once I got that done, I ordered a couple of upgraded bits and pieces. My work computer is an unreasonably beefy (six fast i7 cores and 32 GB of RAM) and trim (less than 2 kg) laptop. Alas, while its keyboard and trackpad are functional, I don’t want to be stuck with them for long.


2020-03-07

Wrapper code #1

Extending, debugging, or even using legacy code comes with gotchas. I’ve been very lucky in my current job that I’ve been mostly doing new development for the past year and a half, but that doesn’t mean that the work is free of legacy code. We have preferred tools and a substantial internal library that dates back decades.

For GUIs we use Qt. This is not a rant about Qt (though I have several things to say about it): the framework does what it is supposed to do and is well supported. But it has a lengthy legacy of its own, and that shows through in places.

Low-level exposure


The place I want to talk about today is the test framework in general, and the output of the comparison macro QCOMPARE in particular. The message delivered to the user when a comparison test fails depends on the type. For some types (mostly built-in types and Qt’s library types) the message nicely shows you the values that were compared. For other types it just tells you that there was a failure and leaves you guessing.

Thankfully, there are facilities for enabling the helpful messages on new types. (There are also facilities for providing the macro with a comparison more sophisticated than just operator== if you feel the need.)

But there is a gotcha that is easy to overlook if you rely on helpful posts on the internet rather than Trolltech’s documentation. (Which I do at times, because to my mind the official documentation is consistently crystal clear once you understand the system but quite unclear when you first approach a new feature or system.)

The issue is that you are instructed to use toString to implement the pretty-printer and that appears to be an old and low-level facility. If you read the documentation you see
Note: The caller of toString() must delete the returned data using delete[].
In other words, somewhere in the bowels of the library, QTest is calling new[] to construct buffers to hold these strings and then returning the resulting pointer directly to the caller. In 2020, writing code that exposed uses of new[] to external users like that would be a bit questionable (or at least it would require thought and justification), but that’s legacy code for you.

BTW, there is more to the note than I reproduced here. I’ll talk about the rest later.


Naive approach


You could just live with this. Every time you needed to use toString you could write a little cluster of lines similar to
char * element = toString(value);
accumulatingString += '(';
accumulatingString += element; // note: '(' + element would be pointer arithmetic, not concatenation
delete[] element;
That’s functional and it is exactly what I’d do if I were only calling toString once or twice in my entire code base. But if you called the library routine many times it would clutter up your code something fierce, and it would be easy to forget it somewhere. A more systematic approach is called for.

A wrapper class


Let’s construct a wrapper that automatically takes care of cleaning up.


Version 0: plug the leak


The purpose of the class is to manage the memory for us. We construct a class that captures the pointer and calls delete[] in its d’tor. This is straightforward:
class StringCleaner // v0
{
public:
    explicit StringCleaner(char * newedbuffer) : m_ptr(newedbuffer) {}
    ~StringCleaner() { delete[] m_ptr; }
private:
    char * m_ptr;
};
Then every time you call toString you wrap it up with our class
StringCleaner(toString(value));
which creates a temporary object that deallocates the memory when the compiler destroys it at the end of the full expression. Nice.

Aside: I’ve tried so diligently to avoid using bare pointers in my code in recent years that manually cleaning up an allocation in a d’tor like that feels like a return to my wild and irresponsible youth. Whee!

Version 1: make it useful


Alas, in version 0 we can’t use the string, which just won’t do. We need to add some kind of access. You might consider a method like char * string() const; but the access syntax is icky:
StringCleaner(toString(value)).string()

So I prefer to use the * operator (reading the semantics as “look at”, similar to the way you use it for an iterator or a pointer). We also want the returned value to be safe, so we return an object. That makes our class
#include <QString>
class StringCleaner // v1
{
public:
    explicit StringCleaner(char * newedbuffer) : m_ptr(newedbuffer) {}
    ~StringCleaner() { delete[] m_ptr; }
    QString operator*() const { return QString(m_ptr); }
private:
    char * m_ptr;
};
and we use it with syntax like *StringCleaner(toString(value)).
For most purposes I prefer to work with and return standard library types over framework types, but the context for this is entirely within the framework, so using QString is natural (and as we’ll see it has a specific advantage here).

This is effectively what I did to solve the problem for work.

Version 2: fold in the call


Finally, as things stand, we’re planning to call toString every time we use this wrapper. If we are willing to write templated code (and in this case you’re creating little mystery and adding quite modestly to the compile time, so go for it…), we can make that call part of the class.
#include <QString>
#include <QTest>

template <typename T> class SafeToString // v2
{
public:
    explicit SafeToString(const T &arg) : m_ptr(QTest::toString(arg)) {}
    ~SafeToString() { delete[] m_ptr; }
    SafeToString(const SafeToString &) = delete;            // copying would double-delete
    SafeToString & operator=(const SafeToString &) = delete;
    QString operator*() const { return QString(m_ptr); }
private:
    char * m_ptr;
};
which is simply used in place of toString.
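The same RAII pattern can be exercised without Qt at all. Here is a self-contained, Qt-free analogue of version 2 (the names SafeString and legacy_to_string are mine, purely for illustration of the technique, with std::string standing in for QString):

```cpp
#include <cstring>
#include <string>

// Stand-in for a legacy API that returns a new[]'ed buffer,
// in the spirit of QTest::toString.
char * legacy_to_string(int value)
{
    const std::string s = std::to_string(value);
    char * buf = new char[s.size() + 1];
    std::strcpy(buf, s.c_str());
    return buf;
}

// RAII wrapper: owns the buffer, frees it in the d'tor,
// and hands out a safe copy via operator*.
class SafeString
{
public:
    explicit SafeString(int value) : m_ptr(legacy_to_string(value)) {}
    ~SafeString() { delete[] m_ptr; }
    SafeString(const SafeString &) = delete;            // copying would double-delete
    SafeString & operator=(const SafeString &) = delete;
    std::string operator*() const { return std::string(m_ptr); }
private:
    char * m_ptr;
};
```

Deleting the copy operations matters in either version: copying the wrapper would leave two objects trying to delete[] the same buffer.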

Conforming to the API


The part of the note I elided above reads
Your implementation should return a string created with new[] or qstrdup(). The easiest way to do so is to create a QByteArray or QString and calling QTest::toString() on it (see second example below).
In other words, if you are writing your own version of toString (which you are if you started out wanting pretty-printed failure messages when using QCOMPARE with your own types), you need to duplicate the “mistake” of the API, or risk some internal code calling delete[] on a buffer that belongs to an object of some kind, with unpredictable (but generally bad) consequences.

Let us assume we’re trying to write a pretty-printer for a 2D lattice offset (so, an integer “vector”) called IVector. (I’m doing integers here to avoid having to discuss whether you want, and how you would get, approximate floating-point comparison for a compound type with QCOMPARE.) We’ll assume that IVector implements methods X and Y to access the components and has a working operator== which QCOMPARE can use. Finally, we want it to pretty-print as (x, y).

Following the advice in the note we write
char * IVector::toString()
{
    // CTAD (C++17) deduces SafeToString<int> from the int accessors
    const QString out = "(" + *SafeToString(X()) + ", " + *SafeToString(Y()) + ")";
    return QTest::toString(out); // deliberately hand back the new[]'ed buffer, as the API requires
}
We’ve cleaned up the allocation for each integer and returned a single allocation for our string, as required.