When I started with my current employer they asked me to do a "story-board version" of a tool they were pitching to a customer, by way of a warm-up project. When the customer picked it up they kept me on as the lead designer and coder.1 This was the first time I'd ever been put in charge of the basic decision making for a shared effort, or of something intended as "a product", so, unsurprisingly, I made some mistakes. There are facts about the codebase that I am not proud of.2 In a couple of cases I cringe inside when I have to explain them to newcomers.
Interchange format
We run a significant amount of computation in separate OS processes.3 Honestly, to an old Unix hand like me that feels completely natural, and it's not much harder than threads except for one thing: you can't just hand the worker a binary config object; you have to invoke some kind of interprocess communication. I chose to pass a serialized config object.
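For flavor, here's a minimal sketch of that arrangement, assuming a POSIX fork/exec model; the Config struct, serialize() helper, and "worker" binary are hypothetical stand-ins, not names from our codebase:

```cpp
// A minimal sketch, assuming POSIX fork/exec; all names are hypothetical.
#include <string>
#include <sys/wait.h>
#include <unistd.h>

struct Config { double tolerance = 1e-6; };   // stand-in for the real thing
std::string serialize(const Config& c) {      // stand-in for the real format
    return "tolerance=" + std::to_string(c.tolerance) + "\n";
}

void run_worker(const Config& cfg) {
    int fds[2];
    if (pipe(fds) != 0) return;         // fds[0]: read end, fds[1]: write end
    pid_t pid = fork();
    if (pid == 0) {                     // child: become the worker
        dup2(fds[0], STDIN_FILENO);     // worker reads its config from stdin
        close(fds[0]);
        close(fds[1]);
        execlp("worker", "worker", static_cast<char*>(nullptr));
        _exit(127);                     // exec failed
    }
    close(fds[0]);                      // parent: send the config, then wait
    const std::string bytes = serialize(cfg);
    ssize_t n = write(fds[1], bytes.data(), bytes.size());
    (void)n;                            // error handling elided in this sketch
    close(fds[1]);                      // EOF marks the config as complete
    waitpid(pid, nullptr, 0);
}
```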
Now, a wise developer would have picked an existing serialization library. Obviously. But I didn't understand any of the existing options and saw only obstacles, so I rolled my own.4
After all, I had very simple needs, so I had it done in an afternoon. Tests included. Then time passed, requirements changed, and the format grew more elaborate. The code got longer and spawned more templates and more specializations. Writing the tests got harder. We began using it for a save-current-configuration format. I learned how to use the type_traits header badly. Then better. That helped with the tests, but only a little. Still more requirements loomed, and the code became a no-go area for everyone but myself and my best junior dev (now really a mid-career dev).
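To give a taste of where type_traits earns its keep in code like this, here's a hypothetical fragment (not our actual serializer) that uses enable_if dispatch so one template covers every arithmetic type and a second covers anything iterable:

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical illustration: dispatch on type properties so one template
// covers every numeric type instead of one specialization apiece.
template <typename T,
          std::enable_if_t<std::is_arithmetic<T>::value, int> = 0>
std::string to_wire(T value) {
    return std::to_string(value);          // ints, doubles, ... all land here
}

template <typename T,
          std::enable_if_t<!std::is_arithmetic<T>::value, int> = 0>
std::string to_wire(const T& sequence) {   // anything iterable, e.g. std::vector
    std::string out = "[";
    const char* sep = "";
    for (const auto& element : sequence) {
        out += sep;
        out += to_wire(element);           // recurse: handles nested containers
        sep = ",";
    }
    return out + "]";
}
```

The appeal is that new numeric and container types serialize for free; the price is that every new requirement lands in the same thicket of overloads.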
For several years I've been telling people that if I had it to do over again we'd be using JSON.
Little did I know.
Careful what you wish for
A couple of months ago we found out that a subset of the tool is being used as a component in a multi-step chain by a different division at the customer's organization. Last month we got funding to support that effort. This month they asked us to replace the custom interchange format with JSON.
We picked a library and went to town. Progress is being made, but it's not quite all sunshine and butterflies.
Why JSON is a really bad choice for this application
The thing to understand up-front is that this is an application in physical modeling. We compute an approximation to the behavior of the real world in a highly specialized domain, and we do it fast and with a friendly front end. We don't need "just so" fidelity, but it does need to be reality-based.
That means numbers. Almost always in floating point representations.
Nearly all modern platforms support IEEE-754 floating point numbers, which I have mentioned before. A feature of that standard is the ability to represent several special cases without demanding extra space. The available special cases go by three names: "infinity", "negative infinity", and "not-a-number" (AKA NaN). The infinities are generated when you do things like divide a finite nonzero value by zero or take the logarithm of zero (which you'd expect from real math), but also when you do things "close" to the real ones, like dividing a big enough number by a small enough one. NaNs come out of weird operations like dividing zero by zero, or taking the arcsine of two.
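If you want to see them conjured up, this little program manufactures all three specials in exactly the ways just described and classifies them with the standard predicates:

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    const double zero = 0.0;             // a variable defeats constant folding
    const double inf  = 1.0 / zero;      // finite nonzero / zero -> +infinity
    const double ninf = std::log(zero);  // log(0) -> -infinity
    const double big  = std::numeric_limits<double>::max() / 0.5;
                                         // big / small overflows -> +infinity
    const double nan1 = zero / zero;     // 0/0 -> NaN
    const double nan2 = std::asin(2.0);  // arcsine outside [-1,1] -> NaN

    std::printf("%g %g %g %g %g\n", inf, ninf, big, nan1, nan2);
    std::printf("isinf(inf)=%d isnan(nan1)=%d\n",
                std::isinf(inf), std::isnan(nan1));
}
```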
Programmers have limited control over the users and what kind of input they generate so these kind of values pop up from time to time, and we have to decide how to deal with them. It's one of the annoyances of the job. Sometimes you want to do one thing with infinities something else with NaNs.
My custom serialization format handles those values gracefully, but JSON does not:5 it serializes all of them as null. Not only is this ambiguous, but readers are allowed to simply fail when they encounter null where they were expecting a number. In particular, the library we'd started using throws an exception in this case. Really. This may or may not be reasonable in some domains, but it is clearly an error in scientific computing.
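To make that concrete, here's the behavior as exhibited by nlohmann/json; I'm using it only as an illustration, not naming the library we actually picked:

```cpp
#include <nlohmann/json.hpp>
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    nlohmann::json j;
    j["gain"]  = std::nan("");                              // NaN
    j["limit"] = std::numeric_limits<double>::infinity();   // +infinity

    // Both specials serialize as null: the distinction is gone on the wire.
    std::cout << j.dump() << "\n";         // {"gain":null,"limit":null}

    // And reading a null back where a number is expected throws.
    try {
        double g = j["gain"].get<double>();
        (void)g;
    } catch (const nlohmann::json::type_error& e) {
        std::cout << e.what() << "\n";     // "... type must be number, but is null"
    }
}
```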
Now what?
The thing is, it's not going to get fixed. You'd have to add new tokens to the grammar, which would break many (probably billions of) deployed instances, and that when one of the selling points of the format has been stability. Total nonstarter.
I'd been planning on deprecating the old interchange format: stop generating it (maybe even remove the generation code) but keep the reading code around for a while. Because of the save files, naturally.
Now I'm not sure I want to switch. Maybe I just want to support JSON as an alternative. And maintain extra code for the foreseeable future. Sigh. Or I can live with not being able to pass well-defined specials from component to component. Bigger sigh.
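For completeness, one common workaround, sketched here as a possibility rather than anything we've adopted: keep finite values as numbers and spell the three specials as sentinel strings. The encode_double/decode_double names are made up for the example.

```cpp
#include <nlohmann/json.hpp>
#include <cmath>
#include <stdexcept>
#include <string>

// Sketch of one possible compromise, not a decision: finite doubles stay
// numbers, the three specials become sentinel strings.
nlohmann::json encode_double(double v) {
    if (std::isnan(v)) return "NaN";
    if (std::isinf(v)) return v > 0 ? "Infinity" : "-Infinity";
    return v;                                   // finite values pass through
}

double decode_double(const nlohmann::json& j) {
    if (j.is_number()) return j.get<double>();
    const std::string s = j.get<std::string>();
    if (s == "NaN")       return std::nan("");
    if (s == "Infinity")  return HUGE_VAL;      // +infinity
    if (s == "-Infinity") return -HUGE_VAL;     // -infinity
    throw std::runtime_error("not an encoded double: " + s);
}
```

The catch, of course, is that every reader in the chain has to be taught the convention, which is more of exactly the extra code I was sighing about.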
1 Actually, at first (and from time to time since then) I was the only person on the project. But still.
2 And some I am proud of. And a couple I'm ambivalent about because they may be necessary, but they are complex and hard for newbs to wrap their heads around. Well, they were hard to figure out in the first place, too. At least I wrote design documents for the features, so I have an answer when they ask "What were you thinking?"
3 A legacy library we rely on for one of the main computationally bound features of the application is single-threaded and not re-entrant; to let our customers take advantage of their beefy, many-cored, analytic workstations we have to get the OS to isolate instances.
4 Seems to be a habit, doesn't it? Though, in my defense, the logger has been a success. It has ridden out expanding needs with minimal maintenance and without growing out of hand.
5 Interestingly, JSON is a strict subset of JavaScript, which does support the specials. To judge from the few things I've seen from the creator, it's likely they were sacrificed on the altar of simplicity. Perhaps a lamb too far, that.