2025-06-17

An orange ring for git submodules

At work we have a common library, call it libThing, that underlies several of the products we produce for one of our big customers. We bring it into the projects that use it with git submodules. And that's a bit of a problem.

Let me explain.

You see, the design of the submodules facility is clearly predicated on an understanding that the submodule is a separate thing, and is not edited in situ by a programmer working on the super-project. If the sub-project needs updating, it is assumed, you will send the maintainer a well-defined change request, wait for it to happen, and point your super-project at the updated version and go about your business.

Which is what you would expect if the sub-project belongs to someone else.

And some of our changes look like that: "Folks, we have a note from the customer. There's a new file format for specifying widgets. Someone needs to update the WidgetLoader in libThing to handle it. Joe, you've worked in that module recently, can you get to it this week?". Fine. Joe updates libThing, pushes to the reference repository, and the next release of each of our projects can manage the new Widget files. Nice. And exactly as Linus envisioned it.

On the other hand, a lot of the time we find out that libThing needs updates because we're in the course of making changes to one of the projects that use it. By working on both sides together we can work through the trade-offs dynamically. It's more natural, and probably faster, to just work on them together. Even though submodules don't encourage it.

Buuuut ... if you're not careful, you'll make one or more commits to the sub-project in a detached head state.1

The rest of this post is a recipe for getting safely back to a happy state after you make this mistake.

Advancing the branch you aren't on

If you had done this right, you'd have put the sub-project on the branch before you started editing, and the branch would have advanced as you made commits. We want to get the repository into the state it would have had.

  1. Determine the name of the branch you're supposed to be on by looking in the .gitmodules file of the super-project. Remember that for later.
  2. Give the current state of your work in the submodule a temporary name with git checkout -b life-preserver2
  3. Get on the right branch with a git checkout to the branch you found in step 1.
  4. Fast-forward with git merge life-preserver (and it should be a fast-forward merge; if not you should probably take stock at this point.)
  5. Check that nothing is missing by examining the commit tree. I like gitk --all.
  6. Assuming all is well, dispose of the evidence with git branch --delete life-preserver.
  7. Detach the head again (at least if you're done) with git checkout --detach HEAD.
  8. Pretend you're the kind of coder who never misses the little details.
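For the terminally concrete, here's the whole recipe acted out in a throwaway repository. The branch name main and the toy repo are stand-ins for your actual submodule checkout, and libThing in the comment is the example submodule name, not a universal one:

```shell
set -e
# Build a toy repo standing in for the submodule checkout.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m "initial"
git checkout -q --detach HEAD          # the state a submodule checkout is usually in
git commit -q --allow-empty -m "oops"  # work committed on a detached HEAD

# The recovery recipe itself. In real life, find the branch name (step 1)
# from the super-project with: git config -f .gitmodules submodule.libThing.branch
git checkout -b life-preserver      # step 2: give the stranded work a name
git checkout main                   # step 3: get on the right branch
git merge --ff-only life-preserver  # step 4: fast-forward (errors out if it can't)
git branch --delete life-preserver  # step 6: dispose of the evidence
git checkout --detach HEAD          # step 7: detach again
git log --oneline main              # both commits are now reachable from main
```

The --ff-only flag makes step 4's "should be a fast-forward" check mechanical: git refuses to create a merge commit rather than quietly doing something surprising.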

1 If you don't work with git, then this post probably isn't much use to you, but short-short version:

  • In git a "branch" is a name that refers to some remembered state of the project. A project can have a lot of branches, and you can start with one state and make different changes to the project and have both of them remembered as different branches. That is, they can split off like the, well, branches of a tree.* They can also join up, which is called a "merge".
  • You can tell git to remember a new state of the project. That's called making a "commit" and each commit knows about the one(s) it came from, so the software can navigate back in time.
  • A commit can be the target of zero, one, or more branches.
  • You can be "on a branch" meaning that (a) you are working on the state of the project remembered by that branch name and (b) git has a record of which branch you're "on".
  • When you're on a branch and you make a commit, git changes the association of branch name to the new commit. Remember that commits know what came before, so you can still go back, but the name now refers to the new state of the project.
  • If you're not on a branch you are in a "detached head state", which means git knows which version of the project you started with but doesn't know a branch. Any commits you make in this state are nameless because there is no branch name to move forward.
  • * In computerese it's actually a directed acyclic graph (DAG), but that's neither here nor there.

2 Yes, I have a name I use for this. Not that it happens often or anything.

2025-05-14

Getting to do grown-up things

We took the seven year-old to an art museum yesterday so that Mommy and Daddy could see a traveling exhibit focused on a single artist.

And not only was it not a disaster, after a while she disengaged from the iPad and took an active interest in the art. She has a favorite among the pieces we saw.1 She remarked on how interesting it was to see the studies the painter did toward another large canvas.

Victory!

Well, at least a conditional win. We didn't stay to see any of the permanent exhibits because she was clearly done at that point. But still.


1 It's one of my top ten, too!

2025-05-13

Maybe another example of my favorite weakness of LLMs?

We've flown out to stay with my in-laws for the best part of a week. Because reasons.

I brought my work computer in case I need to respond to some crises or just find some time to do useful work rather than burning PTO (which is always in short supply). Unfortunately, my work computer is currently an eight-pound, desktop-replacement beast. For which I really want to bring a spare display (and I have a fifteen-inch USB HD panel for that purpose). With all the weight and volume devoted to work computing I opted for the barest minimum of personal hardware: my Raspberry Pi 400, its mouse, and associated cabling to talk to that same panel.

Now, the 400 is a surprisingly good computer for what I paid for it, but it's also quite limited. In particular it has only 4GB of RAM, and while the CPU supports NEON SIMD there is no dedicated graphics processor. It's completely unsuited to running LLMs without add-on hardware. But I got bored, so I decided to try anyway.

I was looking for relatively modern models that will fit in RAM and found Llama3.2:1b (7 months old) and codegemma:2B (9 months old). One for conversation and one for code support. Nice.

I've been speculating about down-quantizing some slightly bigger models, but in the meantime I started playing around with the baby llama. I didn't want to challenge it with my usual questions for probing the weaknesses of larger and more powerful models, so I started by just asking it to tell me about the band Rush.

As models do, it then produced a bunch of plausible-sounding text. Some of it was incomplete and some was simply wrong. For some reason it thinks Rush won more major awards than they did. It seems to have selected exclusively praising text to reproduce (which is fine by me: I like the band and don't find most complaints directed at them to be at all convincing), but there is a non-trivial amount of criticism and none of it is represented or even acknowledged in the answer.

Now, let's be frank: there is a limit to how much factual detail you can expect to be encoded in what is, after all, a somewhat niche cultural subject when you have only two billion parameters available to try to represent a full cross section of English language knowledge. No general purpose model of this size is going to get everything right on a query of that kind. So, I'm not saying that the fact of errors is surprising or even interesting. But I am wondering what we can learn from the nature of the errors.

So let's look more closely at some of the errors. In its "Discography" section the model lists

  • Start with School Days (1974)
  • Caress of Steel (1975)
  • Fly by Night (1975)
  • 2112 (1976)
  • Two-Side Streets (1978)
  • Permanent Waves (1980)
  • Moving Pictures (1981)
  • Hold Your Fire (1987)
  • Presto (1997)
  • Test for Echo (2004)
  • Snakes & Arrows (2007)
  • Clockwork Angels (2012)

which has both omissions and errors.

I haven't the faintest clue what to make of the title mistake for the first album (which was 1974 but was self-titled). It's not the title of any Rush song or album that I'm aware of and a casual search of the web doesn't turn up any song or album by that name at all. The search does turn up a reasonable number of hits on Stanley Clarke's 1976 album School Days and a number of suggestions that people just beginning to explore Mr. Clarke's music should "Start with" that very same album. Interesting, but not terribly enlightening.

Nor do I have any theories concerning which albums are omitted. It doesn't look to me like the list is either the commercial hits or the fan favorites, and beyond that I don't know what patterns to look for.

But I do want to talk about the 1978 entry in the model's list. The proper album title for that year is Hemispheres. The thing that strikes me here is that the proper title and the substitute text share a conceptual relationship to the number two (two halves; two ways). My (admittedly wild) guess is that we're seeing a side effect of the model's attention system identifying "two" as an important concept to use when trying to infer tokens.

If true that would be interesting, because the attention system is one of the significant ways in which LLMs differ from Markov Chain generators. But it may also be responsible for the model's difficulty in knowing what is a quotation and what is commentary, which I've already discussed in the context of scientific papers.

2025-04-27

Why did all the numbers move to the back of my credit card?

One of my credit cards has a feature that baffles me: all the numbers are on the same side1 of the card.

Now, credit cards present an interesting trade-off problem between security and convenience. Bruce Schneier spends some time on the matter in one of his books. Maybe Liars and Outliers. A certain level of loss to fraud is accepted to ensure that the system is convenient and ubiquitous; and the history of the technology is an epic tale of continual re-tuning of the risks.

In the very early days the system was surprisingly simple, relying on people and paper. Really. By the time I came on the scene, the cards had raised numbers on the front to impress multiple copies of the transaction record on carbon paper. The "chunk, chunk" of the cashier making the impression was the soundtrack of eighties retail. Later we got smart cards and now touchless payment.2

Hey, Ma! Look at me! I'm living in the future!

And somewhere along there (late '90s) the "card security code" was added to the back. It didn't show up on the mechanical impressions (still in use then in the US, if not in really advanced parts of the world) or in a single-sided xerox of the card, so it made it slightly harder for bad actors to capture enough information to create fraudulent charges. Not really hard, mind you. Just hard enough. That's one of the surprising things about this story.

But now I have this card that has the CSC printed right next to the main card number and expiration date.3 Huh? Is that Okay because merchants are now using billing zip code as an additional (if very weak) authenticator? Or is it something else?


1 The side with the magnetic strip and opposite the conductive pads which I would describe as the back, but it's the fact that they're all on the same side that bothers me.

2 The touchless systems mark one of the first moments I started to feel like technology was leaving me behind. Another unanticipated milestone.

3 It also has the main numbers printed flush on the surface of the card: they're not raised.

2025-04-16

This hard programming problem keeps butting its head into my workflow!

The design (sometimes architecture) of a non-trivial program's source code and build is replete with problems. Multiple books are written on the matter every year. The code goes into many different files which are stashed in multiple folders. If you're paying attention, the folders represent elements of a modular build. Mostly.

Architecture-level problems come in lots of kinds, but the ones I want to focus on today are excessive interconnectedness of logical units, and excessive interconnectedness of build-time units. The former makes the code hard to reason about because when I'm looking at a bit of code here I may not know if some bit of code in a different folder is going to reach in and change something when I least expect it. The latter drives up build times. Both can make changes that initially looked confined to a single file cause adjustments in many other files.

So now we have two things to keep in mind:

  1. The organization of files in the file-system should parallel the build organization.
  2. Files should know as little as possible about other code units and especially about units in other build modules.

So, when one bit of code needs to know a lot about another bit of code it goes nearby. Maybe in the same file, and almost certainly in the same build module. Fine.

Now consider the "extract function" feature of your programming environment.1 It's a shining example of good tooling as it encourages and helps with refactoring. But it usually crashes me right out of the zone. Because if I'm creating a new "piece" of code, I have to find it a home. And that can be hard.

Some cases are easy.

  1. If it's coming out of an object's instance function (AKA method), and it needs to access instance state, then it must live in the object.
    1. If it does not respect the class invariants then it must be private. Except that if it is virtual then you may want it to be protected2
    2. Otherwise it can be public. But start with it private until you know of a use case: every API you expose is one you have to maintain.
  2. If it is reaching into another class's object, then consider moving it to that class.

But after that it starts getting tricky. I mean, it can be a free function or a class static, but where should it live?!? If it is highly specific, perhaps keep it in the file it's in (and perhaps even private), just with a name now. If it represents a specific behavior within the domain modeled by this module, make it accessible throughout the module (new files, class static, who knows?). If it's very general purpose, first look to see if you missed it in the standard library or any frameworks you're using, and if not put it in your project-wide utility code. Maybe?

In any case, having to make these decisions often knocks me out of my flow. And since it doesn't have a name that I know of I haven't been able to google advice.


1 Obviously you don't actually need a tool for this. I got on fine with cut-n-paste for decades. Still do when certain IDEs (no names, Qt Creator you .... you ... wonderful program) just refuse to have it enabled for some mysterious reason. But I will sure as heck use the tool if present.

2 I'm largely using the C++ nomenclature here because that's the sea I swim in daily, but I think the considerations apply more broadly.

JSON Update

A followup to a recent rant.

First an admission. At least one thing I complained about was not a feature of JSON per se but of the library (nlohmann::json AKA JSON for Modern C++) that we're using. In particular the behavior of serializing a floating-point IEEE-754 special to null, and then throwing when trying to deserialize a null into a floating-point variable, is library specific. And they stand by it. Grrrr!

Second, by defining a strong-typing wrapper I was able to (de)serialize those values from-and-to strings. I even provided multiple acceptable spellings on the deserialize path. Then by writing explicit to_json and from_json routines for my objects (rather than relying on the handy-dandy macros in the library) I was able to apply the strong types only at the point of (de)serialization, reducing what initially looked like a major intrusion into the code. Yay.

It's not a complete win, however, because I have a std::variant in the code-base. The usual advice for deserializing a variant with the library is to detect the json-type of the value1 and use that to know which member type to set. Only now a number can give rise to a string value. So I had to explicitly (de)serialize the current-type, too. Bletch!

Long story short. We're going ahead with this and I may replace my custom interchange format after all. Just because other users will stand a better chance of decoding the JSON.

But the lack of infinity and not-a-number is still a bug and still renders the format poorly suited for use in numeric computing.


1 JSON for Modern C++ uses a domain-model object as an intermediary, so this is relatively easy.