2025-05-14

Getting to do grown-up things

We took the seven-year-old to an art museum yesterday so that Mommy and Daddy could see a traveling exhibit focused on a single artist.

And not only was it not a disaster, but after a while she disengaged from the iPad and took an active interest in the art. She has a favorite among the pieces we saw.1 She remarked on how interesting it was to see the studies the painter did toward another large canvas.

Victory!

Well, at least a conditional win. We didn't stay to see any of the permanent exhibits because she was clearly done at that point. But still.


1 It's one of my top ten, too!

2025-05-13

Maybe another example of my favorite weakness of LLMs?

We've flown out to stay with my in-laws for the best part of a week. Because reasons.

I brought my work computer in case I need to respond to some crises or just find some time to do useful work rather than burning PTO (which is always in short supply). Unfortunately, my work computer is currently an eight-pound, desktop-replacement beast, for which I really want to bring a spare display (and I have a fifteen-inch USB HD panel for that purpose). With all that weight and volume devoted to work computing, I opted for the barest minimum of personal hardware: my Raspberry Pi 400, its mouse, and the cabling needed to talk to that same panel.

Now, the 400 is a surprisingly good computer for what I paid for it, but it's also quite limited. In particular it has only 4GB of RAM, and while the CPU supports NEON SIMD there is no dedicated graphics processor. It's completely unsuited to running LLMs without add-on hardware. But I got bored, so I decided to try anyway.

I was looking for relatively modern models that would fit in RAM and found Llama3.2:1b (7 months old) and codegemma:2B (9 months old). One for conversation and one for code support. Nice.

I've been speculating about down-quantizing some slightly bigger models, but in the meantime I started playing around with the baby llama. I didn't want to challenge it with my usual questions for probing the weaknesses of larger and more powerful models, so I started by just asking it to tell me about the band Rush.
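
For the curious, the whole experiment amounts to almost no code. Something like the following Python sketch gets you there, assuming ollama as the runtime (the model tags above are its naming scheme) and taking the exact prompt wording as illustrative rather than a transcript of what I typed:

  # Minimal sketch: ask the baby llama about Rush via the ollama Python
  # bindings. Assumes the ollama server is running and the model has
  # already been pulled (e.g. "ollama pull llama3.2:1b").
  import ollama

  reply = ollama.chat(
      model="llama3.2:1b",
      messages=[{"role": "user", "content": "Tell me about the band Rush."}],
  )
  print(reply["message"]["content"])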

As models do, it then produced a bunch of plausible-sounding text. Some of it was incomplete and some was simply wrong. For some reason it thinks Rush won more major awards than they did. It seems to have selected exclusively praising text to reproduce (which is fine by me: I like the band and don't find most complaints directed at them to be at all convincing), but there is a non-trivial amount of criticism out there, and none of it is represented or even acknowledged in the answer.

Now, let's be frank: there is a limit to how much factual detail you can expect to be encoded about what is, after all, a somewhat niche cultural subject when you have only a billion or two parameters available to represent a full cross section of English-language knowledge. No general-purpose model of this size is going to get everything right on a query of that kind. So I'm not saying that the fact of errors is surprising or even interesting. But I am wondering what we can learn from the nature of the errors.

So let's look more closely at some of the errors. In its "Discography" section the model lists

  • Start with School Days (1974)
  • Caress of Steel (1975)
  • Fly by Night (1975)
  • 2112 (1976)
  • Two-Side Streets (1978)
  • Permanent Waves (1980)
  • Moving Pictures (1981)
  • Hold Your Fire (1987)
  • Presto (1997)
  • Test for Echo (2004)
  • Snakes & Arrows (2007)
  • Clockwork Angels (2012)

which has both omissions and errors.

I haven't the faintest clue what to make of the title mistake for the first album (which was released in 1974 but was self-titled). It's not the title of any Rush song or album that I'm aware of, and a casual search of the web doesn't turn up any song or album by that name at all. The search does turn up a reasonable number of hits on Stanley Clarke's 1976 album School Days and a number of suggestions that people just beginning to explore Mr. Clarke's music should "Start with" that very same album. Interesting, but not terribly enlightening.

Nor do I have any theories concerning which albums are omitted. It doesn't look to me like the list is either the commercial hits or the fan favorites, and beyond that I don't know what patterns to look for.

But I do want to talk about the 1978 entry in the model's list. The proper album title for that year is Hemispheres. The thing that strikes me here is that the proper title and the substitute text share a conceptual relationship to the number two (two halves; two ways). My (admittedly wild) guess is that we're seeing a side effect of the model's attention system identifying "two" as an important concept to use when trying to infer tokens.

If true, that would be interesting, because the attention system is one of the significant ways in which LLMs differ from Markov chain generators. But it may also be responsible for the model's difficulty in knowing what is a quotation and what is commentary, which I've already discussed in the context of scientific papers.