2026-02-28

Progress in the artificial "comprehension" of humor

A minor side-line in my on-going investigations of how well or poorly LLMs perform has been teasing them with jokes. Back when I started they were consistently abysmal at explaining why jokes (even very simple ones) were funny, though they could fairly consistently categorize them into wordplay, dark humor, and similar bins. Some models would confidently assure me that I wrote the jokes wrong or that they didn't make sense.1

When the models started to get a little better (able to do a passable job on the easy ones) I added a couple of relatively subtle jokes to my list and those promptly stonkered the largest models I had access to.

Until today.

Aside

My 64 GB RAM Framework laptop (which I had been using to run mid-sized models locally) was stolen last fall and I haven't replaced it. I was unwilling to send my questions to the AI companies lest they train to the test, so I didn't have access to large or mid-sized models to try for a while. Then ollama started offering cloud services. I think it is much less likely that queries made through that channel are making their way back to the model vendors (ollama says they don't, and I suspect the massive social cost of getting caught would deter them even if they were inclined to cheat), so I started trying the big models on their servers from time to time.

Meanwhile, back on the ranch

The big open-weights GPT (gpt-oss:120b) is a really impressive model—I've been using it for a lot of chat tasks I might previously have lobbed at ChatGPT—but it still failed at both my "trick" joke questions. In fact it gave almost the same wrong answers as phi4, gemma3 and so on. Maybe the result of an effectively common training set?

On the other hand, its answers to the technical questions I ask models were much more like those the big (300B+ parameter) leading-edge models were giving six months ago than the ones mid-size models (30-70B parameters) were giving while I still had the Framework. So I concluded that progress was being made on several fronts including coding2 and what I call "deep-search"3 but not necessarily on making connections for which its training set had few examples.

Today, after doing my regular Saturday chores, I updated my ollama and looked at the recent models. Hmmm ... I'd seen a youtube about this gfm model and they made a variety of brags that would be impressive if true. So I tried it. It aced one of my favorite, easyish-but-out-of-the-mainstream coding questions. Not screaming fast, but fast enough that running the model and looking through the output would be faster than my solving it by hand if I hadn't worked it out in advance to use as a key for the test.

I got ambitious and asked it about the jokes.

One of the prompts includes a couple of hints: it notes that the joke was current "in mid to late 2011" and asks for an explanation of the physics behind the humor. For a human not familiar with the episode that generated the joke it would probably require some web searches to answer, but I suspect most educated people would get there. Gfm-5 is the first model I've put this to that nailed it.

The other prompt is a little more blind. The model has to recognize the relationship between the scenario in the joke and a much more common, but generally not humorous, scenario in fiction. Then it has to examine the change in the joke's version and work out why people do a double take and then laugh or groan. The answer I got from the model was not great, but it was the first LLM answer to ID the underlying story fragment, ID the crucial change, and write that the change represents an unexpected or twist ending. Close enough for government work.

Wow. Just wow.


1 Why do LLMs hate Moby Pickle and Smokey the Grape, anyway? Parke Godwin may have been dead for ten years, but he still had the power to baffle GPT-4 with children's riddles.

2 Generally something that is vaguely like a common example problem but differs in significant ways. My prompt for writing a Wavefront model (.obj and .mtl files) is enough like the usual example (a cube) that many models start hallucinating a cube halfway through.

3 That is, digging into a topic and giving me a top-level explainer such as you might get from an academic colleague in a different department who knows you are smart but not familiar with the domain.

2026-02-26

Modifier precedence in English

Languages often build more complex ideas by combining symbols for simpler ideas. How this works in any particular language is governed by some set of rules or another. It's pretty typical to divide the rules into (at least) two groups: some tell you what symbols can go where (grammar), and others tell you what it means when you put them there (semantics).

To clarify the difference, let's look at some example rules from each family across a set of natural and synthetic languages. Some examples of syntactic rules:

English
Prepositions are generally followed by objects or object phrases
Algebra
An equal-sign, other equivalence symbol, or inequality has an expression on each side either explicitly or implicitly
C
A declaration consists of one or more identifiers (with optional initializers) and information about their type1
Notably these are all about the grouping (and sometimes order) of language symbols (words and punctuation) in the text. By contrast, semantic rules are about the meaning of combinations of symbols.
English
Appending "ly" to many nouns converts them into associated adjectives2
Algebra
Compound expressions are reduced by respecting grouping symbols to identify sub-expression, followed by applying exponentiation, then applying multiplicative operations, and finally applying additive operations
C
A declaration gives the identifier(s) meaning within the program and instructs the compiler on how it (they) can be used (the type information).
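To see the algebraic reduction rule above in action, here is a worked reduction of a made-up compound expression (my own example, not from any particular text), applying the steps in the stated order:

```latex
2 + 3 \times (1 + 1)^2
  = 2 + 3 \times 2^2   % grouping symbols first
  = 2 + 3 \times 4     % then exponentiation
  = 2 + 12             % then multiplicative operations
  = 14                 % finally additive operations
```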

The fun part of this is that none of it is forced on us. Both sets of rules are devised by people for people reasons. In "natural" languages this comes about slowly and often organically for reasons that I certainly don't understand. Talk to a linguist. In programming languages some person or small group of people sat down and consciously decided them (though after the first couple of decades there came to be some broad consensus understanding to build upon).3

An advantage of someone making a deliberate decision shows up when you have complicated rules. The originator can write down an authoritative description of the method and that's that. For instance, the c-declaration int (*normalized_comp)(unsigned, const char *, const char *) may be pretty complex,4 but by looking up the procedure in The C Programming Language, the standard document, or some website, we can know with certainty that "normalized_comp" is a pointer to a function taking three arguments (one unsigned integer, and two pointers to const characters) and returning an integer value.5
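A minimal sketch of how that declaration gets used in practice. The function limited_comp and its body are my own invention for illustration (not from any standard library); only the pointer declaration itself comes from the post:

```c
#include <string.h>

/* limited_comp is a hypothetical comparison function whose shape matches
   the declaration discussed in the post. It compares at most `max`
   characters and clamps the result to -1, 0, or +1. */
int limited_comp(unsigned max, const char *a, const char *b)
{
    int r = strncmp(a, b, max);
    return (r > 0) - (r < 0);
}

/* The declaration from the post: normalized_comp is a pointer to a
   function taking (unsigned, const char *, const char *) and returning
   int. Calling through the pointer dispatches to limited_comp. */
int compare_via_pointer(unsigned max, const char *a, const char *b)
{
    int (*normalized_comp)(unsigned, const char *, const char *) = limited_comp;
    return normalized_comp(max, a, b);
}
```

The clamping expression in limited_comp matches the strcmp-style range guessed at in footnote 5, but that is an idiomatic choice on my part, not something the declaration itself requires.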

My beef today is about the rules for expressing frequency in English. In particular, we can use the "ly" suffix formation discussed above to modify a time period into a frequency. Monthly. Daily.

Fine.

But we also have access to some prefixes that modify the number of things: "bi" and "semi" for two and one-half are common in this use.

Alas, there is no authoritative author's document to tell us if "biweekly" should be interpreted as "twice weekly" or as "every two weeks". I'm fairly sure it's the former, but...


1 I'm going to ignore the wrinkle in which multiple identifiers in a single declaration can have different types when some of them are pointers. Those of you who need to know, know. And for the rest of you it doesn't add anything to the discussion.

2 I'm also going to largely ignore the irregularities of English. There are other ways to make adjectives but, again, it doesn't add anything to the discussion.

3 The algebraic symbols and order of operations are an intermediate case. They came to be through an organic process of push-n-pull in a community, but it was a small community and the candidates were generally formed deliberately by one or a few participants. Fun stuff.

4 This kind of thing is hard enough that a typical course in c includes a bunch of exercises in how to read these things, but in case you fall out of practice there is a tool (cdecl) just to help you out.

5 Experienced c-programmers will likely intuit still another layer of meaning, guessing that the pointer arguments are probably meant to point to character buffers (that is, strings) rather than single characters, and that the return value probably takes on values in the range -1 to +1 a la strcmp. The name of that layer is "idiom", and as in natural languages it is required to be really fluent. Our hypothetical experienced programmer might also have a guess about the initial argument (unsigned, so probably a size, so probably the max number of characters to compare...), but that is not so well established in the idiom.

2026-02-02

Hey, ya wanna help?

Here's the thing about smart phones: you cannot reasonably prop them between your shoulder and your ear. Not only will you get an instant muscle cramp (and probably scoliosis within minutes if you persisted), but the thing won't actually stay there. In that respect they really, deeply suck.

But whatever. Price you pay for the benefits of the form factor. Or whatever.

That said, this has a consequence: if someone calls and (a) you don't feel you can skip it, (b) you still need to have both hands for something (anything) other than the phone, and (c) you don't currently have your buds in then you must, in short order:

  1. answer the call
  2. switch to speaker
  3. prop the phone somewhere

Presumably the people who write the UI for these things have this experience, too.

But recently with my phone, when I tap to answer, the UI goes through some flashy, battery-draining, nonsensical animation which results in the hang-up control landing right where the change-the-audio button was a moment before. I have no words.

Random musings

If you found yourself in the same room as whatever self-satisfied twit is responsible for foisting "liquid glass" on us and asked the two nearest other iPhone users if they wanted to help administer a swirly, what do you think the odds would be?

I put them over 2/3, personally.