2024-07-14

Color me impressed with up-to-date LLMs

I only recently started playing around with LLMs, and I started with somewhat dated models.1 Yesterday I signed up for ChatGPT and I can see why some former sceptics have fully bought in. The 4o model is impressive.

My main investigative tool was to pick fields that I know something about and ask the model a question from each that is neither trivial nor really hard but requires some nuance. I tried to select topics with a range of popularity because I believe that the volume of writing on a subject may influence the "skill" the model exhibits. My subjects so far:

  1. Discuss the epistemological implications of Gödel's incompleteness theorem.
  2. Summarize Freeman Dyson's "Time Without End" paper. (If the model did well on that, I followed up with a question on which of the paper's conclusions had been overcome by more recent developments in cosmology.)
  3. Summarize Lamb's "Anti-photon" opinion piece.
  4. Discuss the similarities and differences between the movies Ghost Dog and Leon: The Professional.
  5. Explain the continued strong culture of short-form composition in speculative genre fiction as compared to general fiction.
  6. Describe the differences between classical and modern guitar.
  7. Prepare a packing list for a lightweight wilderness survival kit to carry on my day hikes in the desert southwest.

My highly unscientific observations suggest that items 1, 6, and 7 are the more popular topics and should provide more source material for the models. The movies question (4) is weird: each movie has been extensively discussed, so there is a lot of source material for each, but I'm not sure how much human-generated comparison text is out there. Then I would rank 2 and 5 as more common than 3.

I didn't get any major factual errors from ChatGPT 4o. Its main fault was that much of the writing was bland and characterless across all responses. Of course, the desirability of "character" is context dependent: an encyclopedia or other raw factual source should be pretty neutral, and that's what most of the responses sound like. Also, I didn't do any prompt engineering to elicit character: I just gave the model the question and let it go.

I do have two specific comments on ChatGPT's responses. First, its interpretation of Lamb's paper is different enough from mine that I wasn't particularly happy with it, but I wouldn't be surprised to learn that other trained and capable physicists subscribe to that interpretation. Second, the answers it gave for the movies really sounded like they were drawn almost entirely from reviews of each film in isolation:2 Ghost Dog this; Leon that. Over and over again.


1 My initial goal was to investigate the possibility of using an LLM as a purely local coding assistant (that effort continues). At work, our security guidelines imply that no internal code or design details should be released to the wider world or sent over an unsecured internet connection. Obviously, just using Copilot from VS Code is out. But I started this investigation on a personal machine to find out if the tooling would meet requirements before going in search of management buy-in. Alas, my "best" machine is totally unsuited for even late 3rd-gen models. I can run models of up to about 1 billion parameters smoothly and easily. A 7B-parameter model (zephyr) is too slow for use in a workflow (seconds per word), though it's fast enough to learn about the models. I have put a 22B-parameter model on the machine, but it runs at minutes per word.

2 Indeed that observation is what prompted my comments above.

2024-07-13

Goto is flow control!

I'm reading The Legacy Code Programmer's Toolbox: Practical Skills for Developers Working with Legacy Code by Jonathan Boccara. I'm not far enough in to have a settled opinion of the book, but I think I'm far enough along to describe it as "promising" at least.

Anyway, I'm in the first section of the book where he talks about understanding code you've just encountered. And I've already hit my second "Well, why didn't I think of that?" moment1 where he suggests filtering functions to only display the flow control as a way of getting to know them. Fantastic idea and I feel like a dunce for leaving that option on the table for years, but I have a bone to pick with his implementation in the example he provides.

His example works on a piece of C++ code from an open source project, and he filters for

  • if
  • else
  • for
  • while
  • do
  • switch
  • case
  • try
  • catch
which probably seems like a good list at first glance, but the very first flow-control token in the example function is a label! And yeah, that means there is a goto further down.

Now, I almost never use goto, and when I do it's usually to jump to some clean-up or post-processing operation from a nested scope. This one jumps back outside a loop to do some re-initialization. Who knows if that is justified. Maybe it spares some extra nesting or multiple condition variables or something.

Whatever, I just think we ought to include goto and labels in the set of "flow control" to filter on. And the same can be said for break, continue, and return.
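The book doesn't give any tooling for the filter, so here is a minimal sketch of the expanded version in Python. The keyword list (the book's nine tokens plus goto, break, continue, and return) and the label regex are my own choices, and the label heuristic is rough: it will also flag things like C++ access specifiers (`public:`).

```python
import re

# Flow-control keywords: the book's list plus goto, break, continue, return.
# Labels are matched separately by regex.
KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case",
            "try", "catch", "goto", "break", "continue", "return"}

# A label: an identifier followed by ':' at the start of a line.
# The (?!:) lookahead avoids matching '::' scope resolution (e.g. std::cout).
LABEL_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*:(?!:)")

def flow_control_lines(source: str):
    """Yield (line_number, text) for lines containing flow-control tokens."""
    for num, line in enumerate(source.splitlines(), start=1):
        words = set(re.findall(r"\b\w+\b", line))
        label = LABEL_RE.match(line)
        if words & KEYWORDS or (label and label.group(1) not in KEYWORDS):
            yield num, line.strip()
```

Run on a function with a label and a goto, it keeps exactly the skeleton the book is after, label included:

```python
src = ("int f() {\n"
       "  retry:\n"
       "  for (int i = 0; i < 3; ++i) {\n"
       "    if (bad()) goto retry;\n"
       "  }\n"
       "  return 0;\n"
       "}")
for num, text in flow_control_lines(src):
    print(num, text)
```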

Now maybe the author has a reason for the omission and just didn't tell us, and I'm guessing it makes little difference to the utility of the procedure, but I just had to get it off my chest. We now return you to your regularly scheduled life.


1 Thanks, Jonathan!


2024-07-03

UPOL

We're on vacation this week, out on the east coast to visit some of my wife's relatives. And there is, of course, a rental vehicle involved. Now, we take several short trips a year (to visit medical specialists, mostly), so we cycle through various rental options, which is always nice in the sense of getting to know what kinds of mid-price car choices are out there, but we usually only have them for a couple of days at a time. So we don't usually bother trying to figure out the car's console.

But this time is different for a couple of reasons.

First, because we're staying a bit over a week; and second, because there is absolutely no place to prop a phone running nav where the driver can actually see it.

So we've got a phone synched to the thing and can display nav data. Yeah.

Unless, that is, you want to change the AC settings. Or mess with the audio. Or someone sends you a text. Or a myriad other things that might happen. Because the designers of this thing have moved very nearly everything that used to have a button or switch onto that single center panel and let each and every function take over the whole display whenever they are active.

I have christened it the Universal Panel Of Lose.

Now, it could be worse. I know some Tesla owners, and the Muskmobile is trying to remove the stalks from the steering column in favor of a cleaner look.

And I'm not against manufacturers trying to improve the user interface of cars. There is no reason to think that the layout we're used to is ideal. That goes for the central controls, but especially for all the little auxiliary things that have accreted over the decades. Even in my lifetime I've gotten to watch the interface for cruise control appear in a variety of clunky states and improve by little increments until now I see one of two patterns that each work pretty well. Good job, guys. Much appreciated.

But it is painfully clear that the testers of the UPOL (which I know has infested multiple major brands all up and down the price spectrum) never actually took it for a test drive on a route they didn't know. Never needed to use the nav while they tried to communicate with the people they were supposed to be meeting at a new-to-them destination. Basically never really ate their own dog food.