"Any fool can write code that a
computer can understand. Good programmers write code that humans can
understand." - Martin Fowler
Consider this (fictionalized) exchange about the abilities and limitations
of LLM coding tools:
- Enthusiast: See, the model can write code for you!
- Skeptic: Well, it does seem to be correct, but the code quality isn't that
great.
- Enthusiast: It doesn't matter; those "code quality" rules were for people,
and now the machines will write all the code.
And that might be the case, but I see two potential barriers. First, even if
Claude is not as frail as, say, me, it does have
limitations. Second, we don't yet know what the unsubsidized cost of machine
coding will look like. And sitting at the intersection of those concerns: while
it is technologically possible to work around some of those limitations, the
effort probably drives the cost up. Alternatively, if cost is your big concern
you could try running locally, but the cost-capability trade-off could be
punishing.
Okay, the model can comprehend spaghetti code, but how much?
My experiments with machine coding have been pretty limited: any given
session starts with a few hundred lines of human-written, or
machine-written-but-human-reviewed-and-tweaked, code. Then as the session
proceeds, the machine generates a few dozen lines of new or changed code. The
generated code is sometimes perfectly acceptable and sometimes not really up to
production standards. But the interesting thing is that I can continue with the
session without manually fixing the so-so code generated in the last round, and
the machine can comprehend it just fine. Nice.
That's promising: the model is not confused by a small amount of spaghetti
code. That said, I could handle that much (and in my younger days routinely
did). I fix messy code pretty early in my work not because I can't work with it
up to a point, but because I know it's easier to fix in small batches and it
also prevents me from forgetting to go back to some section or another.
But there is a limit to how much bad code a human can deal with, and we
should expect there to be a limit for the models as well (though it could be
a bigger limit). And that is where the size of the code base comes in. A
lot of coding class assignments and similar toy problems can be done in a few
hundred lines. A hobby project might be a few thousand lines. A small, focused
tool might be a few tens of thousands of lines. A small production application
(like my main project at work) might be in the low hundreds of thousands. Any
seriously large project will exceed a million lines.
How much poorly organized code can an LLM (even a big one) actually maintain in
the long run? I don't think our current experience is much of a guide to that
yet.
And yes, we see regular reports of big, capable models working in largish
code bases, but the code they're starting with was (at least initially) laid
down and organized by humans. And often humans with quite stringent standards
at that. The model being able to work in that code base is a different thing
from the model being able to work in one that isn't so carefully tended. Unless
the models can write quality code, or they are subject to constant supervision
by humans who can, projects they work on will succumb to entropy over time. And
then we'll see.
Can you afford it?
It's widely reported that all the big LLM providers (OpenAI, Anthropic, etc.)
are still subsidizing users with venture capital. No surprise, really; the
lock-in-then-enshittify strategy has been the way to make serious money in tech
for decades. But it means that the last couple of years of corporate experience
with the economic viability of machine coding may not be representative of the
situation after the LLM market settles down.
Does code quality matter to the model?
The interesting case for the code quality situation comes up when two things
are true: (a) the model can comprehend and work in "nice" code better than
"messy" code,[1] and (b) the cost situation isn't a complete blowout in
either direction. Under those conditions, there is a clear incentive to keep the
code base in a less expensive state. Either the models will have to learn the
lessons we teach to human beginners, or the human supervisors will need to
steadily manage the chaos introduced by machines that haven't learned better for
themselves. Honestly, the latter possibility sounds like a brutal and
unrewarding job.
It's early days yet, just you wait
We've been hearing some variation on "Today is the worst it will ever be"
since the whole idea roared into the mainstream a few years ago. And
that's not wrong, but it doesn't tell us anything about where (or if) the tech
will plateau.
It is certainly true that, as time passes (and money gets spent like it's
going out of style), the state-of-the-art models are getting bigger, and that
models of every size are getting more capable. Though I may play the grumpy
contrarian at times, I am genuinely impressed. But if we're not looking at some
kind of runaway self-improvement, then there will be both hard limits and
probably a cost function that grows rapidly as one approaches the hard
limit. Either could represent a barrier to the dominance of machine coding.
Or not. But I'm not taking the hype machine as a guide. Their incentives are
all too plain for them to be trustworthy.
Variations
Up to this point I've been focusing on understanding the future shape of
programming on the assumption that the models will be good enough to write all
the code. In that scenario, economically motivated programming will be shaped
mostly by economic factors: things like "how much do models cost compared to
skilled humans?" and "how does the cost of getting the model to do a nice job
compare to the cost of letting it spend time grinding through a huge and
complex context?". But we can also throw some other assumptions against the
wall and see what the resulting smears look like.
Models master the small picture, but not the wide view
We can imagine that the models never really get the hang of both understanding
a big project on the scale of interacting modules and writing code on a
modest scale with that design in mind. That is, when they work in a big project
without supervision they make a mess: violating architectural separations that
were designed in, losing track of existing utility code and duplicating it
locally, putting routines in less than optimal parts of the module tree, and
so on.[2] In a world like that we might need to have human supervisors
even if we're handing most or all of the coding over to the machines. And the
code base needs to be comprehensible to those supervisors, so code quality will
still be a thing that people care about.
The state-of-the-art is good enough, but it's expensive
In another scenario, the best models available can produce and maintain
software as well as an expert team of programmers, but the cost per unit of
code is higher than that of a typical human team supported by less able models.
We might see the most valuable projects done almost entirely by machines, while
more marginal projects use non-trivial human input. We'll probably have a
smaller industry with different skills in demand: humans will need to drive and
supervise models which are doing most of the grunt work.[3] And here the code
quality issue is driven by who needs to comprehend each individual project: if
it's only models, we just pay what it costs; if the readers include humans,
we exercise the discipline. And man, if you have to downgrade a project from
machine-only to machine-assisted there is going to be a heck of a bill...
A word of caution
Even if machine-written code is objectively worse in some sense, it could crush human
coding as an economic activity. There are multiple cases from the industrial
revolution of craft industries being pushed almost completely out of existence
by worse-but-cheaper machine-made goods. The small number of practitioners that
stayed in business were serving either luxury markets or special use cases, and
many of them came under increasing pressure as the industry got better at the
task. That is a thing that could happen to human coders, too.
[1] I haven't seen anyone online addressing this question yet, and I am
just getting started on my own investigations of machine-assisted coding, so I
don't even have a feel for it yet. But it feels right to me: with a messy code
base you're going to need more context for the machine to address any given
problem, and context means processing.
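The "context means processing" intuition can be sketched with some
back-of-envelope arithmetic. Everything here is an assumption for the sake of
illustration (the per-token prices and token counts are invented, not measured
from any real provider or code base); the only point is that context tokens
scale the per-request cost.

```python
# Illustrative sketch: why a messier code base might cost more per model call.
# All prices and token counts below are invented for illustration only.

def request_cost(context_tokens, output_tokens,
                 price_in_per_mtok=3.00, price_out_per_mtok=15.00):
    """Dollar cost of one model call at assumed per-million-token prices."""
    return (context_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000

# Tidy module: the relevant code fits in a small, focused context.
tidy = request_cost(context_tokens=8_000, output_tokens=1_000)

# Messy module: duplicated utilities and tangled dependencies force the
# session to drag in several times as much context for the same change.
messy = request_cost(context_tokens=40_000, output_tokens=1_000)

print(f"tidy:  ${tidy:.3f} per request")
print(f"messy: ${messy:.3f} per request")
```

Under these made-up numbers the messy-code request costs several times the
tidy one, and that multiplier compounds over every call in a long session.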
[2] You know, like people do? Occasionally I assign people working
on my project (including myself, of course) to look through the module they're
working in, find any stray utility code, and see if it needs replacing with
centralized tools or ought to be moved to make it more widely
available. Likewise, I look closely at change sets that touch build files for
evidence of potentially harmful added interconnectedness. It's an ongoing
effort because it's often easier to do the wrong thing than the right one.
[3] My biggest worry in that kind of scenario is what the
pipeline for training new human experts looks like. It's not clear that anyone
knows yet, and there could easily be a lean time as existing experts retire and
too few new experts emerge to take their place.