2024-10-20

Elaborate solution to a very minor pain-point

Over the last few years I've repeatedly encountered an annoyance in my work projects. It's not a practical problem, you understand, merely a stylistic one. I mean, when the issue rears its ugly head I pause, curse, write the code that works just fine but offends my sense of modern C++ idiom, and move on.

It's just that the issue lives rent-free in my head.

So here's the setup: I've spent the last six years upgrading my C++ skills to the "modern" era, which means, amongst other things, preferring range-based for loops and routines from the algorithms header to explicit loops using indices, pointers or even iterators.1 Only my code has to interact with libraries and plugins written in C or exposing C-centric APIs, and those interactions often involve linked lists. Which you can't put into range-based for loops or algorithms.

So, in my copious spare time, I set out to write a set of iterator templates that you can bolt on to an arbitrary C linked list to support using modern C++ idioms with the list. That project is now approaching minimum-viable-product levels of completeness, though there is a lot left to do.
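
To make the idea concrete, here's a minimal sketch of the sort of adapter I mean, assuming a classic C-style singly linked list (the node type and field names here are hypothetical; the real project templates this on the node type and its next member):

#include <cstddef>
#include <iterator>

// Hypothetical C-style node; the real lists come from whatever library I'm wrapping.
struct node { int value; node *next; };

// Minimal forward iterator plus a range wrapper so the list works with
// range-based for loops and the algorithms header.
class node_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = node;
    using difference_type   = std::ptrdiff_t;
    using pointer           = node *;
    using reference         = node &;

    explicit node_iterator(node *p = nullptr) : p_(p) {}
    reference operator*() const { return *p_; }
    pointer operator->() const { return p_; }
    node_iterator &operator++() { p_ = p_->next; return *this; }
    node_iterator operator++(int) { auto t = *this; ++*this; return t; }
    bool operator==(const node_iterator &o) const { return p_ == o.p_; }
    bool operator!=(const node_iterator &o) const { return p_ != o.p_; }

private:
    node *p_ = nullptr;
};

struct node_range {
    node *head = nullptr;
    node_iterator begin() const { return node_iterator{head}; }
    node_iterator end() const   { return node_iterator{}; }
};

// Usage: for (auto &n : node_range{list_head}) { /* ... */ }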


1 That kind of preference is, of course, not absolute. Sometimes the old, explicit loops have some over-arching advantage and you use them without guilt.

2024-09-10

ODR? Fuggedaboutit!

I was hacking together a thin patch at work today. Trying to get the logging calls from a library to play nice with the rather hacky logging infrastructure of a legacy project we maintain. Basic translation worked fine, build support took a couple of tries but seemed to be coming together, but then there was the matter of the Level enum and getting string representations for the values ... so I shoved a copy of the original enum and a definition of levelText(Level level) into the header.

Yeah, the definition. In the header. I should've known better being, as I allege I am, a big-boy programmer. Well, nature took its course and the linker complained: I was violating the One-Definition Rule (oh, holy of holies!) on a massive scale.

But I was ready to go to lunch and I didn't want to deal with hard problems like where in the project tree to put the source file that ought to hold the implementation and how to add it to the build system and so on. It was at this point that some rebel-without-a-cause, burn-it-all-down inner-self threw up a suggestion: why not try something like:

template <int=0>
std::string levelText(Level level) 
{ 
    /*...*/ 
}

I mean, what could it hurt?

Rhetorical question, of course, but let's list some of the failings of this cutesy little trick (which compiles, since a template, like an inline function, is allowed to be defined identically in multiple translation units):

Hacky, non-idiomatic, confusing
Any C++ programmer who sees a template declaration will expect you to do something with the template parameter, which we don't. Time will be spent looking for the use and trying to suss out what the parameter might mean. Worse, if they're experienced they'll notice that it's a non-type parameter and be expecting some kind of clever (read: tricky) metaprogramming, which there isn't.
Contributes to slow build times
Every template read takes a little more time and a little more working set. So does every template instantiated. I mentioned that this is a legacy project, right? It already takes 12-15 minutes to build on a fast hexacore machine with lots of memory.
Contributes to bloated binaries
Every template instantiation creates a separate bit of code: we're compiling that function into every object file that includes the header and uses it, and trusting the linker to fold the duplicates back out.

Once I'm fueled back up I can go back to the office and do it right. But now I know a new stupid C++ trick. Yeah!

2024-07-14

Color me impressed with up-to-date LLMs

I only recently started playing around with LLMs, and I started with somewhat dated models.1 Yesterday I signed up for ChatGPT and I can see why some former sceptics have become fully bought in. The 4o model is impressive.

My main investigative tool was to pick things that I know something about, and ask the model a question from those fields that is neither trivial nor really hard but requires some nuance. I tried to select ideas with a range of popularity because I believe that the volume of writing on a subject may influence the "skill" the model exhibits. My subjects so far:

  1. Discuss the epistemological implications of Gödel's incompleteness theorem.
  2. Summarize Freeman Dyson's "Time Without End" paper. (If the model did well on that I followed up with a question on which of the paper's conclusions had been overcome by more recent developments in cosmology.)
  3. Summarize Lamb's "Anti Photon" opinion piece.
  4. Discuss the similarities and differences between the movies Ghost Dog and Leon: The Professional.
  5. Explain the continued strong culture of short form composition in speculative genre fiction as compared to general fiction
  6. Differences between classical and modern guitar
  7. Prepare a packing list for a lightweight wilderness survival kit to carry on my day hikes in the desert southwest.

My highly unscientific observations suggest that items 1, 6, and 7 are the more popular topics and should provide more source material for the models. The movies question (4) is weird: each movie has been extensively discussed, so there is a lot of source material for each movie, but I'm not sure how much human-generated comparison text is out there. Then I would rank 2 and 5 as more common than 3.

I didn't get any major factual errors from ChatGPT 4o. Its main fault was that much of the writing was bland and characterless across all responses. Of course, the desirability of "character" is context dependent: an encyclopedia or other raw factual source should be pretty neutral, and that's what most of the responses sound like. Also, I didn't do any prompt engineering to elicit character: I just gave the model the question and let it go.

I do have two specific comments on ChatGPT's responses. First, its interpretation of Lamb's paper is different enough from mine that I wasn't particularly happy with it, but I wouldn't be surprised to learn that other trained and capable physicists subscribe to that interpretation. Second, the answers it gave for the movies really sounded like they were drawn almost entirely from reviews for each film in isolation:2 Ghost Dog this; Leon that. Over and over again.


1 My initial goal was to investigate the possibility of using an LLM as a purely local coding assistant (that effort continues). At work our security guidelines imply that no internal code or design details should be released to the wider world or sent over an unsecured internet connection. Obviously just using Copilot from VS Code is out. But I started this investigation on a personal machine to find out if the tooling would meet requirements before going in search of management buy-in. Alas, my "best" machine is totally unsuited for even late 3rd-gen models. I can run models up to about 1 billion parameters smoothly and easily. A 7B-parameter model (zephyr) is too slow for use in a work-flow (seconds per word), though it's fast enough to learn about the models. I have put a 22B-parameter model on the machine, but it runs at minutes per word.

2 Indeed that observation is what prompted my comments above.

2024-07-13

Goto is flow control!

I'm reading The Legacy Code Programmer's Toolbox: Practical Skills for Developers Working with Legacy Code by Jonathan Boccara. I'm not far enough in to have a settled opinion of the book, but I think I'm far enough along to describe it as "promising" at least.

Anyway, I'm in the first section of the book where he talks about understanding code you've just encountered. And I've already hit my second "Well, why didn't I think of that?" moment1 where he suggests filtering functions to display only the flow control as a way of getting to know them. Fantastic idea, and I feel like a dunce for leaving that option on the table for years, but I have a bone to pick with his implementation in the example he provides.

His example works on a piece of C++ code from an open source project, and he filters for

  • if
  • else
  • for
  • while
  • do
  • switch
  • case
  • try
  • catch
which probably seems like a good list at first glance, but the very first flow-control token in the example function is a label! And yeah, that means there is a goto further down.

Now, I almost never use goto, and when I do it's usually to jump to some clean-up or post-processing operation from a nested scope. This one jumps back outside a loop to do some re-initialization. Who knows if that is justified. Maybe it spares some extra nesting or multiple condition variables or something.

Whatever, I just think we ought to include goto and labels in the set of "flow control" to filter on. And the same can be said for break, continue, and return.

Now maybe the author has a reason for the omission and just didn't tell us, and I'm guessing it makes little difference to the utility of the procedure, but I just had to get it off my chest. We now return you to your regularly scheduled life.


1 Thanks, Jonathan!

2024-07-03

UPOL

We're on vacation this week. Out on the east coast to visit some of my wife's relatives. And there is, of course, a rental vehicle involved. Now we take several short trips a year (to visit medical specialists, mostly), so we cycle through various rental options, which is always nice in the sense of getting to know what kinds of mid-price car choices are out there, but we usually only have them for a couple of days at a time. So we don't usually bother trying to figure out the car's console.

But this time is different for a couple of reasons.

First because we're staying a bit over a week, and second because there is absolutely no place to prop a phone running nav where the driver can actually see it.

So we've got a phone synched to the thing and can display nav data. Yeah.

Unless, that is, you want to change the AC settings. Or mess with the audio. Or someone sends you a text. Or a myriad other things that might happen. Because the designers of this thing have moved very nearly everything that used to have a button or switch onto that single center panel and let each and every function take over the whole display whenever it is active.

I have christened it the Universal Panel Of Lose.

Now, it could be worse. I know some Tesla owners, and the Muskmobile is trying to remove the stalks from the steering column in favor of a cleaner look.

And I'm not against manufacturers trying to improve the user interface of cars. There is no reason to think that the layout we're used to is ideal. Not even for the central controls, but especially for all the little auxiliary things that have accreted over the decades. Even in my lifetime I've gotten to watch the interface for cruise control appear in a variety of clunky states and improve by little increments until now I see one of two patterns that each work pretty well. Good job, guys. Much appreciated.

But it is painfully clear that the testers of the UPOL (which I know has infested multiple major brands all up and down the price spectrum) never actually took it for a test drive on a route they didn't know. Never needed to use the nav while they tried to communicate with the people they were supposed to be meeting at a new-to-them destination. Basically never really ate their own dog food.

2024-05-22

It's a Singleton for life-cycle management, maybe?

Global state has its well-known costs, so avoiding it where reasonable is a thing. Great. But sometimes you might feel you need some. Then what? This article is to record some of my thoughts about the Singleton pattern in C++.

Avoid singletons —C++ Core Guidelines

Which Singleton is easy

A lot of Singleton implementations are not thread safe, and/or hard to get right. The Meyers Singleton is surprisingly simple and inherits thread safety from guarantees made by the C++11 standard.
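
For reference, a minimal sketch of the Meyers form (the class name and contents are placeholders):

class Registry                      // placeholder name; imagine whatever
{                                   // global-ish behavior-plus-data you need
public:
    static Registry &instance()
    {
        // Since C++11 the initialization of a function-local static is
        // guaranteed to happen exactly once, even with concurrent callers.
        static Registry the_instance;
        return the_instance;
    }

    Registry(const Registry &) = delete;
    Registry &operator=(const Registry &) = delete;

private:
    Registry() = default;           // only instance() can create it
};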

When to consider a singleton

Programs can need data and they can need behaviors. You might want (or think you want) either with global scope,1 but a Singleton class doesn't offer anything over a simple global variable for plain data. Nor does it offer anything over a simple free function unless you want some persistent state shared by more than one function.2 Either of these is simple on its own; it's only when you need the combination, behavior working on global data, that Singletons offer any advantage over simple global data or free functions. So when we look at alternatives, we should be thinking about the combination.

Alternatives

Simple global data. Suffers from the static initialization order fiasco. More than a few sources suggest that a suitable alternative is to use a namespace and just write free functions. This isn't Java and things don't have to be objects.

Interestingly, the Core Guidelines suggest using the same static-local-variable idea exploited in the Meyers singleton implementation, without the object context. That does not enforce the single nature, but it does give you a centralized access point for a particular instance of the data.
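
Something along these lines, if I read the guideline right (the names here are mine):

namespace config                    // hypothetical example; not a class at all
{
    struct Settings { int verbosity = 0; bool dry_run = false; };

    // Centralized access point: the object is created on first use, so the
    // static initialization order fiasco is dodged. Nothing stops someone
    // from creating other Settings objects, though.
    inline Settings &settings()
    {
        static Settings s;
        return s;
    }
}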

Object Lifecycle

But here's the thing: in C++, objects are about lifecycle. They have a defined lifetime and we are guaranteed that their constructors are called at the beginning and their destructors at the end. That's how scope-bounded resource management works, after all.


1 Indeed, in the usual C++ model, almost all behavior is nominally available at global scope.

2 We'll talk about persistent state attached to a single function along the way.

If incompetence seems unlikely, may I assume malice?

I've mentioned before that I generally keep my project source directories at work in a place where they are accessible from the host (Windows) operating system, my msys working environment, and my WSL (Linux) instances.1 Some IT trouble recently has seen me migrate off my usual work machine onto a loaner and now back to the (freshly re-imaged) original. Which has meant re-building my working environments a couple of times.

Now, with one thing and another my re-imaged machine came back to me without a stand-alone X11 server,2 so I thought "Now'd be a good time to see if WSL2 really is better than WSL1; what with the wslg display adapter meaning that WSL2 doesn't need X11." Anyone familiar with this domain will know that WSL2 has been a thing for quite a while now, but the combination of (a) inertia and (b) reports that host-filesystem access was slow on 2 had kept me from moving forward.

Anyway, I tried it today.

The virtual machine actually runs noticeably faster, which is nice, but "slow" doesn't even begin to describe the productivity bottleneck that is access to the host filesystem. Holy deity-figure-on-a-pogo-stick, Batman! This performs like something offered as a minimum viable product from a failing startup run by kids who really should have stayed in school. A sloth dying in a tarpit moves faster than a cmake configure over that channel. I was longing for the heady days of running autoconf on a Pentium. I had time to muse about the bandwidth of a station wagon full of tapes as delivered by continental drift.

I'm following a colleague's lead and keeping a separate source tree for the WSL image and using rsync-spawned-by-cron to keep them tracking one another.3

And now we get around to the question of incompetence versus malice. The famous adage tells us to set our Bayesian prior in favor of "incompetence", but this is a Microsoft product on its second version number. Incompetence simply isn't what I would expect. The usual excuses just ring hollow: the people who put this together are not beginners, they had access to as much deep technical support as they needed, and they had a v1 product out in the world doing well so there wasn't a live-or-die deadline to meet.

On the other hand, Microsoft has more than a little history of trying to leverage their market share to try to kill inconvenient competitors.


1 For that matter, before some "security" upgrades broke all the non-Microsoft virtualization tools on Windows (VirtualBox, we hardly knew 'ya), I would use those same files from my virtual machines, too.

2 Probably an oversight on my part as I don't recall saying that I needed one.

3 The thing is that parts of the host filesystem (including the bit where I keep the source trees) are auto-backed up by IT, a feature that's very handy for the biggest part of your daily work, eh?

2024-05-08

Vectors in C: an all-you-can-eat buffet of so-so choices

Let's say you want to model a physical system. Often you'll want to represent spatial vectors (members of the space $\mathbb{R}^N$) somehow. In fact there are many use cases for Cartesian 2-, 3-, or 4-vectors, and for Lorentz 4-vectors, as well as less common uses for other dimensionalities, but let's stick with Cartesian 3-vectors for the sake of definiteness, because I want to focus on the programming tradeoffs you face if you want to use C (or a C interface1) for this purpose. Nor do we want to worry about manually optimising for the SIMD module of our chips (compilers are smart these days) or, worse still, laying things out for the benefit of the GPU.2

Aside: I mostly program in C++ where there are some better options, but I get to mess with a lot of legacy code, so the consequences of someone else making this choice for a code-base I work on are still with me. I might get around to a followup post on ways to adapt an existing legacy library for cleaner inter-operation with new C++ code, but that's for another day: first we must understand the root problem.

C offers us an obvious choice with two painful drawbacks (or perhaps it's one underlying drawback that rears its ugly head in two contexts) and a clever way to avoid that issue at the cost of having your soul slowly nibbled to death by syntactic ducks. Nice, huh?

Arrays

The obvious choice is the built-in array type: double vec[3];, right?

The underlying problem is that arrays are only sort-of first-class types. Consider this code:

#include <stdio.h>

void pass_ptr(double* p)
{
    printf("Passed pointer: %zu\n", sizeof(p)/sizeof(double));
}

void pass_ary(double a[3])
{
    printf("Passed 'array': %zu\n", sizeof(a)/sizeof(double));
}

void pass_c99(double a[static 3]) // Syntax added in C99
{
    printf("Statically sized: %zu\n", sizeof(a)/sizeof(double));
}

int main()
{
    double vec[3];

    printf("In local scope: %zu\n", sizeof(vec)/sizeof(double));

    pass_ptr(vec);
    pass_ary(vec);
    pass_c99(vec);

    return 0;
}
Each of the printf statements is executed once on the same variable, but they generate only two distinct results: one says 3 and the others say 1.

This is a classic trap for C newbies. Arrays are not, as is sometimes said, "just pointers" because the symbol table knows how big they are when declared at static or automatic scope. But most things that you can do with them drop that knowledge at which point all that is left is a pointer to the start.3

The other manifestation of the limitation is that you can't assign or perform operations on arrays. That is, this is not legal code:

double v1[3] = {1, 2, 3};
double v2[3];
v2 = v1;       // Error! Even when the compiler *does* know the sizes!
and as a result you end up writing your library functions with signatures like crossProduct(double *result, const double *v1, const double *v2), which makes you write particularly clunky code to use the library:
double v1[3] = {1, 2, 3};
double v2[3] = {4, 5, 6};
double cp[3];
crossProduct(cp, v1, v2);
Ugh. And you get to do it over and over again.

Array-in-struct

Oddly it is amazingly easy to solve both these problems: you just wrap the array declaration in a structure declaration (and typedef it for convenience):

typedef struct {
    double a[3];  // a for "array"
} vector;
This takes up exactly as much memory as before, still knows how many elements are involved when you pass it to a function, and can be assigned! Yeah! It's like magic.

Of course, to access an element you now write vec.a[2] instead of vec[2], but that's a small price to pay. Right? It's not like your soul will die a little bit each time or anything.
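
A quick demonstration, just to make the point (a throwaway sketch, not library code):

#include <stdio.h>

typedef struct {
    double a[3];
} vector;

void pass_vec(vector v)
{
    // The element count survives the trip through the function call.
    printf("Elements: %zu\n", sizeof(v.a)/sizeof(double));
}

int main()
{
    vector v1 = {{1, 2, 3}};
    vector v2;

    v2 = v1;                       // Assignment just works now.
    pass_vec(v2);
    printf("y = %f\n", v2.a[1]);

    return 0;
}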

Seasoning with unions

Once you've drunk that Kool-Aid there is no reason not to go a little further: maybe sometimes you'll want to talk about the coordinates of these things, right? So you make that possible, too:4

typedef struct {
    union {                            // anonymous union member: needs C11
        double a[3];                   // a for "array"
        struct { double x, y, z; } c;  // c for "coordinate"
    };
} vector;

You still need to write a full set of library routines, but now they can have signatures like5 vector cross(const vector v1, const vector v2) and you can call them like vector v1 = {{1, 2, 3}}; vector v2 = {{4, 5, 6}}; vector cp = cross(v1, v2); which is much better.
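
A cross product in this style might look something like this (a sketch, using the signature above and the coordinate view from the union):

vector cross(const vector v1, const vector v2)
{
    vector r;
    r.c.x = v1.c.y * v2.c.z - v1.c.z * v2.c.y;
    r.c.y = v1.c.z * v2.c.x - v1.c.x * v2.c.z;
    r.c.z = v1.c.x * v2.c.y - v1.c.y * v2.c.x;
    return r;
}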


1 Even if you're confident that you won't be writing in C, you may find it necessary to deal with the limits of the language while planning a binary interface.

2 Those are great tools and worthy of your attention if you have a computationaly demanding task, but beyond the scope of this post.

3 Note that as far as the compiler is concerned, the declaration of pass_ary is identical to that of pass_ptr: the array-like notation is accepted as syntactic sugar only and the compiler pays no attention to the 3. The _c99 variant is a little more subtle, but it still doesn't really know the size of the array; it just assumes a minimum for the sake of optimization (and compilers can complain if static analysis shows the assumption is violated). Some folks like to use the array form because it makes the declaration express intent to the human reader (though, like comments, it can lie). Others are not so enamored of it, with Linus famously coming out strongly against it.

4 Type punning with unions this way is strictly forbidden in C++ (because lifetime-model and invariant-enforcement, that's why; and don't give me any guff about POD types either, sonny, the divine gave you memcpy and std::bit_cast for a reason!), but C programmers are rugged, self-reliant individualists who carry Colt-45 six pointers (colt45****** shootin_iron;) on their hips and ain't afraid of no Endian no how.

5 Of course, you might pass pointers for the in-parameters to save a little copying at the cost of writing & all over the place. C sure seems to have that same issue come up a lot, eh?

In which the author sits on his porch and shakes his cane at passersby

Is it just me, or have UI designers taken the idea of clean and unobtrusive interfaces to the point of not actually offering any affordances at all? And if so, am I justified in suggesting that they might have missed an important point somewhere?

2024-04-14

Neal Stephenson has it wrong

For no reason I can identify I suddenly noticed something today:

They're called "emojis", not "mediaglyphics".

Of course, he pretty much nailed everything except the name.

2024-04-04

Yeah, well, you know, that’s just, like, your workflow, man.

I caught some flack at work this week: I circulated an early draft of a document that I was struggling with in plain text1 and my boss was very clear that he wanted me to use Word in the future so there would be change tracking and out-of-band comments. On the plus side those remarks came packaged up with some useful suggestions for the piece.

Once I tamped down my reflexive defensiveness and the basic anxiety that comes with screwing up at work, I pulled up my big-kid underwear and moved on. Then, having decided to be an adult about this, I ran smack dab into a counterexample to $BOSS's point. I received a second set of highly useful changes to the same document. Conflicting changes. I'm not aware of any good tooling to handle conflicting changes in Word, but it was no problem for me to handle the conflicts in my text document: I just opened the files in my favorite visual merge tool2 and got on with it.

Caveat time. To take the "plain text means we can use good tools" thing seriously we'd want to put all our draft work in VC repositories, and when that occurred to me my first reaction was "Who'd want to do that?" I mean, yeah that makes sense for major pieces of writing, but it's not obvious that you want to maintain a full history on every minor document you bang out day in and day out.

But then I had another thought...

Caveat on the caveat. Which was "Hey, how do people who are really committed to Word deal with the possibility of conflicting changes, anyway?" A little poking around the web suggests that my employer's answer is completely mainstream. At the management tier we put everything in SharePoint and let it enforce serialized editing, so they're already putting all their work in a repository. Maybe the whole idea isn't so silly after all.


1 Now, I would never send plain text to the clients, but I often do my initial composition in text because the sense of informality helps me feel safe trying out different formulations in search of a natural arc through complex subjects.

2 Meld as it happens. But not because I've tried all the options: it was just the first one I spent any time with and it's been consistently available.

2024-03-21

Algorithms header knowledge check

Note some late additions marked with *.

* After posting this I began to feel it needed a little bit more detail. And then that it needed quite a bit more detail.


The C++ standard library's algorithms (spread across the algorithm and numeric headers) include a routine suitable for counting the number of places of disagreement between two equal-sized collections of elements. What's it called? Hint: it's not called "count_differences". *Nor is it called count_if (unless you happen to be using C++23, because count_if doesn't have an overload for parallel containers.0) My work projects are in C++17 and my home projects in '17 or '20.

Answer
inner_product

*One approach is to keep the "sum" operation adding just as in the usual inner product, but make the operation that would do element multiplication in the standard inner product return zero when the two values are equal and one when they differ.
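
In code, the approach looks something like this (using the overload that takes both operations, with standard function objects standing in for the hand-written ones):

#include <functional>
#include <numeric>
#include <vector>

int count_differences(const std::vector<int> &a, const std::vector<int> &b)
{
    // "Multiply" becomes "are these two elements different?" (0 or 1),
    // "add" stays ordinary addition, and the accumulator starts at 0.
    return std::inner_product(a.begin(), a.end(), b.begin(), 0,
                              std::plus<>{}, std::not_equal_to<>{});
}

// count_differences({1, 2, 3, 4}, {1, 0, 3, 7}) == 2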

For that matter, what educational backgrounds would prepare you to recognize that as the routine you want?1 How does this compare to Kate Gregory's story about partial_sort_copy and how it would be better called top_n?


*0 C++23 doesn't introduce a parallel overload either, but it does introduce zip_view and zip, which will allow you to efficiently produce a single range of apparent pairs from the parallel containers. Then you can use the single-container version of count_if. Obvious. Right?

1 My combination of physics and prior experience with the algorithm header's love affair with having a user-supplied-predicate-to-change-the-behavior overload meant that I spotted it as soon as I read the name, but ... that's a rather esoteric requirement for users to know what they're seeing.

2024-03-05

Voice-assistant fails

Accumulated over the years, but I got another one the other day that triggered me.

Me:
[Navigating to a business in the US southwest]
Creepy voice assistant (CVA):
In five-hundred feet, turn left on El Camino Real Street1.
Me:
[::Sighs::]


CVA:
[Interrupts a conversation in the car]
Me:
Hold your horses, [CVA].
CVA:
[Starts reading the Wikipedia article on the idiom]


Me:
[CVA], play 2112.
CVA:
Now playing two-thousand-one-hundred-twelve.
Me:
[::Fumes:: until the music sweeps me away]

It's all about context sensitivity. Or the lack thereof.


1 With "Real" pronounced as a single sylable. Of course.

2024-01-28

Career opportunity

Desperately seeking a licensed professional to tell us that we're making good parenting choices.

2024-01-25

Bringing C Structs into the C++ Lifetime Model

In addition to legacy code in our own projects, I sometimes "get" to work against libraries (legacy or modern) written in plain C. Which is OK. I learned C a long time ago and I'm not intimidated by it, though it can take a while to get back into the right mindset. Of course, there are things I miss. Static polymorphism and namespaces, for instance, are pretty small conceptual changes with a significant convenience factor for the programmer.1

Now, C++ has a reputation as being a dangerous language where it is easy to write really broken code. That impression is not wrong, but it is incomplete: the language also offers features that support writing code that has enforced safety in some aspects. It's not trivial and it takes both discipline and some understanding of how the features work, but in my opinion it takes less discipline to write memory-safe C++ code than memory-safe C code.2

This article covers one way to bring a C struct into the C++ lifetime model to leverage the better (or at least more automatic) memory safety of C++ library primitives.

We start with a highly artificial example struct designed to be a pain memory-wise:3

struct thing {
    int i;
    double d;
    char *s;
    int *ary;
};

Each of the pointers poses some (interrelated) questions:

  • Where do the objects that will be pointed to live? Heap? Stack? Data segment? Global memory? Memory mapped file? Something really exotic?
  • How do we ensure that the pointer is not used after the objects go away (if they go away)?
  • If they exist on the free-store, how do we control deallocation?
The questions aren't unique to C; they are the same ones that must always be dealt with. But C code deals with them all every time, while other languages may have built-in answers to some of them.4

Nor can you necessarily answer the questions by static examination of the code, but in the case I faced at work, both pointers were consistently pointing at dynamically allocated objects. Moreover, the number of things we needed could not be determined at compile time, so we were storing the thing *'s in a vector.

We had an existing C function thing *newThing(size_t array_size, const char *label) which would create a new struct thing on the heap (with an alloc-family function), set default values for i and d, set the string, allocate (but not populate) the array, and return the pointer to the thing. This is analogous to a C++ constructor, but for some reason (history, no doubt) we were handling the three calls to free manually each time we needed to reap one of these things.

Then we did something roughly like this:


{
    std::vector<const thing*> thing_list;
    for (const auto &input : inputs)
        thing_list.push_back(newThing(input.size(), input.name));
    process_list(thing_list);
}

Which, of course, leaks three heap-allocated objects for every item in the inputs container.

Replicating a proper, but C-like, approach to memory management here would mean writing a destructor-analog (perhaps void reapThing(thing *p)) as a free function and inserting std::for_each(thing_list.begin(), thing_list.end(), reapThing); before the closing brace. That works and I wouldn't be displeased to see it in a legacy project like the one I'm working on, but I think we can do a little better.

The "doing better" interface is actually quite simple:5


#include "thing.h"

struct thing_wrapper : public thing
{
    thing_wrapper();
    thing_wrapper(size_t array_size, std:string_view label);
    thing_wrapper(const thing_wrapper &);
    ~thing_wrapper();
    
    thing_wrapper &operator=(const string_wrapper &);
}

The wrapper has the same data, but manages the sub-allocations for you. What complexity there is lies in ensuring that the constructors, assignment operators and destructor all agree on the memory management of the sub-allocations.6 You might also want to add a constructor and assignment operator taking a const thing &, but this is a leap of faith insofar as nothing will enforce a consistent allocation strategy on those inputs. Similarly, you can consider supporting move operations if you have a particular use for them.
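
As a rough sketch of what that agreement might look like, assuming the C library hands out plain malloc'd memory (error handling and the copy operations omitted):

#include <cstdlib>
#include <cstring>

thing_wrapper::thing_wrapper(size_t array_size, std::string_view label)
{
    i = 0;                          // or whatever the library's defaults are
    d = 0.0;
    s = static_cast<char *>(std::malloc(label.size() + 1));
    std::memcpy(s, label.data(), label.size());
    s[label.size()] = '\0';
    ary = static_cast<int *>(std::malloc(array_size * sizeof(int)));
}

thing_wrapper::~thing_wrapper()
{
    // Release the sub-allocations with the same facility that made them;
    // the thing_wrapper object itself follows the normal C++ rules.
    std::free(s);
    std::free(ary);
}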

With the wrapper in place we can change the original code to something like:


{
    // The pointers must own thing_wrappers (not plain things) so that
    // ~thing_wrapper runs and releases the sub-allocations.
    std::vector<std::unique_ptr<const thing_wrapper>> thing_list;
    for (const auto &input : inputs)
        thing_list.emplace_back(std::make_unique<thing_wrapper>(input.size(), input.name));
    process_list(thing_list);
}

With no need, now, for explicit clean-up code.


1 Oddly, neither of these is trivial to add because they imperil the universal linkability of C (which depends on not needing a vendor-dependent name-mangling scheme).

2 At the foundational level, it is the object lifetime model that supports this, and at the practical level it is exploited in the standard library which offers a more powerful set of primitives than the C standard library. Step one for writing a robust C program at scale is to get a more robust library (which you might be able to get off the shelf or might want to write yourself).

3 It is, however, analogous to the problem I faced at work today.

4 Many "managed" languages have everything on the free-store, and use a garbage collector to resolve the lifetime question.

5 I've chosen to make this a struct rather than a class for two reasons. First, because the whole interface we want to derive is public: we're not going to extend thing in any way beyond supporting the C++ lifetime model. Second, because of Core Guideline C.2: the C code enforces no invariant, so we don't add one.

6 The safe thing to do, is use the facilities used by the code that provides the underlying structure, which in the case of pure c libraries usually means *alloc/free or some wrapper around the same. You may be able to defer to any pre-existing C functions that perform the set-up and tear-down.