2024-01-28

Career opportunity

Desperately seeking a licensed professional to tell us that we're making good parenting choices.

2024-01-25

Bringing C Structs into the C++ Lifetime Model

In addition to legacy code in our own projects, I sometimes "get" to work against libraries (legacy or modern) written in plain C. Which is OK. I learned C a long time ago and I'm not intimidated by it, though it can take a while to get back into the right mindset. Of course, there are things I miss. Static polymorphism and namespaces, for instance, are pretty small conceptual changes with a significant convenience factor for the programmer.1

Now, C++ has a reputation as being a dangerous language where it is easy to write really broken code. That impression is not wrong, but it is incomplete: the language also offers features that support writing code that has enforced safety in some aspects. It's not trivial and it takes both discipline and some understanding of how the features work, but in my opinion it takes less discipline to write memory-safe C++ code than memory-safe C code.2

This article covers one way to bring a C struct into the C++ lifetime model to leverage the better (or at least more automatic) memory safety of C++ library primitives.

We start with a highly artificial example struct designed to be a pain memory-wise:3

struct thing {
    int i;
    double d;
    char *s;
    int *ary;
};

Each of the pointers poses some (interrelated) questions:

  • Where do the objects that will be pointed to live? Heap? Stack? Data segment? Global memory? Memory mapped file? Something really exotic?
  • How do we ensure that the pointer is not used after the objects go away (if they go away)?
  • If they exist on the free-store, how do we control deallocation?

The questions aren't unique to C; they are the same ones that must always be dealt with. But C code deals with them all every time, while other languages may have built-in answers to some of them.4

Nor can you necessarily answer the questions by static examination of the code, but in the case I faced at work, both pointers were consistently pointing at dynamically allocated objects. Moreover, the number of things we needed could not be determined at compile time, so we were storing the thing * pointers in a vector.

We had an existing C function thing *newThing(size_t array_size, const char *label) which would create a new struct thing on the heap (with an alloc-family function), set default values of i and d, set the string, allocate (but not populate) the array, and return the pointer to the thing. This is analogous to a C++ constructor, but for some reason (history, no doubt) we were handling the three calls to free manually each time we needed to reap one of these things.

Then we did something roughly like this:


{
    std::vector<const thing*> thing_list;
    for (const auto &input : inputs)
        thing_list.push_back(newThing(input.size(), input.name));
    process_list(thing_list);
}

Which, of course, loses three heap-allocated objects for every item in the inputs container.

Replicating a proper, but C-like, approach to memory management here would mean writing a destructor-analog (perhaps void reapThing(thing *p)) as a free function and inserting std::for_each(thing_list.begin(), thing_list.end(), reapThing); before the closing brace. That works and I wouldn't be displeased to see it in a legacy project like the one I'm working on, but I think we can do a little better.
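A minimal sketch of that C-like approach follows. The bodies of newThing and reapThing are my guesses at what such functions would do (three malloc-family allocations, three matching calls to free), the default values are placeholders, and demo_manual_cleanup is a hypothetical driver standing in for the real loop over inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

struct thing {
    int i;
    double d;
    char *s;
    int *ary;
};

// Assumed stand-in for the existing C function: three separate allocations.
thing *newThing(std::size_t array_size, const char *label)
{
    thing *p = static_cast<thing *>(std::malloc(sizeof(thing)));
    char *s = static_cast<char *>(std::malloc(std::strlen(label) + 1));
    // Ask for at least one element so a zero-length request is not mistaken for failure.
    int *ary = static_cast<int *>(std::malloc((array_size ? array_size : 1) * sizeof(int)));
    if (!p || !s || !ary) {  // all-or-nothing: never hand back a half-built thing
        std::free(ary);
        std::free(s);
        std::free(p);
        return nullptr;
    }
    std::strcpy(s, label);
    p->i = 0;      // placeholder defaults
    p->d = 0.0;
    p->s = s;
    p->ary = ary;  // allocated but deliberately not populated
    return p;
}

// Destructor analog: one free per allocation made in newThing.
void reapThing(thing *p)
{
    if (!p) return;
    std::free(p->ary);
    std::free(p->s);
    std::free(p);
}

// Hypothetical driver: build a list, use it, then reap every element.
std::size_t demo_manual_cleanup()
{
    const char *names[] = {"alpha", "beta"};
    std::vector<thing *> thing_list;
    for (const char *name : names)
        thing_list.push_back(newThing(4, name));
    const std::size_t count = thing_list.size();
    // ... process_list(thing_list) would go here ...
    std::for_each(thing_list.begin(), thing_list.end(), reapThing);
    return count;
}
```

Note that the vector holds plain thing * here: reapThing takes a non-const pointer, so a vector of const thing * could not be handed to it without a cast. The weakness of the approach is also visible: any early return or exception between the loop and the std::for_each call skips the clean-up.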

The "doing better" interface is actually quite simple:5


#include <cstddef>
#include <string_view>

#include "thing.h"

struct thing_wrapper : public thing
{
    thing_wrapper();
    thing_wrapper(size_t array_size, std::string_view label);
    thing_wrapper(const thing_wrapper &);
    ~thing_wrapper();

    thing_wrapper &operator=(const thing_wrapper &);
};

The wrapper has the same data, but manages the sub-allocations for you. What complexity there is lies in ensuring that the constructors, assignment operators and destructor all agree on memory management of the sub-allocations.6 You might also want to add a constructor and assignment operator taking a const thing &, but this is a leap of faith insofar as nothing will enforce a consistent allocation strategy on those inputs. Similarly you can consider supporting move operations if you have a particular use for them.
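To make "all agree on memory management" concrete, here is one possible sketch of the implementation, not the code from work. It assumes the sub-allocations come from malloc/calloc and go back via free. The artificial struct does not record the array length, so the sketch adds a hypothetical ary_len member purely so the copy operations know how much to copy; a real C struct would usually carry that count itself. The acquire helper is likewise my own invention, and it zero-fills the array where the C function left it unpopulated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string_view>

struct thing {  // stand-in for the definition in thing.h
    int i;
    double d;
    char *s;
    int *ary;
};

struct thing_wrapper : public thing
{
    std::size_t ary_len = 0;  // hypothetical: the example struct has no length field

    thing_wrapper() : thing_wrapper(0, "") {}

    thing_wrapper(std::size_t array_size, std::string_view label)
        : thing{0, 0.0, nullptr, nullptr}
    {
        acquire(label, nullptr, array_size);
    }

    thing_wrapper(const thing_wrapper &other)
        : thing{other.i, other.d, nullptr, nullptr}
    {
        acquire(other.s, other.ary, other.ary_len);
    }

    ~thing_wrapper()
    {
        std::free(ary);
        std::free(s);
    }

    thing_wrapper &operator=(const thing_wrapper &other)
    {
        // acquire() copies before it releases, so self-assignment is safe.
        acquire(other.s, other.ary, other.ary_len);
        i = other.i;
        d = other.d;
        return *this;
    }

private:
    // Allocate fresh buffers, fill them, and only then release the old ones.
    void acquire(std::string_view label, const int *src, std::size_t n)
    {
        char *new_s = static_cast<char *>(std::malloc(label.size() + 1));
        int *new_ary = static_cast<int *>(std::calloc(n ? n : 1, sizeof(int)));
        if (!new_s || !new_ary) {
            std::free(new_s);
            std::free(new_ary);
            throw std::bad_alloc{};
        }
        if (!label.empty()) std::memcpy(new_s, label.data(), label.size());
        new_s[label.size()] = '\0';
        if (src && n) std::memcpy(new_ary, src, n * sizeof(int));
        std::free(s);    // free(nullptr) is a no-op, so this is fine in constructors
        std::free(ary);
        s = new_s;
        ary = new_ary;
        ary_len = n;
    }
};
```

Routing every constructor and the assignment operator through the same helper is one way to keep the allocation strategy in a single place, so the destructor only ever has to undo what acquire did.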

With the wrapper in place we can change the original code to something like:


{
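    // Element type is thing_wrapper: thing has no virtual destructor, so
    // deleting through a thing pointer would skip ~thing_wrapper().
    std::vector<std::unique_ptr<const thing_wrapper>> thing_list;
    for (const auto &input : inputs)
        thing_list.emplace_back(std::make_unique<thing_wrapper>(input.size(), input.name));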
    process_list(thing_list);
}

With no need, now, for explicit clean-up code.


1 Oddly, neither of these is trivial to add because they imperil the universal linkability of C (which depends on not needing a vendor-dependent name-mangling scheme).

2 At the foundational level, it is the object lifetime model that supports this, and at the practical level it is exploited in the standard library which offers a more powerful set of primitives than the C standard library. Step one for writing a robust C program at scale is to get a more robust library (which you might be able to get off the shelf or might want to write yourself).

3 It is, however, analogous to the problem I faced at work today.

4 Many "managed" languages have everything on the free-store, and use a garbage collector to resolve the lifetime question.

5 I've chosen to make this a struct rather than a class for two reasons. First because the whole interface we want to derive is public: we're not going to extend thing in any way beyond supporting the C++ lifetime model. Second because of Core Guideline C.2: the C code enforces no invariant so we don't add one.

6 The safe thing to do is to use the facilities used by the code that provides the underlying structure, which in the case of pure C libraries usually means *alloc/free or some wrapper around the same. You may be able to defer to any pre-existing C functions that perform the set-up and tear-down.