2024-05-22

It's a Singleton for life-cycle management, maybe?

Global state has it's well known costs, so avoiding it where reasonable is a thing. Great. But sometime you might feel you need some. Then what? This article is to record some of my thoughts about the Singelton pattern in C++

Avoid singletons —C++ Core Guidelines

Which Singelton is easy

A lot of Singelton implementations are not thread safe, and/or hard to get right. The Meyer Singelton is surprisingly simple and inherits thread-safety from guarantees made by the C++-11 standard.

When to consider a singleton

Programs can need data and they can need behaviors. You might want (or think you want) either with global scope,1 but a Singelton class doesn't offer anything over a simple global variable for plain data. Nor does it offer anything over a simple free-function unless you want some persistent state attached to more than one function together.2 Either of these is simple on their own, it's only when you need the combination, behavior working on global data that Singletons offer any advantage over simple global data or free functions. So when we look at alternatives, we should be thinking about the combination.

Alternatives

Simple global data. Suffers from the static initialization order fiasco. More than a few sources suggest that an suitable alternative is to use a namespaces and just write free code. This isn't Java and things don't have to be objects.

Interestingly the Core Guidelines suggest using the same static local variable idea exploited in the Meyers singleton implementation without the object context which does not enfore the single nature, but does give you a centralized access point for a particular instance of the data.

Object Lifecycle

But here's the thing: in C++, objects are about lifecycle. They have a defined lifetime and we are guaranteed that their constructors are called at the beginning and their destructors at the end. That's how scope bounded resource management works, afterall.


1 Indeed, in the usual C++ model, almost all behavior is nominally available at global scope.

2 We'll talk about peristent state attached to a single function along the way.

If incompentence seems unlikely may I assume malice?

I've mentioned before that I generally keep my project source directories at work in a place where there are accessibly from the host (Windows) operating system, my msys working environment, and my WSL (linux) instances.1 Some IT trouble recently has seem me migrate off my usual work machine onto a loaner and now back to the (freshly re-imaged) original. Which has meant re-building my working environments a couple of times.

Now, with one thing and another my re-imaged machine came back to me without a stand-alone X11 server,2 so I though "Now'd be a good time to see if WSL2 really is better than WSL1; what with the wslg display adapter meaning that WSL2 doesn't need X11." Anyone familiar with this domain will know that WSL2 has been a thing for quite a while now, but the combination of (a) inertia and (b) reports that access to the host-filesystem access was slow on 2 had kept me from moving forward.

Anyway, I tried it today.

The virtual machine actual runs noticably faster which is nice, but "slow" doesn't even begin to describe the productivity bottle-neck that is access to the host filesystem. Holy deity-figure-on-a-pogo-stick, Batman! This performs like something offered as a minimum viable product from a failing startup run by kids who really should have stayed in school. A sloth dying in a tarpit moves faster than a cmake configure over that channel. I was longing for the heady days of running autoconf on a pentium. I had time to muse about the bandwidth of a stationwagon full of tapes as delivered by continental drift.

I'm following a colleague's lead and keeping a separate source tree for the WSL image and using rsync-spawned-by-cron to keep them tracking one-another.3

And now we get around to the question of incompetence versus malice. The famous adage tells us to set our Baysean prior in favor of "Incompetence", but this is a Microsoft product on it's second version number. Incompetence simply isn't what I would expect. The usual excuses just ring hollow: the people who put this together are not beginners, they had access to as much deep technical support as they needed, and they had a v1 product out in the world doing well so there wasn't a live-or-die deadline to meet.

On the other hand, Microsoft has more than a little history of trying to leverage their market share to try to kill inconvenient competitors.


1 For that matter before some "security" upgrades broke all the non-Microsoft virtualization tools on Windows (VirtualBox we hardly knew 'ya), I would use those same files from my virtual machines, too.

2 Probably an oversight on my part as I don't recall saying that I needed one.

3 The thing is that parts of the host filesystem (includind the bit where I keep the source trees) is auto-backed up by IT, a feature that's very handy for the biggest part of your daily work, eh?

2024-05-08

Vectors in C: an all-you-can-eat buffet of so-so choices

Let's say you want to model a physical system. Often you'll want to represent spacial vectors (members of the space $\mathbb{R}^N$) somehow. In fact there are many use case for Cartesian 2-, 3-, or 4-vectors, and for Lorentz 4-vectors as well as less common uses for other dimensionalities, but let's stick with Cartesian 3-vectos for the sake of definiteness because I want to focus on the programming tradeoffs you face if you want to use C (or a C interface1) for this purpose. Nor do we want to worry about manually optimising for the SIMD module of our chips (compilers are smart these days) or worse still laying things out for the benefit of the GPU.2

Aside: I mostly program in C++ where there are some better options, but I get to mess with a lot of legacy code, so the consequences of someone else making this choice for a code-base I work on are still with me. I might get around to a followup post on ways to adapt an existing legacy library for cleaner inter-operation with new C++ code, but that's for another day: first we must understand the root problem.

C offers us an obvious choice with two painful drawbacks (or perhaps it's one underlying drawback that rear's it's ugly head in two contexts) and a clever way to avoid that issue at the cost of having your soul slowly nibbled to death by syntactic ducks. Nice, huh?

Arrays

The obvious choice is the built-in array type: double vec[3];, right?

The underlying problem is that array are only sort-of first-class types. Consider this code:

#include <stdio.h>

void pass_ptr(double* p)
{
    printf("Passed pointer: %lu\n", sizeof(p)/sizeof(double));
}

void pass_ary(double a[3])
{
    printf("Passed 'array': %lu\n", sizeof(a)/sizeof(double));
}

void pass_c99(double a[static 3]) // Syntax added in c99
{
    printf("Staticly sized: %lu\n", sizeof(a)/sizeof(double));
}

int main()
{
    double vec[3];
    
    printf("In local scope: %lu\n", sizeof(vec)/sizeof(double));

    pass_ptr(vec);
    pass_ary(vec);
    
    return 0;
}
Each of the printf statements is executed once on the same variable, but they generate two results: one says 3 and the others say 1.

This is a classic trap for C newbies. Arrays are not, as is sometimes said, "just pointers" because the symbol table knows how big they are when declared at static or automatic scope. But most things that you can do with them drop that knowledge at which point all that is left is a pointer to the start.3

The other manifestation of the limitation is that you can't assign or perform operations on arrays. That is, this is not legal code:

double v1[3] = {1, 2, 3}
      double v2[3];
      v2 = v3;       // Error! Even when the compiler *does* know the sizes!
and as a result you end up writing your library functions with signatures like cross(double *result, const double *v1, const double *v2) which makes you write particualrly clunky code to use the library:
double v1[3] = {1, 2, 3}
      double v2[3] = {4, 5, 6};
      double cp[3];
      crossProduct(cp, v1, v2);
Ugh. And you get to do it over and over again.

Array-in-struct

Oddly it is amazingly easy to solve both these problems: you just wrap the array declaration in a structure declaration (and typedef it for convenience):

typedef struct {
    double a[3];  // a for "array"
} vector;
This takes up exactly as much memory as before, still knows how many elements are involved when you pass it to a function, and can be assigned! Yeah! It's like magic.

Of course, to access an elements you now write vec.a[2] instead of vec[2], but that's a small price to pay. Right? It's not like your soul will die a little bit each time or anything.

Seasoning with unions

Once you're drunk that CoolAid there is no reason not to go a little further: maybe sometimes you'll want to talk about the coordinates of these things, right? So you make that possible, too:4

typedef struct {
    union {
        double a[3];               // a for "array"
        struct {double x, y, z} c; // c for "coordinate"
    }
} vector;

You still need to write a full set of library routines, but now they can have signatures like5 vector cross(const vector v1, const vector v2) and you can call them like double v1[3] = {1, 2, 3} double v2[3] = {4, 5, 6}; double cp[3] = crossProduct(v1, v2); which is much better.


1 Even if you're confident that you won't be writing in C, you may find it necessary to deal with the limits of the language while planning a binary interface.

2 Those are great tools and worthy of your attention if you have a computationaly demanding task, but beyond the scope of this post.

3 Note that as far as the compiler is concered the declaration of pass_ary is identical to that of pass_ptr: the array-like notation is accepted as syntactic sugar only and the compiler pays no attention to the 3. The _c99 variant is a little more subtle but it still doesn't really know the size of the array, it just assumes a minimum for the sake of optimization (and compilers can complain if static analysis show the assumption is violated). Some folks like to use the array form because it makes the declarion express intent to the human reader (though, like comments, it can lie). Others are not so enamored of it, with Linus famously coming out strongly against it.

4 Type punning with unions this way is strictly forbidden in C++ (because lifetime-model and invariant-enforcement, that's why; and don't give me any guff about POD types either, sonny, the divine gave you memcpy and std::bit_cast for a reason!), but C programmers are rugged, self-reliant individualists who carry Colt-45 six pointers (colt45****** shootin_iron;) on their hips and ain't afraid of no Endian no how.

5 Of course, you might pass pointers for the in-parameters to save a little copying at the cost of writing & all over the place. C sure seems have that same issue come up a lot, eh?.

In which the author sits on his porch and shakes his cane at passersby

Is it just me or have UI designers taken the idea of clean and unobtrusive interfaces to the point of not actually offering any affordances at all? And if so, am I justified in suggesting that they might have missed an important point somehwere?