2024-05-08

Vectors in C: an all-you-can-eat buffet of so-so choices

Let's say you want to model a physical system. Often you'll want to represent spacial vectors (members of the space $\mathbb{R}^N$) somehow. In fact there are many use case for Cartesian 2-, 3-, or 4-vectors, and for Lorentz 4-vectors as well as less common uses for other dimensionalities, but let's stick with Cartesian 3-vectos for the sake of definiteness because I want to focus on the programming tradeoffs you face if you want to use C (or a C interface1) for this purpose. Nor do we want to worry about manually optimising for the SIMD module of our chips (compilers are smart these days) or worse still laying things out for the benefit of the GPU.2

Aside: I mostly program in C++ where there are some better options, but I get to mess with a lot of legacy code, so the consequences of someone else making this choice for a code-base I work on are still with me. I might get around to a followup post on ways to adapt an existing legacy library for cleaner inter-operation with new C++ code, but that's for another day: first we must understand the root problem.

C offers us an obvious choice with two painful drawbacks (or perhaps it's one underlying drawback that rear's it's ugly head in two contexts) and a clever way to avoid that issue at the cost of having your soul slowly nibbled to death by syntactic ducks. Nice, huh?

Arrays

The obvious choice is the built-in array type: double vec[3];, right?

The underlying problem is that array are only sort-of first-class types. Consider this code:

#include <stdio.h>

void pass_ptr(double* p)
{
    printf("Passed pointer: %lu\n", sizeof(p)/sizeof(double));
}

void pass_ary(double a[3])
{
    printf("Passed 'array': %lu\n", sizeof(a)/sizeof(double));
}

void pass_c99(double a[static 3]) // Syntax added in c99
{
    printf("Staticly sized: %lu\n", sizeof(a)/sizeof(double));
}

int main()
{
    double vec[3];
    
    printf("In local scope: %lu\n", sizeof(vec)/sizeof(double));

    pass_ptr(vec);
    pass_ary(vec);
    
    return 0;
}
Each of the printf statements is executed once on the same variable, but they generate two results: one says 3 and the others say 1.

This is a classic trap for C newbies. Arrays are not, as is sometimes said, "just pointers" because the symbol table knows how big they are when declared at static or automatic scope. But most things that you can do with them drop that knowledge at which point all that is left is a pointer to the start.3

The other manifestation of the limitation is that you can't assign or perform operations on arrays. That is, this is not legal code:

double v1[3] = {1, 2, 3}
      double v2[3];
      v2 = v3;       // Error! Even when the compiler *does* know the sizes!
and as a result you end up writing your library functions with signatures like cross(double *result, const double *v1, const double *v2) which makes you write particualrly clunky code to use the library:
double v1[3] = {1, 2, 3}
      double v2[3] = {4, 5, 6};
      double cp[3];
      crossProduct(cp, v1, v2);
Ugh. And you get to do it over and over again.

Array-in-struct

Oddly it is amazingly easy to solve both these problems: you just wrap the array declaration in a structure declaration (and typedef it for convenience):

typedef struct {
    double a[3];  // a for "array"
} vector;
This takes up exactly as much memory as before, still knows how many elements are involved when you pass it to a function, and can be assigned! Yeah! It's like magic.

Of course, to access an elements you now write vec.a[2] instead of vec[2], but that's a small price to pay. Right? It's not like your soul will die a little bit each time or anything.

Seasoning with unions

Once you're drunk that CoolAid there is no reason not to go a little further: maybe sometimes you'll want to talk about the coordinates of these things, right? So you make that possible, too:4

typedef struct {
    union {
        double a[3];               // a for "array"
        struct {double x, y, z} c; // c for "coordinate"
    }
} vector;

You still need to write a full set of library routines, but now they can have signatures like5 vector cross(const vector v1, const vector v2) and you can call them like double v1[3] = {1, 2, 3} double v2[3] = {4, 5, 6}; double cp[3] = crossProduct(v1, v2); which is much better.


1 Even if you're confident that you won't be writing in C, you may find it necessary to deal with the limits of the language while planning a binary interface.

2 Those are great tools and worthy of your attention if you have a computationaly demanding task, but beyond the scope of this post.

3 Note that as far as the compiler is concered the declaration of pass_ary is identical to that of pass_ptr: the array-like notation is accepted as syntactic sugar only and the compiler pays no attention to the 3. The _c99 variant is a little more subtle but it still doesn't really know the size of the array, it just assumes a minimum for the sake of optimization (and compilers can complain if static analysis show the assumption is violated). Some folks like to use the array form because it makes the declarion express intent to the human reader (though, like comments, it can lie). Others are not so enamored of it, with Linus famously coming out strongly against it.

4 Type punning with unions this way is strictly forbidden in C++ (because lifetime-model and invariant-enforcement, that's why; and don't give me any guff about POD types either, sonny, the divine gave you memcpy and std::bit_cast for a reason!), but C programmers are rugged, self-reliant individualists who carry Colt-45 six pointers (colt45****** shootin_iron;) on their hips and ain't afraid of no Endian no how.

5 Of course, you might pass pointers for the in-parameters to save a little copying at the cost of writing & all over the place. C sure seems have that same issue come up a lot, eh?.

In which the author sits on his porch and shakes his cane at passersby

Is it just me or have UI designers taken the idea of clean and unobtrusive interfaces to the point of not actually offering any affordances at all? And if so, am I justified in suggesting that they might have missed an important point somehwere?

2024-04-14

Neal Stephenson has it wrong

For no reason I can identify I suddenly noticed something today:

They're called "emojis", not "mediaglyphics".

Of course, he pretty much nailed everything except the name.

2024-04-04

Yeah, well, you know, that’s just, like, your workflow, man.

I caught some flack at work this week: I circulated an early draft of a document that I was struggling with in plain text1 and my boss was very clear that he wanted me to use Word in the future so there would be change tracking and out-of-band comments. On the plus side those remarks came packaged up with some useful suggestions for the piece.

Once I tamped down my reflexive defensiveness and the basic anxiety that comes with screwing up at work, I pulled up my big kid underwear and moved on. Then, having decided to be an adult about this, I ran smack dab into a counter example for $BOSS's point. I received a second set of highly useful changes in the same document. Conflicting changes. I'm not aware of any good tooling to handle conflicting changes in Word, but it was no problem for me to handle the conflicts in my text document: I just opened the files in my favorite visual merge tool.2 and got on with it.

Caveat time. To take the "plain text means we can use good tools" thing seriously we'd want to put all our draft work in VC repositories, and when that occurred to me my first reaction was "Who'd want to do that?" I mean, yeah that makes sense for major pieces of writing, but it's not obvious that you want to maintain a full history on every minor document you bang out day in and day out.

But then I had another thought...

Caveat on the caveat. Which was "Hey, how do people who are really committed to Word deal with the possibility of conflicting changes, anyway?" A little poking around the web suggest that my employer's answer is completely mainstream. At the management tier we put everything in SharePoint and let it enforce serialized editing, so they're already putting all their work in a repository. Maybe the whole idea isn't so silly after all.


1 Now, I would never send plain text to the clients, but I often do my initial composition in text because the sense of informality helps me feel safe trying out different formulations in search of a natural arc through complex subjects.

2 Meld as it happens. But not because I've tried all the options: it was just the first one I spent any time with and it's been consistently available.

2024-03-21

Algorithms header knowledge check

Note some late additions marked with *.

* After posting this I began to feel it needed a little bit more detail. And then that it needed quite a bit more detail.


The C++ standard library's algorithm header has a routine sutiable for counting the number of places of disagreement between two equal sized collections of elements. What's it called? Hint: it's not called "count_differences". *Nor is it called count_if (unless you happen to be using C++-23 because count_if doesn't have an overload for parallel containers.0) My work projects are in C++-17 and my home projects in'17 or '20.

Answer
inner_product

*One approach is to write a "sum" function that adds just as in the usual inner product, but the make the one that would do element multiplication in the standard inner produt return zero when the two values are equal and one if they differ.

For that matter, what educational backgrounds would prepare you to recognize that as the routine you want?1 How does this compare to Kate Gregory's story about partial_sort_copy and how it would be better called top_n?


*0 C++23 doesn't introduce a parallel overload either, but it does introduce zip_view and zip which will allow you to efficiently produce on single container of apparent pairs from the parallel containers. Then you can use the single-container version of count_if. Obvious. Right?

1 My combination of physics and prior experience with the algorithm header's love affair with having a user-supplied-predicate-to-change-the-behavior overload meant that I spotted it as soon as I read the name, but ... that's a rather esoteric requirement for user's to know what they're seeing.

2024-03-05

Voice-assistant fails

Accumulated over the years, but I got another one the other day that triggered me.

Me:
[Navigating to a business in the US sowthwqest]
Creppy voice assistant (CVA):
In five-hundred feet, turn left on El Camino Real Street1.
Me:
[::Sighs::]


CVA:
[Interrupts a conversation in the car]
Me:
Hold your horse, [CVA].
CVA:
[Starts reading the Wikipeia article on the idiom]


Me:
[CVA], play 2112.
CVA:
Now playing two-thousand-one-hundred-twelve.
Me:
[::Fumes:: until the music sweeps me away]

It's all about context seneitivity. Or the lack thereof.


1 With "Real" pronounced as a single sylable. Of course.

2024-01-28

Career opportunity

Desperately seaking a licensned professional to tell us that we're making good parenting choices.