No Swamp Coolers: Little surprises #10: iterators, half-open intervals, and git range selections

Let's talk about for loops. That's not actually the subject of the post, but we have to start there, hit a couple of stops along the way, and only then complain about git.¹

Back in the misty depths of time, there were programming languages that indexed their arrays starting with one. If you write that in a c-like syntax you end up with something like

int array[ARRAY_SIZE];
// fill it somehow
for (size_t i=1; i<= ARRAY_SIZE; i++)
	print(array[i]);

where the end condition is controlled by <=. That has never really gone away—despite the efforts of some influential people—but I want to direct your attention briefly to the same for-loop in a language that uses zero-indexed arrays. Now idomatic code uses a strict less-than comparison:

int array[ARRAY_SIZE];
// fill it somehow
for (size_t i=0; i<ARRAY_SIZE; i++)
	print(array[i]);

but you could also use a not-euqal-to (!=) because the upper limit of the loop no longer represents a cell of the array that is accessed. Instead it represents where a cell would be if it were one-past-the-current-end.

Pointer and Iterator loops

Of course, you don't have to use an index for a loop counter. K&R is full of examples where they use a pointer for processing a (null terminated) string:

const char *s = "Yellow whirled!";
for (char *p=s; *p != '\0'; p++)
	print(*p);

or a linked list:

LIST_NODE *list;
// fill it somehow
for (LIST_NODE *p=list; p != NULL; p=p->next)
	print(p->payload);

In those cases the termination condition is a not-equal comparison to something that is (again) not processed.

And that is the pattern than many languages (especially the ones I use regularly) use for iterating a (possible subset of a) collection. The beginning iterator designates something that is to be processed and the ending iterator either signals the lack of further data or designates data to not process. In C++ this is the pattern for the algorithms library (where you hand the routines explicit iterator pairs), and by ranged-for loops where the syntactic sugar calls std::begin and std::end on the object to be iterated. The end-marker does not represent data to be processed.

On to git

Some git operation let you interact with groups of prior commits. To inspect a sub-set of the history, run a partially automated bisection process for finding where an issue was introduced, or (as I was doing when I discovered² this behavior) to select a set of commits to "cherry-pick" to another branch. Unsurprisingly the command line syntax needs you to say what commits you want, and listing more than a few hashes explicitly is a pain. Luckily there is a "range" notation: [Early commit reference]..[later commit reference].

Now, if you use this notation with cherry-pick it will apply your selected commits to the new branch in time order: starting with the earliest selected commit and working it's way steadily toward the latest. Which is exactly what you would expect.

What you probably wouldn't expect (unless you, ya'know, actually read the documentation), is that it will exclude the earliest commit you entered and included the latest. This was so surprising to me that I actually did it wrong a second time before I started scouring the manual for an explanation.

That is a half-open range like we find in the discussion of iterator for-loop above. But instead of being [start, end) it is (start, end]. What. The. Absolute. Heck?!?

That said...

Interestingly, if you look a little deeper, there is a reason. Or at least a way in which this behavior arguably conforms to my description above.

Start by understanding how a git repository is structured. Each commit contains a record (effectively a pointer-to) all the commits that act as its parent(s) (there might be zero, one, two...). But commits are immutable once created, so they can't be amended with pointers to their descendants. The effect is that (assuming your range selection represents a branchless part of the graph³) you're looking at a linked list with the last commit as the head and the earliest commit at the end (of the range you care about, anyway).

So, even though commits are applied in time-order, they are found starting from the last event and working back in time. The docs make this clear: the selection is all the nodes reachable from the final commit, but not reachable from the first commit.⁴

And in that view, this is exactly like the linked list example I wrote above.

¹ A little like eating your vegetables before dessert.

² The mere fact that it is clearly documented if you bother to read doesn't affect my right to use that word. Does it?

³ I haven't actually tried it yet, but I imagine that selecting a span with this syntax that has a merge in it would include a large historical sub-tree you don't want.

⁴ And a commit is reachable from itself, naturally.

No Swamp Coolers

Pages

2025-12-19

Little surprises #10: iterators, half-open intervals, and git range selections

Pointer and Iterator loops

On to git

That said...

No comments:

Post a Comment