Let's talk about for loops. That's not actually the subject of the post, but we have to start there, hit a couple of stops along the way, and only then complain about git.1
Back in the misty depths of time, there were programming languages that indexed their arrays starting with one. If you write that in a c-like syntax you end up with something like
int array[ARRAY_SIZE];
// fill it somehow
for (size_t i=1; i<= ARRAY_SIZE; i++)
print(array[i]);
where the end condition is controlled by <=. That
has never really gone
away—despite the
efforts of some influential people—but I want to direct your
attention briefly to the same for in language that use zero-indexed
arrays. Now the above code uses a strict less-than comparison:
int array[ARRAY_SIZE];
// fill it somehow
for (size_t i=0; i<ARRAY_SIZE; i++)
print(array[i]);
and the upper limit of the loop no longer represents a cell of the array that is accessed. Instead it represents where a cell would be if it were one-past-the-current-end.
Pointer and Iterator loops
Of course, you don't have to use an index for a loop counter. K&R is full of examples where they use a pointer for processing a (null terminated) string:
const char *s = "Yellow whirled!";
for (char *p=s; *p != '\0'; p++)
print(*p);
or a linked list: LIST_NODE *list;
// fill it somehow
for (LIST_NODE *p=list; p != NULL; p=p->next)
print(p->payload);
In those cases the termination condition is a not-equal comparison
to something that is (again) not processed.
And that is the pattern than many languages, and especially the ones I use
regularly use for iterating a (possible subset of a) collection. The beginning
iterator designates something that is to be processed and the ending iterator
either signals the lack of further data or designates data to not process. In
C++ this is the pattern for the algorithms library (where you hand the routines
explicit iterator pairs), and by ranged-for loops where the syntactic sugar
calls std::begin and std::end on the object to be
iterated. The end-marker does not represent data to be processed.
On to git
Some git operation let you interact with groups of prior commits. To inspect
a sub-set of the history, run a partially automated bisection process for
finding where an issue was introduced, or (as I was doing when I
discovered2 this behavior) to select a set of commits to
"cherry-pick" to another branch. Unsurprisingly the command line syntax needs
you to say what commits you want, and listing more than a few hashes
explicitly is a pain. Luckily there is a "range" notation: [Early commit
reference]..[later commit reference].
Now, if you use this notation with cherry-pick it will apply
your selected commits to the new branch in time order: starting with the
earliest selected commit and working it's way steadily toward the latest. Which
is exactly what you would expect.
What you probably wouldn't expect (unless you, ya'know, actually read the documentation), is that it will exclude the earliest commit you entered and included the latest. This was so surprising to me that I actually did it wrong a second time before I started scouring the manual for an explanation.
That is a half-open range like we find in the discussion of iterator
for-loop above. But instead of being [start, end) it
is (start, end]. What. The. Absolute. Heck?!?
That said...
Interestingly, if you look a little deeper, there is a reason. Or at least a way in which this behavior arguably conforms to my description above.
Start by understanding how a git repository is structured. Each commit contains a record (effectively a pointer-to) all the commits that act as its parent(s) (there might be zero, one, two...). But commits are immutable once created, so they can't be amended with pointers to their descendants. The effect is that (assuming your range selection represents an branchless part of the graph3) you're looking at a linked list with the last commit as the head and the earliest commit at the end (of the range you care about, anyway).
So, even though commits are applied in time-order, they are found starting from the last event and working back in time. The docs make this clear: the selection is all the nodes reachable from the final commit, but not reachable from the first commit.4
And in that view, this is exactly like the linked list example I wrote above.
1 A little like eating your vegetables before dessert.
2 The mere fact that it is clearly documented if you bother to read doesn't affect my right to use that word. Does it?
3 I haven't actually tried it yet, but I imagine that selecting a span with this syntax that has a merge in it would include a large historical sub-tree you don't want.
4 And a commit is reachable from itself, naturally.
No comments:
Post a Comment