2022-11-25

But ... I knew that. I just didn't understand the importance of what I knew.

Still more Randall Munroe greatness.

What with one thing and another I learned about the TEA project recently. Ambitious stuff, and I'm completely unprepared to talk about the project as a whole, but being an ex-academic, I wasted no time downloading the white-paper, and brought along a dead-trees version for the flight as we took our spawn off to visit one set of Grandparents.1 The paper concerns itself with a distributed cryptographic ledger to be used for archiving and authenticating the open-source ecosystem, but also to provide a framework for distributing donations across that ecosystem.

The point is that much of the open-source ecosystem is un-sexy tooling and infrastructure that only excites the people who use it to write other open-source software, but donations come from either users (who tend to direct them to sexy top-of-the-stack end-user stuff) or companies (who direct them to things they are specifically trying to improve). Very little money flows to underlying stuff that "just works" even if it requires on-going maintenance to preserve that state of operability. There is a reason Randall uses the word "thanklessly" in the comic. Anyway, the paper observes that to perform such a distribution you would need to know how much each project contributes to the ecosystem, and then that a package manager (such as apt, pip, npm, or homebrew) is exactly an encoded version of that knowledge.

I'm pretty sure my jaw hung slack for the several seconds it took for my mind to re-orient on my freshly remade understanding of the open-source ecosystem. Of course, the package managers already know that stuff. At least collectively.2 And maybe missing the dependency links implicit in rarely used build options. But for the most part it's there.
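To make the observation concrete, here is a toy sketch in Python. The package names and the deps table are invented for illustration, and counting transitive dependents is only one naive stand-in for "contribution"; it is not the scheme the white-paper proposes.

```python
from collections import Counter

# Hypothetical dependency table of the kind a package manager already holds:
# package -> packages it depends on directly.
deps = {
    "shiny-app": ["web-framework", "json-lib"],
    "web-framework": ["http-lib", "json-lib"],
    "http-lib": ["tls-lib"],
    "json-lib": [],
    "tls-lib": [],
}


def transitive_deps(pkg, deps):
    """Return every package that pkg depends on, directly or indirectly."""
    seen = set()
    stack = list(deps.get(pkg, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps.get(dep, []))
    return seen


def dependents_count(deps):
    """Count, for each package, how many other packages rely on it."""
    counts = Counter({pkg: 0 for pkg in deps})
    for pkg in deps:
        for dep in transitive_deps(pkg, deps):
            counts[dep] += 1
    return counts


counts = dependents_count(deps)
for pkg, n in counts.most_common():
    print(f"{pkg}: relied on by {n} package(s)")
```

In this made-up graph the unglamorous tls-lib comes out on top and the end-user shiny-app comes out last, which is more or less the point of the comic.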


1 Between the pandemic and our commitments taking care of honorary Grandma we'd been able to avoid this for some time. Now we will, no doubt, have to do the other Grandparents soon as well.

2 It is worth noticing that there are multiple classes of package managers and build dependency systems out there, handling independent (well, largely independent) and distinct (again, largely) ecosystems. To perform as envisioned the proposed tea protocol will need to be flexible enough to express relationships within and between all such groupings. Not easy stuff.

2022-11-20

Not actually that bad (AKA git submodules part 2)

In my last post, I vented some frustration related to a work project. At this point, enough progress has been made to walk back the most wild speculations. Unsurprisingly, part of the problem was me, though that leaves the tool to take some of the blame.

At the end of the last episode we had explored the technical reason you can't simply chain a series of clones of a repository using submodules. Depending on the way submodules are identified, you may be able to work around the limitation (what I've done)1 and you may be able to have all clones use the same (possibly third-party) master repository for some submodules. Depending on your use case these two options may be sufficient, and in the case of third-party modules the latter may be the Right Thing (tm).

At that point I actually had my local server copies in place but it wasn't working right. I'd started my investigation and gotten far enough to write

Only now there is the matter of branches and tags.
without having quite solved it. The symptom I had noticed is that there was a branch, call it develop, on the central server that I couldn't check out from the local server. If I drilled down on the hosting website to find hashes for the versions of the files I wanted to look at, I found that those hashes were present on the local server. But the branch wasn't.

So, what is a branch, where do they come from, and how do I make sure that the local server has the ones that are on the central server?

A branch (and indeed a tag) is just the association of a textual name with a particular commit. This class of objects is called "refs".2 And while cloning copies all the contents (commits, trees, and blobs) of the cloned repository, it performs some bookkeeping on refs. Moreover, exactly what bookkeeping is performed depends on how you run your clone. The gory details are available thanks to stackoverflow user Cascabel and editors, but the long and short of it is I had created the repositories on the local server using git clone --bare when I should have used git clone --mirror.

Sigh.

Thankfully, stackoverflow user onionjake knows the incantation to fix it up in place.
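For the curious, here is a toy Python model of why the two kinds of clone drift apart. It relies on what the git-clone documentation says: --bare copies the refs once but leaves no fetch configuration behind, while --mirror configures a refspec that maps every ref on the remote onto the same name locally. The commit hashes are made up, and the model tracks ref names only, so treat it as a sketch of one way the symptom can arise, not as a description of git's internals.

```python
def map_ref(refspec, ref):
    """Map a remote ref name through a refspec such as '+refs/*:refs/*'.

    Returns the local ref name, or None if the refspec does not match.
    """
    src, dst = refspec.lstrip("+").split(":")
    prefix, star, suffix = src.partition("*")
    if not star:  # no wildcard: exact match only
        return dst if ref == src else None
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    middle = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", middle, 1)


def fetch(remote_refs, local_refs, refspecs):
    """Return the local refs after a fetch governed by the given refspecs."""
    updated = dict(local_refs)
    for ref, commit in remote_refs.items():
        for spec in refspecs:
            target = map_ref(spec, ref)
            if target is not None:
                updated[target] = commit
    return updated


# What the central server looked like when the local copy was made...
at_clone_time = {"refs/heads/master": "a1b2c3"}
# ...and what it looks like after someone pushes a develop branch.
central_now = {"refs/heads/master": "d4e5f6", "refs/heads/develop": "0a1b2c"}

bare_refspecs = []                    # --bare configures no fetch refspec
mirror_refspecs = ["+refs/*:refs/*"]  # --mirror maps every ref one-to-one

bare = fetch(central_now, at_clone_time, bare_refspecs)
mirror = fetch(central_now, at_clone_time, mirror_refspecs)

print("bare:  ", sorted(bare))
print("mirror:", sorted(mirror))
```

With no refspec to consult, the bare copy never learns that develop exists; the mirror simply tracks whatever refs the central server has.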


1 In the work-around the non-terminal repositories are in no way unified. You check out the top-level without recursing into the sub-modules and then check out each submodule separately and carefully locate them on your file-system relative to the top-level in the way that the terminal repositories are going to expect. The downside of this is that there is no tooling for keeping them in-sync. I suppose I'll write a script. In python, perhaps, because I'm trying to get away from using unix-specific tools for things that could be cross-platform.

Verbing weirds language.
Calvin
2 The main difference between branches and tags is how the association behaves when you git commit. Tags simply don't care: once you set them up they are fixed and always point to the same commit. Branches, however, can move. Git keeps track of what branch you are "on" and when you perform a commit action, it moves that branch to point to the newly created commit object. Of course, using the same word for the noun and verb is not in the least confusing.
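A toy Python model of that difference, for illustration only. The ToyRepo class and its commit ids are hypothetical; real git stores refs as files or packed entries that point at hashed commit objects.

```python
class ToyRepo:
    """A drastically simplified model of git refs: names pointing at commits."""

    def __init__(self, branch="main"):
        self.refs = {}                      # ref name -> commit id
        self.head = f"refs/heads/{branch}"  # the branch we are "on"
        self._counter = 0

    def commit(self):
        """Create a new commit and move the current branch to it."""
        self._counter += 1
        new_commit = f"commit-{self._counter}"
        self.refs[self.head] = new_commit
        return new_commit

    def tag(self, name):
        """Pin a tag to whatever the current branch points at right now."""
        self.refs[f"refs/tags/{name}"] = self.refs[self.head]


repo = ToyRepo()
repo.commit()     # the branch now points at commit-1
repo.tag("v1.0")  # so does the tag
repo.commit()     # the branch moves on; the tag stays put
print(repo.refs)
```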

2022-11-16

Is git submodules really this bad? (part 1)

As I mentioned, I'm "getting" to use git submodules, and my frustration level is making a bid for a new personal best. My sense is it's a little clunky even when you use it exactly as envisioned and breaks completely as soon as you stress it. I hope I'm exaggerating or outright wrong because I have no choice but to work with it.

Here's the problem: we're working on one small piece of a larger project (a plugin, as it happens), and the project management is security conscious enough to:1

  • Use submodules as part of a system to selectively limit access to the repository: I can see all main API headers and only those implementation details I will be working directly with. I don't "need" the rest because I can test my plugin against a binary distribution of the core program.
  • Make it quite a gauntlet to get individual credentials for direct access to the central repository.

To avoid sending each member of my team through the gauntlet as they join I thought "Oh, git is a distributed system,2 right? I'll just create a local working repository for my team and we can push back upstream when we're happy."3 Which is, evidently, not something the designers of submodules anticipated.

The core issue is that submodules are found by following either a path or a URL, which has implications for how a clone of a clone works in projects that use submodules. Look at the contents of a .gitmodules file: for each module there will be a url entry. That entry may be formatted as a relative path telling git where to look for the sub-repository relative to where it found the super-repository, or as a URL telling git where to find the repository on the wider network.

Imagine that there exists a project on central.server:

    central.server:/repos$ ls
    compoundproject.git  includeproject.git  utilityproject.git
    central.server:/repos$ cat compoundproject.git/.gitmodules
    [submodule "IncludeProject"]
        path = IncludeProject
        url = ../includeproject.git
    [submodule "UtilityProject"]
        path = UtilityProject
        url = http://central.server/repos/utilityproject.git

and note that I've rigged the two submodules to use different logic about finding their related repos, but that they will both find the one on the central server.

Now I create the repository for my team (there are slight differences if we make this a bare repository):

    local.server:/home/git$ git clone --recurse-submodules http://central.server/repos/compoundproject.git
    [...various git output that looks good...]
    local.server:/home/git$ ls
    compoundproject
    local.server:/home/git$ ls -A compoundproject
    .git  .gitmodules  IncludeProject  UtilityProject  [...some top-level contents...]

and if we peek in the sub-project directories we'll see the expected contents.

Next a member of my team tries to set up a working repository:

    developerworkstation:/home/developer/Projects$ git clone --recurse-submodules http://local.server/home/git/compoundproject

but this is going to fail when it tries to get IncludeProject, because it is going to look for it at http://local.server/home/git/includeproject.git instead of at http://local.server/home/git/compoundproject/IncludeProject. And if we assume that the developer does not have credentials for central.server, then it would also fail when trying to get UtilityProject, because it gets that from the central source.
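The lookup rule can be approximated in a few lines of Python. This is a sketch rather than git's own code: resolve_submodule_url is a hypothetical helper, it only handles URL-style remotes, and it assumes (as the gitmodules documentation describes) that only url values starting with ./ or ../ are treated as relative to the URL the super-repository was cloned from.

```python
from urllib.parse import urljoin


def resolve_submodule_url(superproject_url, submodule_url):
    """Approximate where git will look for a submodule.

    Relative entries are resolved against the URL the superproject was
    cloned from; anything else is used exactly as written.
    """
    if submodule_url.startswith(("./", "../")):
        # Treat the superproject URL as a directory so that '..' steps
        # out of it, landing beside the superproject.
        base = superproject_url.rstrip("/") + "/"
        return urljoin(base, submodule_url)
    return submodule_url


# The two url entries from the .gitmodules file in this example.
include_entry = "../includeproject.git"
utility_entry = "http://central.server/repos/utilityproject.git"

# Cloning from the central server: both submodules are found there.
from_central = "http://central.server/repos/compoundproject.git"
central_include = resolve_submodule_url(from_central, include_entry)

# Cloning from my team's copy: the relative entry now points beside that
# copy, while the absolute entry still points at the central server.
from_local = "http://local.server/home/git/compoundproject"
local_include = resolve_submodule_url(from_local, include_entry)
local_utility = resolve_submodule_url(from_local, utility_entry)

print(central_include)
print(local_include)
print(local_utility)
```

The relative entry follows whichever server you cloned from, and the absolute entry never does, which is exactly why the two submodules fail for different reasons.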

Now, I can solve the first problem by (bare) cloning includeproject.git to local.server beside compoundproject.git. The second problem can only be overcome by getting the developers credentials for central.server.

Okay, back up and replace the subjunctive above with my actual situation. In effect utilityproject.git is actually reached by relative reference just like includeproject.git. Consequently I have made bare clones of all three projects on local.server and my developers can do a git clone --recurse-submodules http://local.server/home/git/repos/compoundproject.git and get all three. Yeah! Go me!

Only now there is the matter of branches and tags. I'm not sure I understand this, so the saga will have to continue another day...


1 Coming in heavily on the side of security in the security-versus-getting-things-done trade-off is par for the course in my industry. I've been sighing a lot about this but I'm not at all surprised.

2 Big selling point, right? Every repository is equivalent and you can move updates from any repository to any other repository. Of course, the way people actually use DVCS there is a (or are a few) repositories that are central to the workflow even if they are not special to the underlying software. For that matter in git those ones are usually configured as bare repositories so there's feature support in the tool for the distinction. But it is still better than using SVN.

3 Bear in mind that this plan would work just fine for a plain boring project that used git without any sub-whatevers.

2022-11-08

Mystery of the Month Club

Git.

The git parable does a pretty good job of explaining why git makes sense as an abstract tool, and of preparing you to understand other articles on why you should or should not git in various ways.1

But git will still surprise you. I think I sussed out the bit that confused me today. Maybe. It's all to do with submodules. We've opted not to use them locally because we were convinced we didn't understand all the implications. I think we were right but I've gotten involved in a project to write a plugin for a third party tool which does use submodules, so I'm getting to learn.

Of course, submodules is just one option in the mix-n-match-repositories marketplace (along with subrepos and subtrees as well as various wrappers). Probably because the use case was not part of the original design and all the patch-it-up-after-the-fact schemes have real drawbacks.


1 Did you notice that the cherry picking article comes in 10 (!) parts? That's because, elegant though it is, git's underlying graph theoretical model requires you to keep track, not just of a DAG, but of an evolving DAG. And also because the series actually explored multiple issues. The first four or five articles (if I recall correctly) treat the main subject and the rest explore extensions to the basic idea.

2022-11-07

Not an advertised feature

The child has a tablet and likes to stream stuff on it in the car, which means tethering it to one of our phones (yes, we're wondering about making the next one cellular enabled). You can imagine the various minor inconveniences that implies, but it also means that when my wife set off on the pre-school delivery run leaving her phone in the upstairs bathroom (where the tablet had no trouble connecting to it), she rapidly learned that something was wrong.

Back they came and (using her smartwatch) she called me from the driveway to tell me she didn't have her phone. Cue double take.

New use for the tablet: phone abandonment detector.

2022-11-01

Merge tool supporting on-the-fly revision history?

There is this internal library where we've been working on four branches. Call them devel, feature-a, feature-b, and feature-c. All three feature branches are significant and long running, but they have at least had the incremental improvements and bug fixes from devel regularly merged back in.

Recently feature-a and feature-b were merged to devel (by someone else, yeah!). Which means that devel now has large changes relative to feature-c. Now, I'm one of the folks working on feature-c and I spent a little while today trying a "what-if" merge of devel into that branch. It wasn't as bad as I feared (because much of this branch is working on different parts of the code than the others), but it was bad.

While I was staring at the merge tool showing another set of conflicting changes without (a) recognising either change as one I had done or (b) understanding the intent behind either, it occurred to me that it would be really nice to be able to ...

Frob1 a highlighted (i.e. changed) section of text to obtain a quick peek at the relevant change logs.

In the simplest form you would see the change summaries from the current commit back to the common ancestor for each branch; better would be filtering only those commits that touched the current file, and best would be filtering those that involve the lines in question. Being able to drill down on interesting summaries is a bonus feature.

At that point—assuming your team writes reasonable commit messages—you have a fighting chance of sussing out the intent of the changes and thereby a better chance of choosing the right merge action. Of course, I can (and do) check those logs in a couple of terminals kept handy, but UI sugar can improve the experience.
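For what it's worth, the three levels of filtering correspond to git commands that can be built mechanically, which is roughly what such a tool would do behind the scenes. The Python sketch below only assembles the command lines. The helper names, the file path, and the line numbers are placeholders, and the base argument stands for whatever commit git merge-base reports for the two branches.

```python
import subprocess


def merge_base_cmd(ours, theirs):
    """Command that prints the common ancestor of two branches."""
    return ["git", "merge-base", ours, theirs]


def log_cmds(base, branch, path, first_line, last_line):
    """Commands listing a branch's commits since base, at three levels of filtering."""
    rev_range = f"{base}..{branch}"
    return {
        # Simplest: every commit on the branch since the common ancestor.
        "all": ["git", "log", "--oneline", rev_range],
        # Better: only commits that touched the file being merged.
        "file": ["git", "log", "--oneline", rev_range, "--", path],
        # Best: only commits that touched the conflicting lines.
        "lines": ["git", "log", "--oneline",
                  f"-L{first_line},{last_line}:{path}", rev_range],
    }


def run(cmd):
    """Run a command inside a git working tree and return its output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Placeholder values: in practice base would come from
# run(merge_base_cmd("feature-c", "devel")).strip()
cmds = log_cmds("BASE", "devel", "src/widget.c", 40, 55)
for level, cmd in cmds.items():
    print(level, ":", " ".join(cmd))
```

Run once for each side of the merge (devel and feature-c here) to get both histories for the same conflicting lines.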

This seemed like something someone might already have implemented, but my Google-fu wasn't up to finding it (or reliably eliminating the possibility). Anyone know?


1 Hover over, center click on, or something. Ask a UI specialist to help.