2022-11-16

Is git submodules really this bad? (part 1)

As I mentioned, I'm "getting" to use git submodules, and my frustration level is making a bid for a new personal best. My sense is it's a little clunky even when you use it exactly as envisioned and breaks completely as soon as you stress it. I hope I'm exaggerating or outright wrong because I have no choice but to work with it.

Here's the problem: we're working on one small piece of a larger project (a plugin, as it happens), and the project management is security conscious enough to:1

  • Use submodules as part of a system to selectively limit access to the repository: I can see all main API headers and only those implementation details I will be working directly with. I don't "need" the rest because I can test my plugin against a binary distribution of the core program.
  • Make it quite a gauntlet to get individual credentials for direct access to the central repository.

To avoid sending each member of my team through the gauntlet as they join, I thought "Oh, git is a distributed system,2 right? I'll just create a local working repository for my team and we can push back upstream when we're happy."3 Which is, evidently, not something the designers of submodules anticipated.

The core issue is that submodules are found by following either a path or a url, which has implications for how a clone of a clone works in projects that use submodules. Look at the contents of a .gitmodules file: for each module there will be a url entry. That entry may be a relative path (starting with ./ or ../), which git resolves against the location it got the super-repository from, or an absolute url telling git where to find the repository on the wider network.

Imagine that there exists a project on central.server:

    central.server:/repos$ ls
    compoundproject.git  includeproject.git  utilityproject.git
    central.server:/repos$ cat compoundproject.git/.gitmodules
    [submodule "IncludeProject"]
        path = IncludeProject
        url = ../includeproject.git
    [submodule "UtilityProject"]
        path = UtilityProject
        url = http://central.server/repos/utilityproject.git

and note that I've rigged the two submodules to use different logic about finding their related repos, but that they will both find the one on the central server.
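Since .gitmodules is written in git's ordinary config syntax, git config can read it directly, which is a quick way to see which rule each submodule uses. A minimal sketch, assuming the file contents shown above; the temporary directory is only there so the example runs anywhere:

```shell
# Sketch: list the url recorded for every submodule in a .gitmodules file.
workdir=$(mktemp -d)
cat > "$workdir/.gitmodules" <<'EOF'
[submodule "IncludeProject"]
	path = IncludeProject
	url = ../includeproject.git
[submodule "UtilityProject"]
	path = UtilityProject
	url = http://central.server/repos/utilityproject.git
EOF

# `git config -f FILE` reads an arbitrary config-format file;
# --get-regexp prints every key matching the pattern with its value.
urls=$(git config -f "$workdir/.gitmodules" --get-regexp '^submodule\..*\.url$')
printf '%s\n' "$urls"
rm -rf "$workdir"
```

A value starting with ../ or ./ is a relative entry; anything else is used as written.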

Now I create the repository for my team (there are slight differences if we make this a bare repository):

    local.server:/home/git$ git clone --recurse-submodules http://central.server/repos/compoundproject.git
    [...various git output that looks good...]
    local.server:/home/git$ ls
    compoundproject
    local.server:/home/git$ ls -A compoundproject
    .git  .gitmodules  IncludeProject  UtilityProject  [...some top-level contents...]

and if we peek in the sub-project directories we'll see the expected contents.

Next a member of my team tries to set up a working repository:

    developerworkstation:/home/developer/Projects$ git clone --recurse-submodules http://local.server/home/git/compoundproject

but this is going to fail when it tries to get IncludeProject, because it is going to look for it at http://local.server/home/git/includeproject.git instead of at http://local.server/home/git/compoundproject/IncludeProject. And if we assume that the developer does not have credentials for central.server, then it would also fail when trying to get UtilityProject, because it gets that from the central source.
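To see why the relative entry lands where it does, here is a simplified sketch of the resolution rule. The function resolve_submodule_url is a hypothetical helper written for illustration, not part of git, and it ignores corner cases that real git handles (scp-style addresses, for instance). It assumes the team repository is served at http://local.server/home/git/compoundproject.

```shell
# resolve_submodule_url BASE URL
# BASE is the location the super-repository was cloned from,
# URL is the url value taken from .gitmodules.
resolve_submodule_url() {
    base=${1%/}          # drop one trailing slash, if any
    rel=$2
    case $rel in
        ./*|../*)
            # Relative entry: peel off leading ./ and ../ segments,
            # dropping one path component of BASE for every ../
            while :; do
                case $rel in
                    ./*)  rel=${rel#./} ;;
                    ../*) rel=${rel#../}; base=${base%/*} ;;
                    *)    break ;;
                esac
            done
            printf '%s\n' "$base/$rel"
            ;;
        *)
            # Anything else is used exactly as written
            printf '%s\n' "$rel"
            ;;
    esac
}

resolve_submodule_url http://central.server/repos/compoundproject.git ../includeproject.git
# -> http://central.server/repos/includeproject.git
resolve_submodule_url http://local.server/home/git/compoundproject ../includeproject.git
# -> http://local.server/home/git/includeproject.git
```

The same ../includeproject.git entry works against the central server and misses on the team server, simply because nothing has been placed beside the team's clone.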

Now, I can solve the first problem by (bare) cloning includeproject.git to local.server beside compoundproject.git. As far as I can tell, the second problem can only be overcome by getting the developers credentials for central.server.
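One documented escape hatch may be worth trying before going through the gauntlet: git submodule init copies each url from .gitmodules into the clone's own .git/config, and that copy can be edited before git submodule update fetches anything. The sketch below is untested against the setup described here. The helper name override_submodule_url is hypothetical, and it assumes somebody with credentials has already placed a copy of utilityproject.git on local.server.

```shell
# override_submodule_url REPO_DIR NAME URL
# Point submodule NAME of the clone in REPO_DIR at URL, for this clone only.
override_submodule_url() {
    # Register the submodules listed in .gitmodules in .git/config ...
    git -C "$1" submodule init &&
    # ... then replace the recorded url for the one we want to redirect.
    git -C "$1" config "submodule.$2.url" "$3"
}

# Hypothetical use on a developer workstation (note: no --recurse-submodules):
#   git clone http://local.server/home/git/compoundproject
#   override_submodule_url compoundproject UtilityProject \
#       http://local.server/home/git/utilityproject.git
#   git -C compoundproject submodule update
```

The override lives in each developer's clone rather than in .gitmodules, so nothing changes for people working against the central server.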

Okay, back up and replace the subjunctive above with my actual situation. In effect utilityproject.git is actually reached by relative reference just like includeproject.git. Consequently I have made bare clones of all three projects on local.server and my developers can do a git clone --recurse-submodules http://local.server/home/git/repos/compoundproject.git and get all three. Yeah! Go me!
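The bare clones can be scripted so the three copies always end up side by side, which is what the relative entries rely on. A minimal sketch: mirror_repos is a hypothetical helper, and the destination directory /home/git/repos is inferred from the clone address above rather than stated anywhere.

```shell
# mirror_repos UPSTREAM DEST NAME...
# Bare-clone UPSTREAM/NAME.git into DEST/NAME.git for every NAME given.
mirror_repos() {
    upstream=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for name in "$@"; do
        git clone --bare "$upstream/$name.git" "$dest/$name.git" || return 1
    done
}

# Hypothetical invocation on local.server (needs credentials for central.server):
#   mirror_repos http://central.server/repos /home/git/repos \
#       compoundproject includeproject utilityproject
```

The function stops at the first clone that fails, so a missing credential shows up immediately instead of leaving a half-populated directory unnoticed.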

Only now there is the matter of branches and tags. I'm not sure I understand this, so the saga will have to continue another day...


1 Coming in heavily on the side of security in the security-versus-getting-things-done trade-off is par for the course in my industry. I've been sighing a lot about this but I'm not at all surprised.

2 Big selling point, right? Every repository is equivalent and you can move updates from any repository to any other repository. Of course, the way people actually use DVCS there is a (or are a few) repositories that are central to the workflow even if they are not special to the underlying software. For that matter in git those ones are usually configured as bare repositories so there's feature support in the tool for the distinction. But it is still better than using SVN.

3 Bear in mind that this plan would work just fine for a plain boring project that used git without any sub-whatevers.
