2023-03-31

How Tabs versus Spaces affects the display of code and the tool chain

I know, why don't I just reintroduce emacs versus vi, right? But bear with me, I'm going somewhere with this.

I'm getting to do another round of "generate a format to fit an existing code base". At least this time I'm not (entirely) responsible for starting the project without having this stuff in place: we received the starting code from upstream without auto-formatting support. Anyway, when I ran whatstyle over the existing code it told me we're using tabs. Which came as a total surprise to me. But the project is being built natively in Visual Studio, and that thing seems to default to tabs. After cursing for a few minutes I calmed down enough to examine why I have a strong preference for spaces, and decided that I didn't, in point of fact, have a reason. Just a habit. Which sent me off to the web to do some reading.
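A rough first answer to "tabs or spaces?" doesn't need full style inference. The Python sketch below is not how whatstyle works (as I understand it, whatstyle tries candidate formatter configurations and keeps the one that best reproduces the existing code); it just tallies which character starts each non-blank line. The function names and the list of file suffixes are illustrative assumptions.

```python
# Crude indentation census: which character starts each indented line?
# Illustrative sketch only; the names and suffix list are assumptions.
from collections import Counter
from pathlib import Path
from typing import Iterable


def classify_indent(line: str) -> str:
    """Return 'tab', 'space', or 'none' depending on the first character."""
    if line.startswith("\t"):
        return "tab"
    if line.startswith(" "):
        return "space"
    return "none"


def indent_census(lines: Iterable[str]) -> Counter:
    """Count indentation styles over the non-blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # ignore blank and whitespace-only lines
            counts[classify_indent(line)] += 1
    return counts


def census_tree(root: str, suffixes=(".cpp", ".h", ".hpp")) -> Counter:
    """Sum the census over every file under `root` with a matching suffix."""
    total: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            total += indent_census(text.splitlines())
    return total


if __name__ == "__main__":
    sample = ["int main() {", "\treturn 0;", "}", "    // a stray space-indented line"]
    print(indent_census(sample))  # one tab, one space, two unindented
```

Point `census_tree` at the source root; a lopsided count is usually answer enough.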

My reading turned up a few broad patterns:

  • Spaces are the dominant choice in most programming communities (even Python, which seems strange to me).
  • Tab proponents are passionate.
  • Lots of space people seem to be interested in giving the author control of the code presentation.
  • Tab people seem to be interested in giving the viewer control, but there is also a streak of pedantic focus on meaning ("tabs are for indentation, spaces for alignment" is an idea I saw in several places).

I also came across a really interesting argument: tabs are an accessibility issue. That is to say, folks with perceptual differences may be better served by controlling the representation themselves. This might be people with severe focus issues who want to increase the visual indentation, or a braille display using one explicit tab per indent to reduce wasted space. Not a point to be blithely ignored.1

So let's assume, arguendo, that I am convinced. I'm going to start using tabs in all my new code bases. What does that mean?

Implications of using tabs

  1. Line-length limits. Many sets of coding guidelines include line-length limits. These may be rigid, modestly flexible (you can overrun by up to $N$ spaces to prevent other formatting ugliness), or purely advisory. In any case, they are supposed to bound how much horizontal space is needed to display the code in its entirety. But with tabs, different coders see the same line start at different distances from the left margin, so a line that fits for one viewer can overrun for another. I don't see a simple solution to this issue that doesn't require going to the "third way" mentioned below, though I will concede that it is less bad than the implications that follow.
  2. Alignment of broken lines. If you have line-length limits you may have to break long statements or expressions across more than one line. Typically the "extra" line(s) are indented relative to the initial line, often taking their alignment from operators on the first line. If that alignment is also achieved using tabs, then when a viewer changes the tab width the aligned formatting falls apart. The fix for this is the rule mentioned above: use tabs at the start of the line to indicate indentation (and only indentation), then do any alignment beyond the indentation with spaces.2
  3. Avoid mixed cases at all costs. With the exception of the post-indentation alignment spacing mentioned above, any mixed case is a nightmare in which almost no one sees anything reasonable. Automatic tooling should be provided to prevent either spaces at the start of a line or tabs after a space.3
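To make the three items concrete, here is a small Python sketch. `str.expandtabs` models a viewer's tab width, which shows item 1 (the same stored line gets wider as the tab width grows) and item 2 (tab indentation plus space alignment keeps a continuation line lined up at any tab width, while tab alignment tuned for width 4 drifts at width 8). `check_line` is a hypothetical lint that applies the two rules from item 3 literally; note that the first rule would also flag a space-aligned continuation line at the outermost indentation level, so a real tool would probably need to relax it.

```python
# Sketch of the three items above, using str.expandtabs to model a
# viewer's tab width. check_line is a hypothetical lint, not a real tool.
from typing import List


def display_width(line: str, tab_width: int) -> int:
    """Columns the line occupies when each tab advances to the next tab stop."""
    return len(line.expandtabs(tab_width))


def column_of(line: str, token: str, tab_width: int) -> int:
    """Display column at which `token` first appears in `line`."""
    return display_width(line[: line.index(token)], tab_width)


def check_line(line: str) -> List[str]:
    """Apply item 3's rules literally; an empty list means the line passes."""
    problems = []
    if line.startswith(" "):
        problems.append("line starts with a space")
    if " \t" in line:
        problems.append("tab follows a space")
    return problems


if __name__ == "__main__":
    # Item 1: one stored line, different widths (27 vs 39 columns at 4 vs 8).
    deep = "\t\t\tcall(argument);"
    for width in (2, 4, 8):
        print(f"tab width {width}: {display_width(deep, width)} columns")

    # Item 2: 'beta' should sit directly under 'alpha'.
    first = "\tresult = compute(alpha,"
    good = "\t" + " " * 17 + "beta);"  # tab to indent, spaces to align
    bad = "\t" * 5 + " beta);"         # tabs to align, tuned for width 4
    for width in (4, 8):
        print(width, column_of(first, "alpha", width),
              column_of(good, "beta", width), column_of(bad, "beta", width))

    # Item 3: the lint accepts the good line and flags a mixed one.
    print(check_line(good), check_line(" \tbeta);"))
```

With a hypothetical 30-column limit, `deep` fits at tab width 4 (27 columns) and overruns at tab width 8 (39), which is item 1 in a nutshell.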

A third way

Honestly, most of the problems identified above are caused by mixing levels of control: the viewer is given control of the indentation but not of other aspects of the presentation. I'm not the first one to notice that, and not the first to suggest that the optimal solution is to give the viewer complete (or almost complete) control of the presentation. Editors should simply autoformat the incoming code to match the viewer's preferences.

This isn't without its own issues, of course:

  1. Communicating about position in the code. At least some of the time, coders discuss position in a file using line numbers, which will break if two programmers are looking at the code in different views. You can, of course, use references to named entities such as method definitions for many things, but that isn't always fine-grained enough. Perhaps the tooling can provide a notion of addressable units (statements?) and the editor can display them in the view. Or you can display the "as stored" line number. But you need something.
  2. Controlling churn in the repository. You don't want formatting changes to generate activity in the repository, which means you can't let programmers check in code formatted to their own preferences. You need to enforce a rigidly defined format for the purposes of storage.

Given the power of tools like clang-format, the stored-format/viewed-format part of the idea is entirely feasible, but I'm not aware of a tool that supports the addressability requirement.
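Here is a minimal Python sketch of the stored-format/viewed-format split, limited to indentation and assuming a canonical storage form of one leading tab per indentation level. The editor keeps each line as an (indentation level, remaining text) pair and renders it with whatever indent unit the viewer likes; saving renders the same model with tabs, so the stored bytes don't change. Because only indentation is touched, each view line is still the stored line with the same number, so addressability comes for free here; it stops being free the moment a real formatter starts reflowing lines. `Line`, `parse_stored`, `render`, and `naive_to_storage` are hypothetical names for illustration.

```python
# Minimal stored-vs-viewed sketch covering indentation only.
# Assumed storage convention: one leading tab per indentation level,
# with any alignment spaces following the tabs.
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    level: int  # indentation depth (number of leading tabs in storage)
    text: str   # everything after the indentation, alignment spaces included


def parse_stored(source: str) -> List[Line]:
    """Split stored text into lines, recording each line's tab depth."""
    model = []
    for raw in source.split("\n"):
        body = raw.lstrip("\t")
        model.append(Line(len(raw) - len(body), body))
    return model


def render(model: List[Line], indent_unit: str) -> str:
    """Render the model with the given per-level indent (e.g. a tab or '    ')."""
    return "\n".join(indent_unit * line.level + line.text for line in model)


def naive_to_storage(view: str, width: int) -> str:
    """What NOT to do: re-derive tabs from the viewer's leading spaces.
    Alignment spaces get folded into tabs, so the file churns."""
    out = []
    for raw in view.split("\n"):
        body = raw.lstrip(" ")
        spaces = len(raw) - len(body)
        out.append("\t" * (spaces // width) + " " * (spaces % width) + body)
    return "\n".join(out)


if __name__ == "__main__":
    stored = "void f() {\n\tint x = g(a,\n\t" + " " * 10 + "b);\n}\n"
    model = parse_stored(stored)
    view = render(model, "    ")  # this viewer likes 4-space indents
    print(view)
    print(render(model, "\t") == stored)        # True: saving is churn-free
    print(naive_to_storage(view, 4) == stored)  # False: alignment got retabbed
```

The design point is that the editor never re-derives indentation from the viewer's whitespace; the naive round trip shows why, since the alignment spaces on the `b);` line come back as tabs.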

What will I do?

I really think I need to talk to my team on this one, but we have tabs in the repository and a tool that can do something reasonable with them. We may be stuck with them.


1 The accessibility argument is the viewer control argument, only with the weight of "we need to be fair to people having a hard time" behind it.

2 And how does this interact with the accessibility argument?

3 Emacs provides something like this in Makefile mode.

2023-03-29

A Nudge too far

There is a fine line between being a clever, benevolent technocrat or entrepreneur on one hand and a controlling, supercilious asshole on the other.

The title is a reference to the book Nudge, of course.

It's hard to define the line (or more likely, several lines) that you can cross to get from one to the other because, well, people. But I can tell you one way to get to the wrong side of the line: slide from making the choice you prefer easy into actively standing between me and the choice I want. That's asshole territory plain and simple.

DOS?

My calendar for tomorrow appears to have been the victim of a denial of service attack against my having a life (or even a little breathing room). How the heck did all that stuff pile up?

2023-03-22

Don't save!

I do a lot of programming at work in Qt Creator, which is a perfectly acceptable IDE,1 and up until recently we used qmake to manage the build for basically the same reason we use Qt Creator: because these are Qt projects and the native tools understand them.

But several things have happened recently2 that collectively have made us move to cmake for most projects. And that means we've bumped into a quirk of the IDE: there is a big time cost to changing the CMakeLists.txt files. Not, I hasten to add, because it's any slower at running cmake than anything else; it does that just fine. But after it reconfigures the build, it rescans the project to locate files that should be listed in the project tree view. That takes more than thirty seconds in the case of my main project. And for some reason it does this synchronously, preventing you from taking any other action in the IDE.

Now, most of the time this is a non-issue: I'm working on the code, not the build, so it rarely comes up and is only a minor annoyance.

But if you are working on the build (as I was this afternoon), it is critical to overrule that "hit the save key-combo every time you pause to think" reflex that makes so much sense at other times.


1 I mean, it can't refactor template code very well, but you can at least learn how it fails and know how to fix up the results.

2 Qt has deprecated qmake with Qt6, we've started using some of our code (including an underlying library) with third-party projects that use cmake, and we have found that cmake is slightly better at parallelizing biggish builds, resulting in slightly but noticeably faster edit-compile-test cycles.