2023-03-31

How Tabs versus Spaces affects the display of code and the tool chain

I know, why don't I just reintroduce emacs versus vi, right? But bear with me, I'm going somewhere with this.

I'm getting to do another round of "generate a format to fit an existing code base". At least this time I'm not (entirely) responsible for starting the project without having this stuff in place: we received the starting code from upstream without auto-formatting support. Anyway, when I ran whatstyle over the existing code it told me we're using tabs, which came as a total surprise to me. But the project is being built natively in Visual Studio, and that thing seems to default to tabs. After cursing for a few minutes I calmed down enough to examine why I have a strong preference for spaces and decided that I didn't, in point of fact, have a reason. Just a habit. Which sent me off to the web to do some reading.

My reading showed me several things, in broad strokes:

  • Spaces are the dominant choice in most programming communities (even Python, which seems strange to me).
  • Tab proponents are passionate.
  • Lots of space people seem to be interested in giving the author control of the code presentation.
  • Tab people seem to be interested in giving the viewer control, but there is also a streak of pedantic focus on meaning ("tabs are for indentation, spaces for alignment" is an idea I saw in several places).

I also came across a really interesting argument: tabs are an accessibility issue. That is to say that folks with perceptual differences may be better served by controlling the representation. This might be people with severe focus issues wanting to increase the visual indentation, or a braille display using one explicit tab per indent to reduce wasted space. Not a point to be blithely ignored.1

So let's assume, arguendo, that I am convinced. I'm going to start using tabs in all my new code bases. What does that mean?

Implications of using tabs

  1. Line-length limits. Many sets of coding guidelines include line-length limits. These may be rigid, modestly flexible (you can overrun by up to $N$ spaces to prevent other formatting ugliness), or purely advisory. In any case, they are supposed to provide a limit on how much horizontal space is needed to display the code in its entirety. Only now different coders start each line a different distance in on the same code, so a line that fits for one viewer can overflow for another. I don't see a simple solution to this issue that doesn't require going to the "third way" mentioned below, though I will concede that it is less bad than the implications that follow.
  2. Alignment of broken lines. If you have line-length limits you may have to break long statements or expressions across more than one line. Typically the "extra" line(s) are displayed indented relative to the initial line, often taking their alignment from operators on the first line. If that alignment indentation is also achieved using tabs, then when the viewer changes the tab width they will mess up the aligned formatting. The fix for this is mentioned above: you use tabs at the start of the line to indicate indentation (and only indentation) and then do alignment beyond the indentation with spaces.2
  3. Avoid mixed cases at all costs. With the exception of the post-indentation alignment spacing mentioned above, any mixed case is a nightmare in which almost no one sees anything reasonable. Automatic tooling should be provided to prevent either spaces at the start of a line or tabs after a space.3
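
To make those three points concrete, here is a minimal Python sketch. The helper names (display_width, column_of, has_mixed_whitespace) are mine and purely illustrative, not from whatstyle or any other tool. It assumes plain ASCII source and the usual "advance to the next tab stop" rendering, and the checker implements only the simple rule stated in item 3.

```python
import re


def display_width(line, tab_width):
    """Columns the line occupies when each tab advances to the next tab stop."""
    return len(line.expandtabs(tab_width))


def column_of(line, token, tab_width):
    """Display column at which `token` starts for a viewer using `tab_width`."""
    return line.expandtabs(tab_width).index(token)


# Rule from item 3: no space at the very start of a line, no tab right after a space.
_MIXED = re.compile(r"^ | \t")


def has_mixed_whitespace(line):
    """True if the line breaks the 'tabs first, then spaces' rule."""
    return _MIXED.search(line) is not None


# Item 1: the same stored line needs a different width for each viewer.
statement = "\t\tresult = compute(alpha, beta)"

# Item 2: a continuation line aligned with tabs versus with spaces.
first = "\tint total = add(first,"
aligned_with_tabs = "\t\t\tsecond);"
aligned_with_spaces = "\t" + " " * len("int total = add(") + "second);"

if __name__ == "__main__":
    for width in (2, 4, 8):
        print(
            width,
            display_width(statement, width),
            column_of(first, "first", width),
            column_of(aligned_with_tabs, "second", width),
            column_of(aligned_with_spaces, "second", width),
        )
```

The statement occupies 33 columns at a tab width of 2 and 45 at a tab width of 8, so a fixed limit means different things to different viewers. The continuation line built from one tab plus spaces starts "second" in the same column as "first" at every tab width, while the all-tabs version drifts as the width changes.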

A third way

Honestly, most of the problems identified above are caused by mixing levels of control: the viewer is given control of the indentation but not of other aspects of the presentation. I'm not the first one to notice that, and not the first to suggest that the optimal solution is to give the viewer complete (or almost complete) control of the presentation. Editors should simply autoformat the incoming code to match the viewer's preferences.

This isn't without its own issues, of course:

  1. Communicating about position in the code. At least some of the time, coders discuss position in a file using line numbers, which will break if two programmers are looking at the code in different views. You can, of course, use references to named entities such as method definitions for many things, but that isn't always fine enough. Perhaps the tooling can provide a notion of addressable units (statements?) and the editor can display them in the view. Or you can display the "as stored" line number. But you need something.
  2. Controlling churn on the repository. You don't want formatting changes to generate activity in the repository, which means you can't let programmers check in code formatted to their own preferences. You need to enforce a rigidly defined formatting for the purposes of storage.

Given the power of tools like clang-format, the stored-format/viewed-format part of the idea is entirely feasible, but I'm not aware of a tool that supports the addressability requirement.
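
For what it's worth, the "as stored" line number can at least be sketched. The Python below is my own illustration, not a feature of clang-format or any editor I know of, and the function names are hypothetical. It leans on one big assumption: reformatting changes nothing but whitespace, so the sequence of non-whitespace characters is identical in the stored and viewed text. Under that assumption, the first character on a view line can be located in the stored text just by counting.

```python
import bisect


def nonspace_offsets(text):
    """For each line, how many non-whitespace characters come before it."""
    offsets = []
    seen = 0
    for line in text.splitlines():
        offsets.append(seen)
        seen += sum(1 for ch in line if not ch.isspace())
    return offsets


def stored_line_for(view_text, stored_text, view_line):
    """Map a 1-based line number in the view to a 1-based stored line number.

    Assumes the two texts differ only in whitespace. A blank view line maps
    to the stored line holding the next non-whitespace character.
    """
    view_offsets = nonspace_offsets(view_text)
    stored_offsets = nonspace_offsets(stored_text)
    # Index (among non-whitespace characters) of the first one on the view line.
    target = view_offsets[view_line - 1]
    # The last stored line that starts at or before that character holds it.
    return bisect.bisect_right(stored_offsets, target)


# The same function, as stored in the repository and as one viewer prefers it.
STORED = "int f(int a,\n      int b) {\n\treturn a + b;\n}\n"
VIEWED = "int f(int a, int b)\n{\n    return a + b;\n}\n"

if __name__ == "__main__":
    for n in range(1, 5):
        print("view line", n, "-> stored line", stored_line_for(VIEWED, STORED, n))
```

In this example the viewer's brace on its own line (view line 2) maps back to stored line 2, where the brace actually lives. A formatter that reorders includes or rewraps comments breaks the assumption, so this is a starting point rather than an answer.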

What will I do?

I really think I need to talk to my team on this one, but we have tabs in the repository and a tool that can do something reasonable with them. We may be stuck with them.


1 The accessibility argument is the viewer control argument, only with the weight of "we need to be fair to people having a hard time" behind it.

2 And how does this interact with the accessibility argument?

3 Emacs provides something like this in Makefile mode.
