2020-10-03

"Safe" string building in c

The deficiencies of the c standard library where strings are concerned have been discussed at length in many places, but I'd like to talk about one issue in particular: building up strings from pieces.

This week I wanted to build up some strings from pieces. Lots of little pieces. In a language with "real" strings this would be easy: you'd just do something like string result = substring1 + substring2 + substring3;, but c does not support that in any general way.1 There are really only two classes of tools available: the strcat/strcpy functions from string.h and the sprintf functions from stdio.h. Essentially all the "plain" functions will overrun the buffer, and the "n" variants only protect you if you pass the right length (where "right" means paying attention to which functions count the terminating '\0' and which don't). Let's take a closer look at some of them.

strcat, strncat (and strcpy, strncpy)

On the face of it this is the "obvious" thing to do. After all it concatenates strings, right? The main issue is that the first string needs to occupy a sufficiently large buffer or you run into trouble. Which means knowing the desired length when you create the buffer.

If you have a separate buffer you use a "copy" function to move the first string into it and then concatenate onto the end.

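As a side note, if you are going to repeatedly apply strcat, keep track of where the string currently ends and append there, to avoid a Shlemiel the Painter algorithm. (The return value won't help: strcat returns the start of the destination, not the new end.)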
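
A minimal sketch of the end-tracking idea. The helper append_at is hypothetical (it is not in string.h), and I'm assuming the caller passes a limit that points one past the last byte of the buffer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function.
 * Copies src to position `end` inside a buffer; `limit` points one past
 * the buffer's last byte. Returns the new end of the string (the address
 * of its '\0'), or NULL if src plus its terminator does not fit. */
char *append_at(char *end, const char *limit, const char *src)
{
    size_t len = strlen(src);

    if (end == NULL || end >= limit || (size_t)(limit - end) <= len)
        return NULL;            /* no room for src plus the '\0' */

    memcpy(end, src, len + 1);  /* +1 copies the terminating '\0' too */
    return end + len;
}
```

Each call starts writing where the previous one stopped, so nothing rescans the string from the beginning. A NULL result is passed through by later calls, so a chain of appends only needs one check at the end.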

sprintf, snprintf

Same basic problem: the target buffer needs to be big enough. On the up-side, you don't need extra elements for any little connector text you want to stick in between the substrings that you are receiving from the caller, the environment, the database, or whatever.
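
For illustration, here is roughly what the fixed-buffer case looks like when you do check for truncation. The function name format_pair and the "key=value" format are made up for this sketch:

```c
#include <stdio.h>

/* Hypothetical example: formats "key=value" into a caller-supplied buffer.
 * Returns 0 on success, -1 if the output was truncated or formatting failed. */
int format_pair(char *buf, size_t size, const char *key, const char *value)
{
    /* snprintf reports the length it wanted to write, excluding the '\0' */
    int needed = snprintf(buf, size, "%s=%s", key, value);

    if (needed < 0)
        return -1;              /* formatting error */
    if ((size_t)needed >= size)
        return -1;              /* truncated: no room for text plus '\0' */
    return 0;
}
```

Note the >= in the truncation test: the return value does not count the terminator, so a result equal to the buffer size already means a character was dropped.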

And that is really it.

Variable length arrays or malloc

So, you need to wait until you have all the parts, find the desired length, and then create a buffer. Fine. You have two choices: a dynamic allocation or variable length arrays.

The issue with variable length arrays is that they weren't standardized until 1999 and then were made optional in 2011 (and at least one major compiler does not support them). Code that depends on VLAs is going to have limited portability.

So you're going to have to put it on the heap with all the hassle and risks that entails. Great. Be sure to check the return value.
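
As a sketch of the measure-then-allocate approach using only string.h and malloc, something like the following would do for a fixed number of pieces. The name concat3 is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of a, b and c joined
 * together, or NULL if the allocation fails. The caller frees the result. */
char *concat3(const char *a, const char *b, const char *c)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    size_t lc = strlen(c);
    char *buf = malloc(la + lb + lc + 1);   /* +1 for the '\0' */

    if (buf == NULL)
        return NULL;

    memcpy(buf, a, la);
    memcpy(buf + la, b, lb);
    memcpy(buf + la + lb, c, lc + 1);       /* also copies c's '\0' */
    return buf;
}
```

It works, but every new combination of pieces needs its own length bookkeeping, which is the tedium the rest of this post tries to get away from.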

Recipe

All this is old hat and there is a well known hack to accomplish it. You use the "returns the length that would have been written" feature of snprintf functions like this:

nonnullcount = snprintf(NULL,0,...);
buf = malloc(nonnullcount+1); /* watch out for the with/without '\0' issue! */
/* check for errors */
snprintf(buf,nonnullcount+1,...); /* watch out for the with/without '\0' issue! */

It's not hugely time efficient, and the two passes through the formatting engine are inelegant, but it's very flexible and does the job.
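
To make the recipe concrete, here is one filled-in instance with the error checks written out. The function make_path and its "%s/%s" format are invented for the example:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example of the two-pass recipe: returns a malloc'ed
 * "dir/file" string, or NULL on error. The caller frees the result. */
char *make_path(const char *dir, const char *file)
{
    char *buf;
    /* Pass 1: no buffer, just ask how long the result would be */
    int nonnullcount = snprintf(NULL, 0, "%s/%s", dir, file);

    if (nonnullcount < 0)
        return NULL;                            /* formatting error */

    buf = malloc((size_t)nonnullcount + 1);     /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    /* Pass 2: same format and arguments, this time for real */
    if (snprintf(buf, (size_t)nonnullcount + 1, "%s/%s", dir, file) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The format string and argument list have to be repeated exactly in both passes, which is another small argument for hiding the whole thing behind one function.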

The three lines of code I exhibited there are idiomatic and easy to recognise, so sprinkling them through your code wouldn't be too bad. Except that the error checking is pretty important and it will add to the length and break up the visual block. Not nice. So I'd like to encapsulate all that in a function. Which is where it gets technical; how are you with c's variadic function support?

Wrapped up and made pretty I get something like (written to be compatible with c89 compilers even if they would require a more up-to-date libc):

/* A "safe" auto-allocating sprintf.
 *
 * Returns a pointer to a malloc'ed buffer containing the resulting string or
 * NULL if an error occurs. Leaves errno set following the error.
 *
 * It is the caller's responsibility to free the buffer.
 */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

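char * smallocprintf(const char * format, ...)
{
    char * buf = NULL;
    int nonnullcount = 0; /* vsnprintf returns an int; negative means error */
    va_list args;

    /* First pass: measure the formatted length without writing anything */
    va_start(args,format);
    nonnullcount = vsnprintf(NULL,0,format,args);
    va_end(args);

    if (nonnullcount < 0)
        return NULL; /* Formatting failed; a negative count must not reach malloc */

    buf = malloc((size_t)nonnullcount+1); /* +1 for the '\0' */
    if (buf == NULL)
        return NULL; /* Leave errno intact for the caller to deal with */

    /* Second pass: restart the argument list and write for real */
    va_start(args,format);
    nonnullcount = vsnprintf(buf,(size_t)nonnullcount+1,format,args); /* +1 for the '\0' */
    va_end(args);

    if (nonnullcount < 0) {
        free(buf); /* Don't hand back a buffer with unspecified contents */
        return NULL;
    }

    return buf;
}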
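
If you can assume C99, a possible variation is to split out a va_list flavour so that other variadic functions can forward to it. This is only a sketch: the names vsmallocprintf and smallocprintf_c99 are hypothetical, and it relies on va_copy, which c89 does not have:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical va_list flavour of the function above; needs C99 va_copy.
 * Returns a malloc'ed string or NULL on error. The caller frees it. */
char *vsmallocprintf(const char *format, va_list args)
{
    char *buf;
    int nonnullcount;
    va_list measure;

    va_copy(measure, args);     /* the measuring pass consumes its own copy */
    nonnullcount = vsnprintf(NULL, 0, format, measure);
    va_end(measure);

    if (nonnullcount < 0)
        return NULL;

    buf = malloc((size_t)nonnullcount + 1);     /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    if (vsnprintf(buf, (size_t)nonnullcount + 1, format, args) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical variadic front end that simply forwards its arguments */
char *smallocprintf_c99(const char *format, ...)
{
    char *result;
    va_list args;

    va_start(args, format);
    result = vsmallocprintf(format, args);
    va_end(args);
    return result;
}
```

A caller would then write something like char *s = smallocprintf_c99("%s-%d", name, count); and free(s) when done with it.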

Frankly I imagine that code like this exists in private libraries all over the place, but I hadn't seen it myself, and it does what I need. Late addition: In particular this seems to be a close analog to the GNU c library function asprintf, though there are some interface differences.


1 String literals may be simply concatenated in code, but not null terminated character arrays and buffers.
