TransWikia.com

Format string processing similar to date or printf()

Code Review Asked by domsson on November 11, 2021

I wrote a function that is supposed to take a format string similar to what date accepts and replace all format specifiers with strings provided by a callback function.

  • Format specifiers are sequences of a % plus another character; %% for a literal %
  • Replace the format specifier with the string provided by the callback function
  • If the callback function returns NULL, no replacement should take place (leave the specifier in)
  • The callback function receives the character following % as argument
  • An optional pointer can be passed through to the callback function
  • If the last character of the format string is %, it should be left as-is
  • The result of the processing is to be placed in the provided buffer
  • If the provided buffer isn’t big enough, simply stop processing
  • Always null-terminate the provided buffer, even if it isn’t big enough

The function works fine as far as I can tell. I’m looking for any feedback I can get.

char*
format(const char* format, char *buf, size_t len, char* (*cb)(char c, void* ctx), void *ctx)
{
    const char *curr;  // current char from format
    const char *next;  // next char from format

    size_t i = 0;      // index into buf
    char *ins = NULL;  // string to insert

    // iterate `format`, abort once we exhaust the output buffer
    for (; *format && i < (len-1); ++format)
    {
        curr = format;
        next = format+1;

        if (*curr == '%' && *next) 
        {
            if (*next == '%') // escaped %, copy it over and skip
            {
                buf[i++] = *format++;
                continue;
            }
            if ((ins = cb(*next, ctx))) // get string to insert
            {
                // copy string, again aborting once buffer full
                while (*ins && i < (len-1))
                {
                    buf[i++] = *ins++;
                }
                ++format;
                continue;
            }
        }
    
        // any other character, just copy over
        buf[i++] = *curr;
    }

    // null terminate
    buf[i] = '';
    return buf;
}

Examples for (tested) input and output, assuming the callback function always returns FOO:

  • (empty string): (empty string)
  • %: %
  • %%: %
  • %f: FOO
  • %%f: %f
  • %%%f: %FOO
  • %%%%f: %%f

If so requested, I can provide a link to the context/project where this is used.

2 Answers

Just a small idea:

Handle len==0

Just add a little code to gracefully cope with len==0 rather than UB.

char *format(const char* format, char *buf, size_t len, ...) {
  ...
  //                         0 -1 is SIZE_MAX!
  // for (; *format && i < (len-1); ++format) {
  for (; *format && i + 1 < len); ++format) {
  ...
  // buf[i] = '';
  if (i < len) buf[i] = '';
}

Answered by chux - Reinstate Monica on November 11, 2021

Return the number of bytes written to buf

You will notice that functions like sprintf() and strftime() don't return a pointer, but rather an integer that says something about the number of bytes that (would) have been written to the output buffer. This is much more useful than just copying the pointer to buf, which doesn't give the caller any new information.

Where is the string returned by the callback function allocated?

The callback function returns a pointer to a string. But where is this allocated? Your format() function doesn't call free(), so either the string should be stored in some statically allocated array, or it is allocated on the heap. In the former case, unless you return a pointer to a string literal, your format() function can only be used from one thread at a time. If you return memory that is allocated on the heap, then you have to keep track of it so the caller can clean up all the allocated memory once format() returns.

Consider having the callback function write into buf directly

To solve the above issue, and to avoid an unncessary copy, you can pass a pointer into the buffer and the remaining size to the callback function, and have the callback function write directly to the buffer. For example:

char*
format(const char* format, char *buf, size_t len, size_t (*cb)(char c, void* ctx, char *buf, size_t len), void *ctx) {
    ...
        if (*curr == '%' && *next) 
        {
            if (*next == '%') // escaped %, copy it over and skip
            {
                buf[i++] = *format++;
                continue;
            }
            i += cb(*next, ctx, buf + i, len - i - 1);
            ++format;
            continue;
        }
    ...
}

And then your callback function can look like:

size_t example_cb(char c, void *ctx, char *buf, size_t len) {
    if (c == 'f') {
        if (len > 3)
            len = 3;
        memcpy(buf, "FOO", len);
        return len;
    }

    return 0;
}

You can create a helper function to avoid repeating the above construction, and to safely write any string to the buffer:

size_t emplace_string(const char *str, char *buf, size_t max_len) {
    size_t len = strlen(str);
    if (len > max_len)
        len = max_len;
    memcpy(buf, str, len);
    return len;
}
        
size_t example_cb(char c, void *ctx, char *buf, size_t len) {
    switch (c) {
    case 'f':
        return emplace_string("FOO", buf, len);
    case 'B':
        return emplace_string("bar", buf, len);
    ...
    default:
        return 0;
    } 
}

Answered by G. Sliepen on November 11, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP