Error Handling in C with goto

In higher-level languages, the standard approach to error handling is exceptions. C, however, has no such feature. Even so, there is a pattern that simplifies scenarios where many operations may have to be undone, and it uses the ever-controversial goto statement…

Recently, a discussion started on the Python Brasil mailing list about the reasons for using exceptions. At one point, a notably competent participant commented on how difficult it is to handle errors through function returns, as in C.

When you have a complex algorithm, each operation that can fail requires a series of ifs to check if the operation was successful. If the operation fails, you need to revert all previous operations to exit the algorithm without altering the program’s state.

Let’s look at an example. Suppose I have the following struct to represent arrays:

typedef struct {
    int size;
    int *array;
} array_t;

Now, I’m going to write a function that reads from a text file the number of elements to be placed in one of these arrays and then reads the elements themselves. This function will also allocate the array struct and the array itself. The problem is that this function is quite prone to errors, as we might fail to:

  • Open the given file;
  • Allocate the struct;
  • Read the number of elements from the given file, either due to input/output error or end of file;
  • Allocate memory to store the elements to be read;
  • Read one of the elements, either due to input/output error or end of file.

Complicated, right? Note that if we manage to open the file but fail to allocate the struct, we have to close the file; if we manage to open the file and allocate the struct but fail to read the number of elements from the file, we have to deallocate the struct and close the file; and so on. Thus, if we check all errors and adopt the tradition of returning NULL in case of an error, our function would look something like this:

array_t *readarray(const char *filename) {
    FILE *file;
    array_t *array;
    int i;

    file = fopen(filename, "r");
    if (file == NULL) return NULL;

    array = malloc(sizeof(array_t));
    if (array == NULL) {
        fclose(file);
        return NULL;
    }

    /* fscanf() returns the number of items converted, so anything
       other than 1 means EOF, an I/O error, or malformed input */
    if (fscanf(file, "%d", &(array->size)) != 1) {
        free(array);
        fclose(file);
        return NULL;
    }

    array->array = malloc(sizeof(int) * array->size);
    if (array->array == NULL) {
        free(array);
        fclose(file);
        return NULL;
    }

    for (i = 0; i < array->size; i++) {
        if (fscanf(file, "%d", array->array + i) != 1) {
            free(array->array);
            free(array);
            fclose(file);
            return NULL;
        }
    }
    fclose(file);
    return array;
}

Indeed, quite laborious, and with a lot of repeated code…

Note, however, that there are two situations in the code above.

  1. In one, when I have two operations to revert, I need to revert the last executed one first, and only then the previous one. For example, when deallocating both the struct and the integer array, I need to free the integer array first and then the struct: if I free the struct first, I may no longer be able to reach the array pointer to free it.
  2. In the other, the order doesn’t matter. For example, if I am going to deallocate the struct and close the file, it doesn’t matter in which order I do it. This means I can still revert the last executed operation first and then the earlier one.

What’s the point of this? Well, in practice, I’ve never seen a situation where I have to revert the first executed operation first, then the second, and so on. This means that, when performing the operations a(), b(), c(), etc., the “natural” way to revert them is to call the revert functions in reverse order, something like:

a();
b();
c();
/* ... */
revert_c();
revert_b();
revert_a();

Now comes the trick. In the code above, after each operation, we’ll place an if to check if it failed or not. If it failed, a goto will be executed to the revert function of the last successful operation:

a();
if (failed_a()) goto FAILED_A;
b();
if (failed_b()) goto FAILED_B;
c();
if (failed_c()) goto FAILED_C;
/* ... */
revert_c();
FAILED_C:
revert_b();
FAILED_B:
revert_a();
FAILED_A:
return;

If a() fails, the algorithm returns; if b() fails, the algorithm goes to FAILED_B, reverts a() and returns; if c() fails, the algorithm goes to FAILED_C, reverts b(), reverts a(), and returns. Can you see the pattern?

If we apply this pattern to our readarray() function, the result will be something like:

array_t *readarray(const char *filename) {
    FILE *file;
    array_t *array;
    int i;

    file = fopen(filename, "r");
    if (file == NULL) goto FILE_ERROR;

    array = malloc(sizeof(array_t));
    if (array == NULL) goto ARRAY_ALLOC_ERROR;

    if (fscanf(file, "%d", &(array->size)) != 1)
        goto SIZE_READ_ERROR;

    array->array = malloc(sizeof(int) * array->size);
    if (array->array == NULL) goto ARRAY_ARRAY_ALLOC_ERROR;

    for (i = 0; i < array->size; i++) {
        if (fscanf(file, "%d", array->array + i) != 1)
            goto ARRAY_CONTENT_READ_ERROR;
    }
    fclose(file);
    return array;

    ARRAY_CONTENT_READ_ERROR:
    free(array->array);
    ARRAY_ARRAY_ALLOC_ERROR:
    SIZE_READ_ERROR:
    free(array);
    ARRAY_ALLOC_ERROR:
    fclose(file);
    FILE_ERROR:
    return NULL;
}

What are the advantages of this pattern? Well, it reduces the repetition of operation reversal code and separates the error handling code from the function logic. In fact, although I think exceptions are the best modern error handling method, for local error handling (within the function itself), I find this method much more practical.

(This post is a translation of Tratamento de erros em C com goto, originally published in Suspensão de Descrença.)

Implementing malloc() and free() — reducing the heap even more

In our journey implementing malloc() and free(), we learned to reuse memory blocks. Today, we will make a very simple optimization: reduce the heap size as much as possible.

This post is part of a series on implementing the malloc() and free() functions. In the previous article, we learned how to reuse memory blocks. It was a significant advancement, but there’s much more room for improvement.

One example is reducing the size of the heap, as explained in the first post. When we free the last memory block, we move the top of the heap to the end of the previous block. However, this previous block might also be free, as well as others. Consider the scenario below:

void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr1);
abfree(ptr2);

In this case, when we free the block pointed to by ptr2, we make ptr1 the last block. However, ptr1 is also free, so we could further reduce the heap size.

To achieve this, we’ll iterate over the pointers from the end of the list until there are no more free blocks. If the header of the received pointer points to the last block and the previous block is free, we move the header pointer to it. We repeat this process until we reach an available block whose previous block is in use (or NULL if it’s the first block). Then, we execute the heap reduction procedure:

if (header == last) {
  while ((header->previous != NULL) && header->previous->available) {
    header = header->previous;
  }
  last = header->previous;
  brk(header);
} else {

Now, though, we need to fix a bug in abfree(). According to the specification, the free() function should accept a null pointer and do nothing. However, if abfree() receives NULL, we will have a segmentation fault! Fortunately, it is easy to fix by adding a check at the beginning of the function:

void abfree(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Header *header = (Header*) ptr - 1;

So, here’s our abfree() function at the moment:

void abfree(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Header *header = (Header*) ptr - 1;
  if (header == last) {
    while ((header->previous != NULL) && header->previous->available) {
      header = header->previous;
    }
    last = header->previous;
    brk(header);
  } else {
    header->available = true;
  }
}

Reducing the size of the heap is a simple optimization, but there are still challenges ahead. For example, the way we choose blocks to reuse can lead to larger-than-necessary heaps. We will see why, and how to solve that, in the next post.

(This post is a translation of Implementando malloc() e free() — reduzindo ainda mais o heap, first published in Suspensão de Descrença.)

Tiny Ticket Types

Tickets in Jira tend to accumulate redundant and optional fields, becoming complex and confusing. I like Jira, but I understand the frustration it causes.

A plausible solution could be inspired by software development. We programmers are used to finding massive source files, and we know that breaking them into smaller files drastically improves code comprehension. Therefore, inspired by coding best practices, I suggest creating smaller tickets.

Only three states

One way to limit the size of tickets is to simplify the workflow by restricting the number of states. For example, we can define that each type of ticket would have, at most, three states:

  • To do
  • In progress
  • Done

To represent other stages, we can create new types of tickets, such as sub-tasks.

A moderately complex ticket type

Let’s look at an example. Consider the ticket below:

Key: XYZ-1234. Status: Testing. Title: Nasal demons. Description: Calling free() on a previously deallocated pointer results in demons coming out of the nose. Technical analysis: The root cause is an undefined behavior. Test results: The patch does not work, now ghosts pop out of the user’s ears. Release date: 2023-12-22

It would follow this workflow:

Open ⇨ To do ⇨ In Analysis ⇨ Doing ⇨ Testing ⇨ Release ⇨ Done

How could we reduce the number of phases?

We can start by removing the “In Analysis” stage. In its place, we create a new type of ticket called “Technical Analysis.” This way, the original task remains in progress (“Doing”) while the technical analysis is underway.

Fewer fields in a ticket

An advantage of this would be transferring fields to sub-tasks. Fields that would clutter the original ticket can appear only in tasks where they are relevant.

Consider the “Release date” field, which only makes sense in the “Release” phase. If developers, testers, etc., are not responsible for the release, this field is confusing and pollutes the original task. With a new task type called “Release,” this field would be in the most appropriate place, keeping the original ticket concise.

Repeating stages without regressing

Another advantage is that the original ticket can go through the same stage multiple times. It’s common for a ticket to have a development phase followed by quality tests, for example. However, if a problem arises in the evaluation, it’s not advisable to revert to the development phase. How to deal with this?

By working with sub-tasks, we can mark validation as completed and create a new implementation ticket. In our ticket, for example, we can remove the “Testing” phase and create a sub-task of type “Test,” as well as another one called “Development.” Every time the test fails, we close testing and open a new development task.

Result

Following this strategy, our ticket would look like this:

Key: XYZ-1234. Status: Doing. Title: Nasal demons. Description: Calling free() on a previously deallocated pointer results in demons coming out of the nose. Links: XYZ-1235 Technical analysis; XYZ-2345 Remove text in Latin; XYZ-3456 Test Latin removal; XYZ-2345 Use function medium(); XYZ-3456 Test medium() function; XYZ-4444 Release plan

And the workflow would be much simpler:

Open ⇨ To do ⇨ Doing ⇨ Done

Naturally, this strategy is flexible. In our case, for example, we haven’t removed the “To do” phase yet. Limiting the number of states to five (including backlog and validation) is another possibility. The core idea is to cap the number of stages at a small value for all tickets.

Conclusions

In programming, it’s common to encounter the so-called “God objects,” huge objects responsible for various different functions. Breaking them down is a surefire way to achieve code quality. Therefore, I suspect the same principle can apply to tickets in Jira.

I’m not a project manager, but as a programmer, I believe that limiting the size and steps of tickets can be an effective idea. I’m curious to know if anyone has tried this and how it went.

Billing the technical debt

I really like to work on technical debt issues only when they affect demands. But why? Well, here are some reasons…

Some time ago the man, the myth, the legend Fabrício Buzeto asked this interesting question:

Out of curiosity. Does your team keep a list of technical debt? Does it make you feel joy?

It brought back some memories. For a few years, I was responsible for the Liferay Calendar and Kaleo Designer portlets. These were complex single-page apps, built at a fast pace when the concept of SPAs was still evolving: many choices called for a review.

So I started writing JIRA tickets for technical debt. When one of those health issues made a bug fix or feature harder to implement, I’d convert that technical debt ticket into a sub-task of the demand. As I like to say, I was “billing the debt from the feature.”

I commented that and he asked me a crucial question:

Why not treat them like any other card in the backlog then?

Why, indeed?

Well, at first, we tried! I would present the debt issues in our prioritization meetings. Having the problems written down helped a lot to catch the managers’ attention, by the way.

Technical debt is a hard sell, though. People are understandably wary of buying into something whose value they cannot see. Nonetheless, changes took increasingly longer to deliver, and regression bugs kept popping up. We needed to fix these health problems.

That’s why I started to work on debt as part of value-adding tasks. Working on the debt to make a demand easier was great evidence that the extra work was worth it. It was not just some random idea we pursued to postpone duties: it delivered value.

That is the first reason for handling technical debt as sub-tasks of value issues: By binding the debt to a value-adding task, it is easier to justify the extra effort to stakeholders.

At first, this debt-billing was only a communication device. But it had a cool side effect: the most glaring issues kept being solved first. That makes sense: since we worked on them when they caused problems, the ones causing the most problems were solved first. Since prioritization is always a challenge (and prioritizing technical debt is even harder), it was a huge help.

We still had a pile of technical debt tasks, but many of the pending ones were no longer relevant. Some were already solved. Others had been elegant ideas back then but didn’t make sense anymore. In hindsight, a good part of the “debt” was personal preference, or assumptions that no longer held after some product evolution.

This is the second reason for debt-billing: Working on health issues as part of demand is an effective way to prioritize which technical debt to work on.

See how great it is! Had we worked on technical debt by itself — for example, in a task force — we might have applied changes that would actually make future evolution harder. Debt-billing let us confirm which requests fit our goals. And it had a subtler, more important consequence.

We developers are an opinionated lot, and this is good. We usually try to turn these opinions into goals. But it is hard to know whether a goal is right. Once we use these ideas as helpers for something more clearly relevant, the goal turns into a tool. Tools are much easier to evaluate!

This is a third reason for debt-billing: when technical debt is linked to value delivery, the creative force from the team works together with the organization’s objectives.

Our experience was that this strategy was quite effective. Everybody knew their suggestions would be evaluated: health tasks were no longer a chore to prioritize, but a toolset our colleagues would turn to for help with their challenges. The debt backlog was not a wishing well anymore.

The apps got better, too. When I started working on the Calendar, for example, it was generally seen as an especially problematic portlet. The first release couldn’t even schedule events! When I left that team, the Calendar had no bugs of priority 3 or higher (the levels we are required to fix). And we delivered quite a good number of features, even some missing in leading competitors. Not bad for a product that had been an example of a non-working feature!

It felt right to bill the technical debt from the demands, but I never thought deeply about why it felt right. So, thank you for asking that, Fabricio! It was a joy to think about it.

EDIT: I just recalled that Ron Jeffries wrote a great post about his approach to refactoring, which is similar to the one described here, though it argues against one specific point. Totally worth reading!