brandizzi

Implementing malloc() and free() — memory alignment

Our implementation of malloc() and free() has progressed significantly. In the last post, we saw, for example, how to unify smaller memory blocks to avoid fragmentation. To avoid even more problems in this area, we’re going to address an important performance detail that we’ve been neglecting for a while: memory alignment.

What is memory alignment?

Modern computers read memory in fixed-size blocks that correspond to the processor’s word size. A 32-bit processor, for example, reads four bytes at a time, while a 64-bit processor reads eight. However, each read starts at an aligned address: a 32-bit processor can fetch its four bytes in a single access only if the memory address is a multiple of four.

If the address is not a multiple of the word size, the processor will need to access memory twice: first to fetch the first part of the word and then for the second. This becomes even worse if the allocator returns misaligned pointers, as all subsequent nodes will also be misaligned, doubling the number of memory accesses on the heap.
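
To make the idea concrete, here is a minimal sketch of an alignment check. The helper name is_aligned() is hypothetical and is not part of abmalloc; it only tests whether an address is a multiple of a given alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of abmalloc): returns true when the
   address stored in ptr is a multiple of `alignment` bytes. */
bool is_aligned(const void *ptr, size_t alignment) {
  assert(alignment != 0); /* a zero alignment makes no sense */
  return ((uintptr_t)ptr % alignment) == 0;
}
```

For instance, the address 16 is aligned to 8 bytes, while the address 12 is aligned to 4 bytes but not to 8.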

Ensuring allocated blocks’ alignment

To avoid this problem, it is necessary to allocate a multiple of the processor’s word size. However, determining this size can be complicated. A simple and effective solution for our case is to adopt multiples of 8 bytes, since the most common bus currently is 64 bits. Thus, all allocated block addresses must be multiples of 8. Let’s then define a constant for the alignment size:

#define _ABMALLOC_ALIGNMENT_SIZE 8

In the abmalloc() function, we check if the requested size is aligned, that is, if it is a multiple of _ABMALLOC_ALIGNMENT_SIZE. If it is already aligned, we don’t need to do anything:

void *abmalloc(size_t size) {
  if (size == 0) {
    return NULL;
  }
  size_t rest = size % _ABMALLOC_ALIGNMENT_SIZE;

If the requested size is not aligned, we need to increase it. First, we calculate the remainder of dividing the requested size by _ABMALLOC_ALIGNMENT_SIZE and subtract this value from the originally requested size. Then, we add _ABMALLOC_ALIGNMENT_SIZE. This ensures that the new size is the smallest possible multiple of the required alignment.

if (rest != 0) {
  size = size - rest + _ABMALLOC_ALIGNMENT_SIZE;
}

Done! Now, abmalloc() always returns aligned addresses.
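
The rounding step can be tried in isolation. The sketch below assumes a stand-in constant ALIGNMENT and a hypothetical function align_up(), neither of which exists in abmalloc; the arithmetic is the same as in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 8 /* stand-in for _ABMALLOC_ALIGNMENT_SIZE */

/* Hypothetical helper: rounds size up to the next multiple of ALIGNMENT. */
size_t align_up(size_t size) {
  assert(size <= SIZE_MAX - ALIGNMENT); /* guard against overflow */
  size_t rest = size % ALIGNMENT;
  if (rest != 0) {
    size = size - rest + ALIGNMENT;
  }
  return size;
}
```

With an alignment of 8, a request for 1 byte becomes 8, a request for 8 stays 8, and a request for 13 becomes 16.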

A question and a perk

A valid question is: won’t this result in wasted memory? On platforms with 32-bit buses, for example, this could lead us to allocate 7 unused bytes when we could allocate only 3 unused bytes. Here, we assume this isn’t a problem, but there’s a relatively simple solution: just change the value of the constant to 4, for example, and recompile our code.

This adjustment brings another advantage: very small blocks cease to be allocated. For example, if someone requests a block of only 1 byte, it would hardly be reused, since a single byte is rarely needed. (Ideally, such small blocks should never be allocated, but this is a user decision.) With a minimum size of 8 bytes, these smaller blocks, when released, will have a greater chance of being reused.

Conclusion

With that, we conclude our experiment. This is not a very good implementation of malloc(). For starters, it’s not thread-safe. Furthermore, modern implementations utilize much more efficient techniques and heuristics, such as binning. There are also some very interesting advancements in research, such as Mesh.

Our goal was simply to learn a little more, through practice and gradually evolving code, about how memory allocation works. Let us know in the comments if you gained a better understanding of the topic! And if you’d like to explore further, you can check out the final result in the GitHub repository.

Implementing malloc() and free() — merging small blocks

In the previous post, we learned how to split blocks to make better use of memory. However, this brings a new challenge: memory fragmentation. Smaller blocks can accumulate, forming chains of free blocks that, if unified, would serve larger requests. As blocks are split, they become smaller and smaller, making it difficult to reuse them.

How small blocks can increase memory consumption

Consider the snippet below:

void *ptr1 = abmalloc(128);
void *ptr2 = abmalloc(8);
abfree(ptr1);
void *ptr3 = abmalloc(8);
abfree(ptr3);
void *ptr4 = abmalloc(128);

In it, we allocated 128 bytes of memory, and then 8 bytes. The memory blocks would have a layout similar to the figure below:

Then we release the larger block:

…and allocate 8 bytes. Due to the latest changes in the abmalloc() function, these 8 bytes will be carved out of the larger block:

Now, by freeing the newly allocated 8 bytes, we get two free blocks:

When we allocate a new 128-byte block, we face a problem: neither of the two free blocks is large enough individually, although together they could satisfy the request. As a result, a new block is allocated at the end of the stack, which continues to grow.

One solution to this problem is to merge adjacent free blocks.

Refactoring for reuse and readability

Before we move on, we will do one more round of refactoring, as there are small but complex pieces of code that will be reused.

First, let’s create a function that, when receiving a header, returns a pointer to the memory area to be returned to the user. So far, we have obtained this result by adding one unit to the header pointer. Let’s encapsulate this logic in a function to simplify and improve code reuse:

void *header_user_area(Header *header) {
  return header + 1;
}

Consequently, in all the places in abmalloc() where we previously used the line below…

return header + 1;

…let’s use the new function:

return header_user_area(first);

We also often need the pointer to the memory location following the block. Currently, we calculate this with a somewhat complicated formula: we cast the header pointer to void*, then add the size of the Header structure and the size of the allocated block. This logic can be seen in the function header_split():

Header *new_header = ((void*)header) + sizeof(Header) + header->size;

Let’s create a function to perform this operation, called header_address_after_block():

void *header_address_after_block(Header *header) {
  return ((void*)header) + sizeof(Header) + header->size;
}

Since the function header_user_area() already performs basically the same operation as the first part of the formula, we can use it to simplify header_address_after_block():

void *header_address_after_block(Header *header) {
  return header_user_area(header) + header->size;
}

After that, we will replace the formula with our function in header_split():

Header *new_header = header_address_after_block(header);

Finally, it will be necessary to check whether, given a header, there is a free block before or after it. The expression for this check is intuitive, but extensive. For example, the formula below is used to determine whether there is a free block before it:

(header->previous != NULL) && header->previous->available

To make reading a little easier, let’s put this expression in a function, along with a similar one for the following block:

bool header_previous_available(Header *header) {
  return (header->previous != NULL) && header->previous->available;
}

bool header_next_available(Header *header) {
  return (header->next != NULL) && header->next->available;
}

Okay, now we have four new tools that will help us merge available blocks. Let’s get started!

Searching for adjacent free blocks

To join adjacent free memory blocks, we need to check whether the previous or the next block is available at deallocation time. We can do this by creating two variables: one to point to the first free block and the other to the last free block. Initially, both will point to the block that is being deallocated.

Header *header_merge(Header *header) {
  Header *first_available = header;
  Header *last_available = header;

If the block before the current one is free, first_available should point to it.

  if (header_previous_available(header)) {
    first_available = header->previous;
  }

Similarly, if the next block is available, last_available should point to it.

  if (header_next_available(header)) {
    last_available = header->next;
  }

When there are no free blocks

If there are no free blocks either before or after the newly deallocated block, there is nothing to merge. In this case, the variables first_available and last_available will both be equal to header, and our function will simply return the pointer to the block that was passed to it.

  if ((first_available == header) && (last_available == header)) {
    return header;
  }

We do not need to check the block before the previous one, or after the next one, and so on. If, in each call of abfree(), we merge the block with any free neighbor, it is impossible to have two consecutive free blocks. Therefore, it is sufficient to check only the immediate neighbors.
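
This invariant can be checked with a small debugging helper. The sketch below is only an illustration: has_adjacent_free_blocks() is a hypothetical name and is not part of abmalloc, and the Header structure is repeated here just to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same fields as the Header structure used throughout the series. */
typedef struct Header {
  struct Header *previous, *next;
  size_t size;
  bool available;
} Header;

/* Hypothetical debugging helper: returns true if any two neighboring
   blocks in the list starting at `first` are both free. */
bool has_adjacent_free_blocks(const Header *first) {
  for (const Header *header = first; header != NULL; header = header->next) {
    /* sanity check: the links must be consistent in both directions */
    assert(header->next == NULL || header->next->previous == header);
    if (header->available && header->next != NULL && header->next->available) {
      return true;
    }
  }
  return false;
}
```

If every call to abfree() merges free neighbors, this function should return false whenever it is called between allocations.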

Updating size when there are free blocks

If either first_available or last_available is not equal to header, we need to merge the blocks. This takes two actions.

First, we need to update the size of the first available block to include the last block. To do this, we get the pointer that points just past the last available block.

  void *end = header_address_after_block(last_available);

The size of the new block will be the difference between this pointer and the user block address:

  size_t size = end - header_user_area(first_available);

Once we have the size, we just need to update the first available block:

  header_init(first_available, size, true);

Updating merged block pointers

With the block size updated, we need to adjust its pointers. For the backward pointer, it should now point to the block before the first available block:

  Header *previous = first_available->previous;

The next one should be the one after the last available block:

  Header *next = last_available->next;

Once we have these values, we just need to connect them properly using the function header_plug():

  header_plug(first_available, previous, next);

Edge case: the last block

One last thing to keep in mind is that the merged block may have absorbed the last block. In this case, the merged block becomes the new last block, so we update the last pointer:

  if (last_available == last) {
    last = first_available;
  }
  return first_available;
}
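
As a worked example of the size calculation, the sketch below lays out two adjacent free blocks in a buffer obtained from the standard malloc() and computes the size of the merged user area. The names merged_size() and demo_merged_size() are hypothetical, and char pointers are used for the arithmetic so the example compiles as standard C. The merged size should equal both user areas plus the header that gets absorbed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Same fields as the Header structure used throughout the series. */
typedef struct Header {
  struct Header *previous, *next;
  size_t size;
  bool available;
} Header;

/* Hypothetical helper: user-area size obtained by merging every block
   from first_available up to and including last_available. */
size_t merged_size(const Header *first_available, const Header *last_available) {
  const char *start = (const char *)(first_available + 1);
  const char *end = (const char *)(last_available + 1) + last_available->size;
  assert(end >= start);
  return (size_t)(end - start);
}

/* Builds two adjacent free blocks (8 and 128 bytes) and merges them. */
size_t demo_merged_size(void) {
  unsigned char *heap = malloc(2 * sizeof(Header) + 8 + 128);
  assert(heap != NULL);
  Header *a = (Header *)heap;
  a->previous = NULL;
  a->size = 8;
  a->available = true;
  Header *b = (Header *)(heap + sizeof(Header) + a->size);
  b->previous = a;
  b->next = NULL;
  b->size = 128;
  b->available = true;
  a->next = b;
  size_t size = merged_size(a, b);
  free(heap);
  return size;
}
```

Here demo_merged_size() returns 8 + sizeof(Header) + 128: the second header is no longer needed, so its bytes become part of the user area.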

Using `header_merge()` in `abfree()`

Now that we have a function that joins free blocks, we change abfree() to call it on the header of the block to be deallocated. Once this is done, we proceed with freeing memory.

  Header *header = (Header*) ptr - 1;
  header = header_merge(header);

  if (header == last) {

Conclusion

Here are our complete abfree() function and its helper functions for now:

void abfree(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Header *header = (Header*) ptr - 1;

  header = header_merge(header);

  if (header == last) {
    while (header_previous_available(header)) {
      header = header->previous;
    }
    last = header->previous;
    if (last != NULL) {
      last->next = NULL;
    } else {
      first = NULL;
    }
    brk(header);
  } else {
    header->available = true;
  }
}

Header *header_new(Header *previous, size_t size, bool available) {
  Header *header = sbrk(sizeof(Header) + size);
  header_init(header, size, available);
  header_plug(header, previous, NULL);
  return header;
}

void header_init(Header *header, size_t size, bool available) {
  header->size = size;
  header->available = available;
}

void header_plug(Header *header, Header *previous, Header *next) {
  header->previous = previous;
  if (previous != NULL) {
    previous->next = header;
  }
  header->next = next;
  if (next != NULL) {
    next->previous = header;
  }
}

void *header_user_area(Header *header) {
  return header + 1;
}

void *header_address_after_block(Header *header) {
  return header_user_area(header) + header->size;
}

bool header_previous_available(Header *header) {
  return (header->previous != NULL) && header->previous->available;
}

bool header_next_available(Header *header) {
  return (header->next != NULL) && header->next->available;
}

Header *header_merge(Header *header) {
  Header *first_available = header;
  Header *last_available = header;

  if (header_previous_available(header)) {
    first_available = header->previous;
  }
  if (header_next_available(header)) {
    last_available = header->next;
  }

  if ((first_available == header) && (last_available == header)) {
    return header;
  }

  void *end = header_address_after_block(last_available);
  size_t size = end - header_user_area(first_available);
  header_init(first_available, size, true);

  Header *previous = first_available->previous;
  Header *next = last_available->next;
  header_plug(first_available, previous, next);

  if (last_available == last) {
    last = first_available;
  }

  return first_available;
}

Header *header_split(Header *header, size_t size) {
  size_t original_size = header->size;
  if (original_size > size + sizeof(Header)) {
    header->size = original_size - size - sizeof(Header);
    Header *new_header = header_address_after_block(header);
    header_init(new_header, size, true);
    header_plug(new_header, header, header->next);
    if (header == last) {
      last = new_header;
    }
    return new_header;
  } else {
    return header;
  }
}

With the changes made in this and the last post, the size of free blocks is no longer an aggravating factor in memory consumption, and our function malloc() is getting closer to being ready. However, there is one last problem to be solved: the alignment of memory blocks. We will examine this in the next post.

Implementing malloc() and free() — splitting large blocks

When implementing malloc(), it is important to consider the size of the allocated blocks. For example, reusing blocks that are too big for smaller requests wastes memory. How do we solve that? Let’s see in this post of our series on malloc() and free().

In the previous post of this series, we saw how the order in which we choose memory blocks to reuse can lead to greater or lesser memory consumption, and we changed our functions to avoid this waste. But we need to solve another, even more serious, problem: sometimes, a very large memory block can occupy the space that several smaller blocks could use. Consider the case below, where we allocate a large chunk of memory, deallocate it, and then allocate two much smaller blocks:

void *ptr1 = abmalloc(128);
void *ptr2 = abmalloc(8);
abfree(ptr1);
void *ptr3 = abmalloc(8);
void *ptr4 = abmalloc(8);

Here, we have a free 128-byte memory block, and when we allocate a block of just 8 bytes, all 128 bytes become unavailable. When we allocate another 8-byte block, the heap needs to grow again. This is not an efficient use of memory.

There are at least two popular solutions for this case. One is to use bins: lists that group blocks by size. This approach is more sophisticated and efficient, but also more complex. A simpler option is to find a large block and split it into smaller blocks. We’ll follow this approach.

But remember: simpler doesn’t exactly mean simple 😉

Initial Refactoring

Before we begin, let’s do a small refactoring. Currently, the header_new() function does two things: it allocates more memory for a new block and initializes its header, setting the metadata and pointers to the previous block. The part of initializing the header might be useful, so let’s extract it. We’ll create two new functions to improve readability:

The header_plug() function, which “plugs” the initialized block to the previous and next blocks.
The header_init() function, which sets the initial values of the block’s metadata (size and availability).

Here’s how they look:

void header_init(Header *header, size_t size, bool available) {
    header->size = size;
    header->available = available;
}

void header_plug(Header *header, Header *previous, Header *next) {
    header->previous = previous;
    if (previous != NULL) {
        previous->next = header;
    }
    header->next = next;
    if (next != NULL) {
        next->previous = header;
    }
}

Now, we just need to modify header_new() to use these new functions:

Header *header_new(Header *previous, size_t size, bool available) {
    Header *header = sbrk(sizeof(Header) + size);
    header_init(header, size, available);
    header_plug(header, previous, NULL);
    return header;
}

(Additionally, we can remove the line last->previous->next = last; from the abmalloc() function, since header_plug() now takes care of that.)

Splitting Blocks

With these tools in hand, let’s create the header_split() function. Given a header and a minimum required size, this function splits the memory block into two if the original block is large enough to contain

the required size,
a new header for the new block, and
a bit of extra memory.

First, we check if the block is large enough:

Header *header_split(Header *header, size_t size) {
    size_t original_size = header->size;
    if (original_size > size + sizeof(Header)) {

If this condition is met, we split the block. First, we reduce the size of the current block by subtracting the size of a header and the space requested by abmalloc:

header->size = original_size - size - sizeof(Header);

This leaves a memory space after the current block, which we’ll use to create the new block. We calculate the pointer for this new block:

Header *new_header = ((void*)header) + sizeof(Header) + header->size;

Now that we have the pointer to the new block, we initialize its header with header_init():

header_init(new_header, size, true);

And we connect the new block to the previous and next blocks using header_plug():

header_plug(new_header, header, header->next);

If the original block was the last one, the new block will now be the last, so we update the last pointer:

if (header == last) {
    last = new_header;
}

Finally, we return the new block:

return new_header;

If the original block is not large enough, we simply return the original block:

} else {
    return header;
}
}
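
The size bookkeeping of a split can be checked with plain arithmetic. The sketch below uses two hypothetical helpers, can_split() and remaining_after_split(), which are not part of abmalloc. It assumes a strict comparison, so the original block keeps at least a bit of extra memory, and it takes the header size as a parameter so the numbers do not depend on the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a block can be split only if it holds the
   requested size, a new header, and at least one extra byte. */
bool can_split(size_t original_size, size_t size, size_t header_size) {
  return original_size > size + header_size;
}

/* Hypothetical helper: size left in the original block after the split. */
size_t remaining_after_split(size_t original_size, size_t size,
                             size_t header_size) {
  assert(can_split(original_size, size, header_size));
  return original_size - size - header_size;
}
```

With a 32-byte header, splitting a 128-byte block for an 8-byte request leaves 88 bytes in the original block: 88 + 32 + 8 adds back up to 128. A 40-byte block would not be split, since nothing would remain for the original block.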

Updating `abmalloc()`

Now, we just need to go back to the abmalloc() function, and in the place where we find a usable block, we invoke header_split() to try to split it:

if (header->available && (header->size >= size)) {
    header = header_split(header, size);
    header->available = false;
    return header + 1;
}

If the block can be split, the new block will be returned. Otherwise, the original block will be kept and returned as before.

Note on Block Splitting

Notice that we created the new block at the end of the original block. We could have created it at the beginning, but by creating the new used block at the end, the new free block stays closer to older blocks. This way, it will be found first the next time abmalloc() is invoked.

Splitting large memory blocks is a step forward, but there’s an opposite problem: small memory blocks can cause fragmentation, making larger requests cause the heap to grow. We’ll see how to solve this in the next post.

Implementing malloc() and free() — old memory reused first

Suppose you have a linked list of blocks of memory that can be reused. Should you look for one to reuse from the beginning or the end? In this post, we have the answer, explain why and show how to implement it.

In the previous post in this series on implementing malloc() and free(), we showed how it is possible to reuse memory blocks and reduce the heap by freeing newer blocks. However, the current function introduces a subtle issue: it prioritizes reusing newer blocks, which can lead to increased memory consumption over time. Why does this happen? Let’s break it down.

Heap reduction by reusing recent blocks

Consider the following scenario. First, we allocate four memory blocks:

void *ptr1 = abmalloc(8); 
void *ptr2 = abmalloc(8); 
void *ptr3 = abmalloc(8); 
void *ptr4 = abmalloc(8);

The memory structure can be visualized like this:

Now, we release the first and third blocks…

abfree(ptr1); 
abfree(ptr3);

…resulting in the following structure:

Then we allocate another block of the same size:

void *ptr5 = abmalloc(8);

Since abmalloc() starts its search from the most recent block, it reuses the free block closest to the top.

If we now release the last block…

abfree(ptr4);

…we can reduce the heap size by just one 8-byte block, since the previous block is no longer free:

Reuse of old blocks

Now, imagine the same scenario, but with one modification: our function starts searching for free blocks from the oldest one. The initial structure will be the same…

…and again we free the first and third memory blocks:

This time, the first block will be reused:

Now, when we free the last block, we will have two free blocks at the top, allowing us to reduce the heap by two 8-byte blocks:

This example illustrates how, by giving preference to newer blocks, we end up accumulating old unused blocks, wasting memory and leading to unnecessary heap growth. The solution is to modify the search strategy, prioritizing the reuse of older blocks.
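
The two scenarios can be replayed with a toy model. The sketch below is not abmalloc: it assumes a simplified heap of equally sized blocks, represented by an array of flags, and the names Heap, heap_alloc(), heap_free() and run_scenario() are hypothetical. It only compares how many blocks remain on the heap under each search order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Toy heap: one flag per equally sized block, oldest block at index 0. */
typedef struct {
  bool in_use[MAX_BLOCKS];
  size_t count; /* number of blocks currently on the heap */
} Heap;

/* Reuses a free block if there is one, otherwise grows the heap. */
size_t heap_alloc(Heap *heap, bool oldest_first) {
  for (size_t i = 0; i < heap->count; i++) {
    size_t index = oldest_first ? i : heap->count - 1 - i;
    if (!heap->in_use[index]) {
      heap->in_use[index] = true;
      return index;
    }
  }
  assert(heap->count < MAX_BLOCKS);
  heap->in_use[heap->count] = true;
  return heap->count++;
}

/* Frees a block and shrinks the heap while its top block is free. */
void heap_free(Heap *heap, size_t index) {
  assert(index < heap->count);
  heap->in_use[index] = false;
  while (heap->count > 0 && !heap->in_use[heap->count - 1]) {
    heap->count--;
  }
}

/* Replays the scenario from the text; returns the final block count. */
size_t run_scenario(bool oldest_first) {
  Heap heap = {0};
  size_t ptr1 = heap_alloc(&heap, oldest_first);
  (void)heap_alloc(&heap, oldest_first); /* ptr2, never freed */
  size_t ptr3 = heap_alloc(&heap, oldest_first);
  size_t ptr4 = heap_alloc(&heap, oldest_first);
  heap_free(&heap, ptr1);
  heap_free(&heap, ptr3);
  (void)heap_alloc(&heap, oldest_first); /* ptr5 */
  heap_free(&heap, ptr4);
  return heap.count;
}
```

In this model, run_scenario(false) ends with three blocks on the heap, while run_scenario(true) ends with two, matching the difference described above.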

Implementing preference for old blocks

To solve this problem, we will start by adding a pointer to the next block in the header. We will also create a global pointer to the first block, so we can start the search from it:

typedef struct Header { 
  struct Header *previous, *next; 
  size_t size; 
  bool available; 
} Header; 

Header *first = NULL; 
Header *last = NULL;

We will create memory blocks with headers in two different situations, so let’s make a small refactoring: we will extract this logic to a helper function that allocates and initializes the header (including setting the next field to NULL):

Header *header_new(Header *previous, size_t size, bool available) { 
  Header *header = sbrk(sizeof(Header) + size); 
  header->previous = previous; 
  header->next = NULL; 
  header->size = size; 
  header->available = available;
  return header; 
}

With this new function, we can simplify the logic within abmalloc():

void *abmalloc(size_t size) { 
  if (size == 0) { 
    return NULL; 
  } 
  Header *header = last; 
  while (header != NULL) { 
    if (header->available && (header->size >= size)) { 
      header->available = false; 
      return header + 1; 
    } 
    header = header->previous; 
  } 
  last = header_new(last, size, false); 
  return last + 1; 
}

Now we have access to the first and last blocks, and given a block, we can find out the previous and next ones. We also know that when the pointer to the first block is null, no blocks have been allocated yet. So in this case, we will allocate the block immediately, and initialize both first and last:

void *abmalloc(size_t size) { 
  if (size == 0) { 
    return NULL; 
  } 
  if (first == NULL) { 
    first = last = header_new(NULL, size, false); 
    return first + 1; 
  }

If first is no longer NULL, there are already allocated blocks, so we will start searching for a reusable block. We will continue using the variable header as an iterator, but instead of starting with the most recent block, the search will start from the oldest:

  Header *header = first;

At each iteration, we will advance to the next block in the sequence, instead of going backwards to the previous block:

  while (header != NULL) { 
    if (header->available && (header->size >= size)) { 
      header->available = false; 
      return header + 1; 
    } 
    header = header->next; 
  }

The logic remains the same: if we find an available block of sufficient size, it is returned. Otherwise, if no reusable block is found after we traverse the list, a new block is allocated:

  last = header_new(last, size, false);

Now, we need to adjust the block that was the last one (after the allocation, the second to last). It pointed to NULL, but now it should point to the new block. To do this, we set the previous block’s next field to the new last block:

  last->previous->next = last; 
  return last + 1; 
}

Adjustments in the `abfree()` function

The function abfree() basically maintains the same structure, but now we must handle some edge cases. When we free blocks at the top of the heap, a new block becomes the last one, as we already do in this snippet:

    last = header->previous; 
    brk(header);

Here, the pointer header references the last non-null block available on the stack. We have two possible scenarios:

the current block has a previous block, which will become the new last block. In this case, we should set the next pointer of this block to NULL.
the current block does not have a previous block (i.e., it is the first and oldest block). When it is freed, the stack is empty. In this case, instead of trying to update a field of a non-existent block, we simply set the variable first to NULL, indicating that there are no more allocated blocks.

Here is how we implement it:

  last = header->previous; 
  if (last != NULL) { 
    last->next = NULL; 
  } else { 
    first = NULL; 
  } 
  brk(header);
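
The pointer updates can be exercised without touching the real heap. The sketch below assumes a simplified Node structure with only the two links, and the functions list_append() and list_cut_from() are hypothetical names. It applies the same logic to the first and last pointers, leaving out the call to brk().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for Header: only the list links. */
typedef struct Node {
  struct Node *previous, *next;
} Node;

static Node *first = NULL;
static Node *last = NULL;

/* Appends a node at the end of the list. */
void list_append(Node *node) {
  assert(node != NULL);
  node->previous = last;
  node->next = NULL;
  if (last != NULL) {
    last->next = node;
  } else {
    first = node;
  }
  last = node;
}

/* Drops `node` and everything after it, as abfree() does before brk(). */
void list_cut_from(Node *node) {
  assert(node != NULL);
  last = node->previous;
  if (last != NULL) {
    last->next = NULL; /* the previous block becomes the new last one */
  } else {
    first = NULL; /* no block left: the list is empty again */
  }
}
```

Cutting from the second of three nodes leaves a one-element list, and cutting from the first node resets both first and last to NULL.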

Conclusion

Our functions abmalloc() and abfree() now look like this:

typedef struct Header {
  struct Header *previous, *next;
  size_t size;
  bool available;
} Header;

Header *first = NULL;
Header *last = NULL;

Header *header_new(Header *previous, size_t size, bool available) {
  Header *header = sbrk(sizeof(Header) + size);
  header->previous = previous;
  header->next = NULL;
  header->size = size;
  header->available = available;
  return header;
}

void *abmalloc(size_t size) {
  if (size == 0) {
    return NULL;
  }
  if (first == NULL) {
    first = last = header_new(NULL, size, false);
    return first + 1;
  }
  Header *header = first;
  while (header != NULL) {
    if (header->available && (header->size >= size)) {
      header->available = false;
      return header + 1;
    }
    header = header->next;
  }
  last = header_new(last, size, false);
  last->previous->next = last;
  return last + 1;
}

void abfree(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Header *header = (Header*) ptr - 1;
  if (header == last) {
    while ((header->previous != NULL) && header->previous->available) {
      header = header->previous;
    }
    last = header->previous;
    if (last != NULL) {
      last->next = NULL;
    } else {
      first = NULL;
    }
    brk(header);
  } else {
    header->available = true;
  }
}

This change allows us to save considerably more memory. There are, however, still problems to solve. For example, consider the following scenario: we request the allocation of a memory block of 8 bytes, and abmalloc() reuses a block of, say, 1024 bytes. This is clearly wasteful.

We will see how to solve this in the next post.

(This post is a translation of Implementando malloc() e free() — memória antiga tem preferência, first published in Suspensão de Descrença.)

Error Handling in C with goto

In higher-level languages, the standard approach to handling errors is exceptions. C, however, has no such feature. Even so, there is a pattern that simplifies scenarios where many operations have to be undone. And it uses the ever-controversial goto statement…

Recently, a discussion started on the Python Brasil mailing list about the reasons for using exceptions. At one point, a notably competent participant commented on how difficult it is to handle errors through function returns, as in C.

When you have a complex algorithm, each operation that can fail requires a series of ifs to check if the operation was successful. If the operation fails, you need to revert all previous operations to exit the algorithm without altering the program’s state.

Let’s look at an example. Suppose I have the following struct to represent arrays:

typedef struct {
    int size;
    int *array;
} array_t;

Now, I’m going to write a function that reads from a text file the number of elements to be placed in one of these arrays and then reads the elements themselves. This function will also allocate the array struct and the array itself. The problem is that this function is quite prone to errors, as we might fail to:

Open the given file;
Allocate the struct;
Read the number of elements from the given file, either due to input/output error or end of file;
Allocate memory to store the elements to be read;
Read one of the elements, either due to input/output error or end of file.

Complicated, right? Note that if we manage to open the file but fail to allocate the struct, we have to close the file; if we manage to open the file and allocate the struct but fail to read the number of elements from the file, we have to deallocate the struct and close the file; and so on. Thus, if we check all errors and adopt the tradition of returning NULL in case of an error, our function would look something like this:

array_t *readarray(const char *filename) {
    FILE *file;
    array_t *array;
    int i;

    file = fopen(filename, "r");
    if (file == NULL) return NULL;

    array = malloc(sizeof(array_t));
    if (array == NULL) {
        fclose(file);
        return NULL;
    }

    /* fscanf() returns the number of items read; anything but 1 is a failure */
    if (fscanf(file, "%d", &(array->size)) != 1) {
        free(array);
        fclose(file);
        return NULL;
    }

    array->array = malloc(sizeof(int) * array->size);
    if (array->array == NULL) {
        free(array);
        fclose(file);
        return NULL;
    }

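    for (i = 0; i < array->size; i++) {
        /* anything but one item read (I/O error, EOF, bad input) is a failure */
        if (fscanf(file, "%d", array->array + i) != 1) {
            free(array->array);
            free(array);
            fclose(file);
            return NULL;
        }
    }
    fclose(file); /* the file is no longer needed once everything is read */
    return array;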
}

Indeed, quite laborious, and with a lot of repeated code…

Note, however, that there are two situations in the code above.

In one, when I have two operations to revert, I need to revert the last executed one first, and then the previous one. For example, when deallocating both the struct and the integer array, I need to deallocate the integer array first and then the struct. If I deallocate the struct first, I may not be able to deallocate the array later.
In the other situation, the order doesn’t matter. For example, if I am going to deallocate the struct and close the file, it doesn’t matter in which order I do it. This implies that I can also revert the last executed operation first and then the first operation.

What’s the point of this? Well, in practice, I’ve never seen a situation where I have to revert the first executed operation first, then the second, and so on. This means that, when performing the operations a(), b(), c(), etc., the “natural” way to revert them is to call the revert functions in reverse order, something like:

a();
b();
c();
/* ... */
revert_c();
revert_b();
revert_a();

Now comes the trick. In the code above, after each operation, we’ll place an if to check if it failed or not. If it failed, a goto will be executed to the revert function of the last successful operation:

a();
if (failed_a()) goto FAILED_A;
b();
if (failed_b()) goto FAILED_B;
c();
if (failed_c()) goto FAILED_C;
/* ... */
revert_c();
FAILED_C:
revert_b();
FAILED_B:
revert_a();
FAILED_A:
return;

If a() fails, the algorithm returns; if b() fails, the algorithm goes to FAILED_B, reverts a() and returns; if c() fails, the algorithm goes to FAILED_C, reverts b(), reverts a(), and returns. Can you see the pattern?
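
To see the order of the reverts, here is a small sketch that records each step in a string. Everything in it is hypothetical: run_steps() takes a fail_at parameter that makes one of the three steps fail, lowercase letters mark operations, and uppercase letters mark reverts. On success it returns right after the last step, before the labels.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: appends one character to the trace string. */
static void trace(char *log, char step) {
  size_t length = strlen(log);
  log[length] = step;
  log[length + 1] = '\0';
}

/* Runs steps a, b and c. fail_at (1 to 3) makes that step fail; 0 means
   no failure. The caller provides a log buffer of at least 8 bytes. */
bool run_steps(int fail_at, char *log) {
  assert(log != NULL);
  log[0] = '\0';

  trace(log, 'a');
  if (fail_at == 1) goto FAILED_A;
  trace(log, 'b');
  if (fail_at == 2) goto FAILED_B;
  trace(log, 'c');
  if (fail_at == 3) goto FAILED_C;
  return true;

FAILED_C:
  trace(log, 'B'); /* revert b */
FAILED_B:
  trace(log, 'A'); /* revert a */
FAILED_A:
  return false;
}
```

When the third step fails, the trace reads "abcBA": the operations that succeeded are undone in reverse order. When the second step fails, the trace is "abA".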

If we apply this pattern to our readarray() function, the result will be something like:

array_t *readarray(const char *filename) {
    FILE *file;
    array_t *array;
    int i;

    file = fopen(filename, "r");
    if (file == NULL) goto FILE_ERROR;

    array = malloc(sizeof(array_t));
    if (array == NULL) goto ARRAY_ALLOC_ERROR;

    if (fscanf(file, "%d", &(array->size)) != 1)
        goto SIZE_READ_ERROR;

    array->array = malloc(sizeof(int) * array->size);
    if (array->array == NULL) goto ARRAY_ARRAY_ALLOC_ERROR;

    for (i = 0; i < array->size; i++) {
        if (fscanf(file, "%d", array->array + i) == EOF)
            goto ARRAY_CONTENT_READ_ERROR;
    }
    return array;

    ARRAY_CONTENT_READ_ERROR:
    free(array->array);
    ARRAY_ARRAY_ALLOC_ERROR:
    SIZE_READ_ERROR:
    free(array);
    ARRAY_ALLOC_ERROR:
    fclose(file);
    FILE_ERROR:
    return NULL;
}

What are the advantages of this pattern? Well, it reduces the repetition of operation reversal code and separates the error handling code from the function logic. In fact, although I think exceptions are the best modern error handling method, for local error handling (within the function itself), I find this method much more practical.

(This post is a translation of Tratamento de erros em C com goto, originally published in Suspensão de Descrença.)

Implementing malloc() and free() — reducing the heap even more

In our journey implementing malloc() and free(), we learned to reuse memory blocks. Today, we will make a very simple optimization: reduce the heap size as much as possible.

This post is part of a series on implementing the malloc() and free() functions. In the previous article, we learned how to reuse memory blocks. It was a significant advancement, but there’s much more room for improvement.

One example is reducing the size of the heap, as explained in the first post. When we free the last memory block, we move the top of the heap to the end of the previous block. However, this previous block might also be free, as well as others. Consider the scenario below:

void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr1);
abfree(ptr2);

In this case, when we free the block pointed to by ptr2, we make ptr1 the last block. However, ptr1 is also free, so we could further reduce the heap size.

To achieve this, we walk the list backward from the last block for as long as the blocks are free. If the header of the received pointer is the last block and its previous block is free, we move the header pointer to that previous block. We repeat this until header points to a block whose previous block is in use (or is NULL, if header is the first block). Then, we execute the heap reduction procedure from there:

if (header == last) {
  while ((header->previous != NULL) && header->previous->available) {
    header = header->previous;
  }
  last = header->previous;
  brk(header);
} else {
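
The walk itself can be tried in isolation. The sketch below is only an illustration: first_releasable() is a hypothetical helper, not part of abmalloc, and it assumes the Header struct from the previous post. It returns the block where the heap would be cut, without calling brk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Header {
  struct Header *previous;
  size_t size;
  bool available;
} Header;

/* Given the block being freed (assumed to be the last one), walks back
 * while the previous block is free. Returns the lowest block that can be
 * released; its address would become the new top of the heap, and its
 * previous pointer the new value of last. */
Header *first_releasable(Header *header) {
  assert(header != NULL);
  while ((header->previous != NULL) && header->previous->available) {
    header = header->previous;
  }
  return header;
}
```

With four chained blocks where the second and third are free and the fourth is being freed, the helper returns the second block, so the first block is the one that remains as last.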

Now, though, we need to fix a bug in abfree(). According to the specification, the free() function should accept a null pointer and do nothing. However, if abfree() receives NULL, we will have a segmentation fault! Fortunately, it is easy to fix by adding a check at the beginning of the function:

void abfree(void *ptr) {
   if (ptr == NULL) {
     return;
   }
   Header *header = (Header*) ptr - 1;

So, here’s our abfree() function at the moment:

void abfree(void *ptr) {
  if (ptr == NULL) {
    return;
  }
  Header *header = (Header*) ptr - 1;
  if (header == last) {
    while ((header->previous != NULL) && header->previous->available) {
      header = header->previous;
    }
    last = header->previous;
    brk(header);
  } else {
    header->available = true;
  }
}

Reducing the size of the heap is a simple optimization, but there are still challenges ahead. For example, the way we choose blocks to reuse can lead to larger than necessary heaps. We will see why, and how to solve that, in the next post.

(This post is a translation of Implementando malloc() e free() — reduzindo ainda mais o heap, first published in Suspensão de Descrença.)

Implementing malloc() and free() — reusing memory blocks

Dynamic memory allocation is of no use if we cannot reuse freed memory, right? Proceeding with our implementation, we will make our malloc() function use freed blocks of memory when possible!

This post is part of a series on how to implement the malloc() and free() functions. In a previous article, we changed our functions to free up some memory blocks. However, this only occurred if the freed blocks were deallocated from newest to oldest.

This wouldn’t make much difference. Dynamically allocated memory rarely behaves like a stack, where the newest block is always deallocated first. The big advantage of dynamic memory allocation, after all, is that it doesn’t work like a stack.

To understand the limitations of our implementation, consider the code below:

void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr1);
void *ptr3 = abmalloc(8);

In the first line, we allocate eight bytes, and free them in the third line. In the last line, we allocate eight bytes again. However, we cannot reuse the freed memory. To truly save memory, we need a more sophisticated solution.

One option is to reuse free blocks. To do this, we add a Boolean field to the block header, called available, which will indicate whether the block is free. As a block can only be reused if the memory requested by abmalloc() is less than or equal to that available in the block, we also need a field in the header indicating the size of the block, which we will call size.

typedef struct Header {
  struct Header *previous;
  size_t size;
  bool available;
} Header;

When the block is allocated, the value of the available field must be false (since the block is not available). We also record the block size in the size field:

void *abmalloc(size_t size) {
  Header *header = sbrk(sizeof(Header) + size);
  header->previous = last;
  header->size = size;
  header->available = false;
  last = header;
  return last + 1;
}

We have the information in the header but we are not yet reusing deallocated memory. To reuse the available blocks, we need to find them! The algorithm for this is very simple: abmalloc() will start iterating over the blocks, starting from the last until reaching the first. Since the previous pointer of the first block is always NULL, we stop when we find such value:

void *abmalloc(size_t size) {
   Header *header = last;
   while (header != NULL) {
     header = header->previous;
   }

In each iteration, we check whether the block is available and has an acceptable size. If in the middle of this process we find an available block greater than or equal to what we need, we got lucky! Just mark the block as unavailable, and return it.

void *abmalloc(size_t size) {
   Header *header = last;
   while (header != NULL) {
     if (header->available && (header->size >= size)) {
       header->available = false;
       return header + 1;
     }
     header = header->previous;
   }

What if we don’t find a block that satisfies these conditions? In this case, the abmalloc() function increases the heap, as it used to do:

void *abmalloc(size_t size) {
  Header *header = last;
  while (header != NULL) {
    if (header->available && (header->size >= size)) {
      header->available = false;
      return header + 1;
    }
    header = header->previous;
  }
  header = sbrk(sizeof(Header) + size);
  header->previous = last;
  header->size = size;
  header->available = false;
  last = header;
  return last + 1;
}

When it comes to deallocating, we have two possible situations. If the block deallocated by abfree() is the last one, nothing changes: we move the top of the heap to the beginning of the block and update the last pointer. But what if the block is not on top of the heap? We simply mark it as available, as can be seen in the else clause of the function below:

void abfree(void *ptr) {
  Header *header = (Header*) ptr - 1;
  if (header == last) {
    last = header->previous;
    brk(header);
  } else {
    header->available = true;
  }
}
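
To experiment with this version without touching the real heap, here is a minimal sketch that runs the same logic over a static array. fake_sbrk() and fake_brk() are hypothetical stand-ins for sbrk() and brk(), and the 4096-byte arena is an arbitrary choice; the rest follows the functions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Header {
  struct Header *previous;
  size_t size;
  bool available;
} Header;

/* Hypothetical heap: a static arena with a movable top. */
_Alignas(max_align_t) unsigned char arena[4096];
size_t arena_top = 0;
Header *last = NULL;

/* Stand-in for sbrk(): returns the old top and moves it up. */
void *fake_sbrk(size_t increment) {
  /* Round up so the next header is suitably aligned (an addition of this
     sketch; the functions above do not handle alignment). */
  size_t align = _Alignof(max_align_t);
  increment = (increment + align - 1) / align * align;
  assert(arena_top + increment <= sizeof arena);
  void *old_top = arena + arena_top;
  arena_top += increment;
  return old_top;
}

/* Stand-in for brk(): moves the top down to the given address. */
void fake_brk(void *address) {
  arena_top = (size_t)((unsigned char *)address - arena);
}

void *abmalloc(size_t size) {
  /* First, look for a free block that is large enough. */
  Header *header = last;
  while (header != NULL) {
    if (header->available && (header->size >= size)) {
      header->available = false;
      return header + 1;
    }
    header = header->previous;
  }
  /* None found: grow the (fake) heap. */
  header = fake_sbrk(sizeof(Header) + size);
  header->previous = last;
  header->size = size;
  header->available = false;
  last = header;
  return last + 1;
}

void abfree(void *ptr) {
  Header *header = (Header *)ptr - 1;
  if (header == last) {
    last = header->previous;
    fake_brk(header);
  } else {
    header->available = true;
  }
}
```

Repeating the earlier example against this sketch, ptr3 comes back with the same address as ptr1, since the freed block is found before the heap grows.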

Reusing blocks of memory is a huge advance. However, we can be even more efficient in memory usage. For example, we only reduce the heap size if we deallocate the last block. If there are more unused blocks right before it, we could free them too. We will see how to do this in the next post.

(This post is a translation of Implementando malloc() e free() — reutilizando blocos de memória, originally published in Suspensão de Descrença.)

Implementing malloc() and free() — adding metadata to the memory blocks

When malloc() reserves blocks of memory, it needs to somehow make it possible to release them later, when free() is called. We fell short of any real solution for this in our last post. In this post, though, we take the first, most fundamental steps to bring real memory efficiency to our implementations of malloc() and free()!

This post is part of a series on implementing the malloc() and free() functions. Previously, we implemented a rather simplistic approach that almost doesn’t free any memory: a pointer points to the last allocated block, enabling free() to deallocate it, but only it.

A better option is to make the last block point to the second-to-last, the second-to-last block to the third-to-last, and so on, forming a linked list. To achieve this, we create a struct that will serve as the header of the blocks, containing a pointer to the previous block:

typedef struct Header {
  struct Header *previous;
} Header;

Additionally, the pointer to the last block, which used to be void*, is now of type Header*:

Header *last = NULL;

To use these headers, abmalloc() reserves enough memory to store both the header and the requested size:

void *abmalloc(size_t size) {
  Header *header = sbrk(sizeof(Header) + size);

In this way, we use the beginning of the block to store necessary information, such as a pointer to the last allocated block before the new one:

  header->previous = last;

Then, we update last to point to the new block:

  last = header;

Finally, we return a pointer to the memory that the user can use. Since header points to the metadata, we cannot simply return it. Otherwise, all header information would be overwritten when the user used the pointer! Instead, we return a pointer to just after the header. This pointer is easy to calculate: it is the memory address of the header plus the size of the header:

  return header + 1;
}

Note how we increment the header pointer by 1. Since the pointer type is Header*, the increment is actually the number of bytes of the Header struct, not just one byte. The type of the pointer is very relevant in pointer arithmetic.
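
A small check makes this concrete. The sketch below uses the Header struct from this post; the two helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Header {
  struct Header *previous;
} Header;

/* Number of bytes between a header and the address given by header + 1. */
size_t increment_in_bytes(Header *header) {
  assert(header != NULL);
  return (size_t)((char *)(header + 1) - (char *)header);
}

/* Address handed to the user: the first byte right after the header. */
void *payload_of(Header *header) {
  return header + 1;
}
```

For any valid header, increment_in_bytes() returns sizeof(Header), not 1, which is why header + 1 lands exactly where the user's memory starts.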

Now that our memory blocks have metadata at the beginning, we need to take this into account when deallocating. free() receives a pointer not to the start of the block but to the memory made available to the user. Therefore, we need to find the start of the block from the pointer the user passed. Nothing that a little pointer arithmetic can’t solve:

void abfree(void *ptr) {
  Header *header = (Header*) ptr - 1;

If header points to the last allocated block, the previous block will become the last. In this case, we can return memory from the heap to the operating system through brk():

  if (header == last) {
    last = header->previous;
    brk(header);
  }
}

Here are our new malloc() and free() functions:

typedef struct Header {
  struct Header *previous;
} Header;

Header *last = NULL;

void *abmalloc(size_t size) {
  Header *header = sbrk(sizeof(Header) + size);
  header->previous = last;
  last = header;
  return header + 1;
}

void abfree(void *ptr) {
  Header *header = (Header*) ptr - 1;
  if (header == last) {
    last = header->previous;
    brk(header);
  }
}

abmalloc() and abfree() may be slightly more memory-efficient now, but not by much. Dynamically allocated memory rarely behaves like a stack, where the newest block is always deallocated first. In the next post, we will see how to use the memory of older blocks that are no longer in use.

(This post is a translation of Implementando malloc() e free() — adicionando metadados aos blocos de memória, from Suspensão de Descrença.)

Tiny Ticket Types

Tickets in Jira tend to accumulate redundant and optional fields, becoming complex and confusing. I like Jira, but I understand the frustration it causes.

A plausible solution could be inspired by software development. We programmers are used to finding massive source files, and we know that breaking them into smaller files drastically improves code comprehension. Therefore, inspired by coding best practices, I suggest creating smaller tickets.

Only three states

One way to limit the size of tickets is to simplify the workflow by restricting the number of states. For example, we can define that each type of ticket would have, at most, three states:

To do
In progress
Done

To represent other stages, we can create new types of tickets, such as sub-tasks.

A moderately complex ticket type

Let’s look at an example. Consider the ticket below:

Key: XYZ-1234. Status: Testing. Title: Nasal demons. Description: Calling free() on a previously deallocated pointer results in demons coming out of the nose. Technical analysis: The root cause is an undefined behavior. Test results: The patch does not work, now ghosts pop out of the user’s ears. Release date: 2023-12-22

It would follow this workflow:

Open ⇨ To do ⇨ In Analysis ⇨ Doing ⇨ Testing ⇨ Release ⇨ Done

How could we reduce the number of phases?

We can start by removing the “In Analysis” stage. In its place, we create a new type of ticket called “Technical Analysis.” This way, the original task remains in progress (“Doing”) while the technical analysis is underway.

Fewer fields in a ticket

An advantage of this would be transferring fields to sub-tasks. Fields that would clutter the original ticket can appear only in tasks where they are relevant.

Consider the “Release date” field, which only makes sense in the “Release” phase. If developers, testers, etc., are not responsible for the release, this field is confusing and pollutes the original task. With a new task type called “Release,” this field would be in the most appropriate place, keeping the original ticket concise.

Repeating stages without regressing

Another advantage is that the original ticket can go through the same stage multiple times. It’s common for a ticket to have a development phase followed by quality tests, for example. However, if a problem arises in the evaluation, it’s not advisable to revert to the development phase. How to deal with this?

By working with sub-tasks, we can mark validation as completed and create a new implementation ticket. In our ticket, for example, we can remove the “Testing” phase and create a sub-task of type “Test,” as well as another one called “Development.” Every time the test fails, we close testing and open a new development task.

Result

Following this strategy, our ticket would look like this:

And the workflow would be much simpler:

Naturally, this strategy is flexible. In our case, for example, we haven’t removed the “To do” phase yet. Restricting the number of states to five (including backlog and validation) is another possibility. The core idea is to limit the number of stages to a small value for all tickets.

Conclusions

In programming, it’s common to encounter the so-called “God objects,” huge objects responsible for various different functions. Breaking them down is a surefire way to achieve code quality. Therefore, I suspect the same principle can apply to tickets in Jira.

I’m not a project manager, but as a programmer, I believe that limiting the size and steps of tickets can be an effective idea. I’m curious to know if anyone has tried this and how it went.

Implementing malloc() and free() — first steps

Following the wonderful journey that is reading Crafting Interpreters, I reached the point where we implement an interpreter in C! As always, Bob Nystrom mercilessly proposes very interesting challenges that keep us busy for long periods. For instance, in this chapter, he suggests implementing our own memory allocator, without any real need! Inevitably, I was nerd-sniped.

The challenge allows us to allocate a large memory region with an existing malloc() function and manage it, but I decided to implement the malloc() from scratch. Since I use Ubuntu, it was necessary to first understand the memory layout of a process on Linux better.

Consider the diagram below, which represents the memory layout of a process.

In the memory allocated for the process, there are various sections. When the program starts its execution, the shaded part is not yet in use. Throughout its execution, the program declares local variables, causing the stack to grow backward.

On the other hand, dynamically allocated memory is obtained from the heap, which grows in the opposite direction. The popular way to expand the heap is by increasing the size of the data segment (i.e., the section that contains global and static variables) with the sbrk() system call.

Diagram representing how sbrk() works, by increasing the data segment pointer but returning the old value.

The above diagram illustrates how this system call works. sbrk() takes an integer parameter that will be added to the pointer indicating the end of the data segment. After that, sbrk() returns the value of the pointer before the increment.
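
This behavior can be observed directly. The sketch below assumes Linux with glibc, where sbrk() is declared in <unistd.h>; grow_heap() and current_break() are hypothetical wrapper names introduced here.

```c
#define _DEFAULT_SOURCE  /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the current end of the data segment without moving it. */
void *current_break(void) {
  return sbrk(0);
}

/* Grows the data segment by `bytes`. Returns the end of the data segment
 * as it was before the call, or NULL if sbrk() failed. */
void *grow_heap(intptr_t bytes) {
  assert(bytes >= 0);
  void *old_break = sbrk(bytes);
  return old_break == (void *)-1 ? NULL : old_break;
}
```

Calling current_break(), then grow_heap(32), then current_break() again should show the second call returning the first value and the third returning an address 32 bytes higher, provided nothing else moves the end of the data segment in between.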

In a way, the behavior of sbrk() is already sufficient for memory allocation. Our malloc() function can simply invoke sbrk() and return to the user the pointer to the beginning of the allocated memory block:

void *abmalloc(size_t size) {
   return sbrk(size);
}

In principle, free() doesn’t need to do anything: since in this implementation, we always use memory from the top of the heap, there is nothing we can do to reuse older memory blocks. In that sense, free() can perfectly be a no-op:

void abfree(void *ptr) {
}

A useful operation can be done, however, if the block to be freed is the last one allocated. This means it is at the top of the heap, so we just need to move the top of the heap back with the brk() system call. This syscall takes a pointer as a parameter and, if this pointer is a “reasonable” value (not null, does not point into the stack, does not point before the heap), it uses the pointer’s value as the new top of the heap. The result would be something like this:

void abfree(void *ptr) {
  if (ptr == last_block) {
      brk(last_block);
  }
}
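
The snippet above relies on a last_block variable that abmalloc() has to maintain. A minimal sketch of the two functions together could look like the following, assuming Linux with glibc; the failure check on sbrk() and the null check in abfree() are additions of this sketch.

```c
#define _DEFAULT_SOURCE  /* exposes sbrk() and brk() in glibc's <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Start of the most recently allocated block, or NULL if there is none. */
void *last_block = NULL;

void *abmalloc(size_t size) {
  void *block = sbrk((intptr_t)size);
  if (block == (void *)-1) {
    return NULL;  /* sbrk() could not grow the data segment */
  }
  last_block = block;
  return block;
}

void abfree(void *ptr) {
  if (ptr != NULL && ptr == last_block) {
    /* Move the top of the heap back to where the last block began. */
    int result = brk(last_block);
    assert(result == 0);
  }
}
```

Only the most recent block is ever tracked here, so freeing any older block leaves the top of the heap where it was.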

This deallocation, however, is practically useless. Consider the example below:

void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr2);
abfree(ptr1);

With the current version of abfree(), we can free the memory pointed to by ptr2, but not the one pointed to by ptr1. To be able to free ptr1, it would be necessary to know that, once ptr2 has been deallocated, the next last block is ptr1. Could we create a second_last_block variable? It wouldn’t help: we would have the same problem with the penultimate block, and so on.

We need a more powerful data structure here, and that’s what we’ll see in our next post.

(This post is a translation of Implementando malloc() e free() — primeiros passos, originally published in Suspensão de Descrença.)
