Implementing malloc() and free() — memory alignment

Our implementation of malloc() and free() has progressed significantly. In the last post, we saw, for example, how to unify smaller memory blocks to avoid fragmentation. To avoid even more problems in this area, we’re going to address an important performance detail that we’ve been neglecting for a while: memory alignment.

What is memory alignment?

Modern computers read fixed-size memory blocks, corresponding to the processor’s word size. A 32-bit processor, for example, reads four bytes at a time, while a 64-bit processor reads eight. However, this reading only occurs from an aligned address—that is, a 32-bit processor can only read its four bytes if the memory address is a multiple of four.

If the address is not a multiple of the word size, the processor will need to access memory twice: first to fetch the first part of the word and then for the second. This becomes even worse if the allocator returns misaligned pointers, as all subsequent nodes will also be misaligned, doubling the number of memory accesses on the heap.

Ensuring allocated blocks’ alignment

To avoid this problem, it is necessary to allocate a multiple of the processor’s word size. However, determining this size can be complicated. A simple and effective solution for our case is to adopt multiples of 8 bytes, since the most common bus currently is 64 bits. Thus, all allocated block addresses must be multiples of 8. Let’s then define a constant for the alignment size:

define _ABMALLOC_ALIGNMENT_SIZE 8

In the abmalloc() function, we check if the requested size is aligned, that is, if it is a multiple of _ABMALLOC_ALIGNMENT_SIZE. If it is already aligned, we don’t need to do anything:

void *abmalloc(size_t size) {
  if (size == 0) {
    return NULL;
  }
  size_t rest = size % _ABMALLOC_ALIGNMENT_SIZE;

If the requested size is not aligned, we need to increase it. First, we calculate the remainder of dividing the requested size by _ABMALLOC_ALIGNMENT_SIZE and subtract this value from the originally requested size. Then, we add _ABMALLOC_ALIGNMENT_SIZE. This ensures that the new size is the smallest possible multiple of the required alignment.

if (rest != 0) {
  size = size - rest + _ABMALLOC_ALIGNMENT_SIZE;
}

Done! Now, abmalloc() always returns aligned addresses.

A question and a perk

A valid question is: won’t this result in wasted memory? On platforms with 32-bit buses, for example, this could lead us to allocate 7 unused bytes when we could allocate only 3 unused bytes. Here, we assume this isn’t a problem, but there’s a relatively simple solution: just change the value of the constant to 4, for example, and recompile our code.

This adjustment brings another advantage: very small blocks cease to be allocated. For example, if someone requests a block of only 1 byte, it would hardly be reused, since a single byte is rarely needed. (Ideally, such small blocks should never be allocated, but this is a user decision.) With a minimum size of 8 bytes, these smaller blocks, when released, will have a greater chance of being reused.

Conclusion

With that, we conclude our experiment. This is not a very good implementation of malloc(). For starters, it’s not thread-safe. Furthermore, modern implementations utilize much more efficient techniques and heuristics, such as binning. There are also some very interesting advancements in research, such as Mesh.

Our goal was simply to learn a little more, through practice and gradually evolving code, about how memory allocation works. Let us know in the comments if you gained a better understanding of the topic! And if you’d like to explore further, you can check out the final result in the GitHub repository.

Implementing malloc() and free() — first steps

Following the wonderful journey that is reading Crafting Interpreters, I reached the point where we implemented an interpreter in C! As always, Bob Nystrom mercilessly proposes very interesting challenges that keep us busy for long periods. For instance, in this chapter, he suggests implementing our own memory allocator, without any real need! Inevitably, I was nerdsniped.

The challenge allows us to allocate a large memory region with an existing malloc() function and manage it, but I decided to implement the malloc() from scratch. Since I use Ubuntu, it was necessary to first understand the memory layout of a process on Linux better.

Consider the diagram below, which represents the memory layout of a process.

In the memory allocated for the process, there are various sections. When the program starts its execution, the shaded part is not yet in use. Throughout its execution, the program declares local variables, causing the stack to grow backward.

On the other hand, dynamically allocated memory is obtained from the heap, which grows in the opposite direction. The popular way to expand the heap is by increasing the size of the data segment (i.e., the section that contains global and static variables) with the sbrk() system call.

Diagram representing how srbk() works, by increasing the data segment pointer but returning the old value.

The above diagram illustrates how this functional system call works. sbrk() takes an integer parameter that will be added to the pointer indicating the end of the data segment. After that, sbrk() returns the value of the pointer before the increment.

In a way, the behavior of sbrk() is already sufficient for memory allocation. Our malloc() function can simply invoke sbrk() and return to the user the pointer to the beginning of the allocated memory block:

void *abmalloc(size_t size) {
   return sbrk(size);
}

In principle, free() doesn’t need to do anything: since in this implementation, we always use memory from the top of the heap, there is nothing we can do to reuse older memory blocks. In that sense, free() can perfectly be a no-op:

void abfree(void *ptr) {
}

A useful operation can be done, however, if the block to be freed is the last one allocated. This means it is at the top of the stack, so we just need to move the stack pointer back with the brk() system call. This syscall takes a pointer as a parameter and, if this pointer is a “reasonable” value (not null, does not point into the stack, does not point before the heap), it uses the pointer’s value as the new top of the heap. The result would be something like this:

void abfree(void *ptr) {
  if (ptr == last_block) {
      brk(last_block);
  }
}

This deallocation, however, is practically useless. Consider the example below:

void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr2);
abfree(ptr1);

With the current version of abfree(), we can free the memory pointed to by ptr1, but not the one pointed to by ptr2. To be able to free ptr2, it would be necessary to know that, once ptr1 has been deallocated, the next last block is ptr2. Could we create a second_last_block variable? It wouldn’t help: we would have the same problem with the penultimate block, and so on.

We need a more powerful data structure here, and that’s what we’ll see in our next post.

(This post is a translation of Implementando malloc() e free() — primeiros passos, originally published in Suspensão de Descrença.)