Following the wonderful journey that is reading Crafting Interpreters, I reached the point where we implemented an interpreter in C! As always, Bob Nystrom mercilessly proposes very interesting challenges that keep us busy for long periods. For instance, in this chapter, he suggests implementing our own memory allocator, without any real need! Inevitably, I was nerdsniped.
The challenge allows us to allocate a large memory region with an existing malloc()
function and manage it, but I decided to implement the malloc()
from scratch. Since I use Ubuntu, it was necessary to first understand the memory layout of a process on Linux better.
Consider the diagram below, which represents the memory layout of a process.
In the memory allocated for the process, there are various sections. When the program starts its execution, the shaded part is not yet in use. Throughout its execution, the program declares local variables, causing the stack to grow backward.
On the other hand, dynamically allocated memory is obtained from the heap, which grows in the opposite direction. The popular way to expand the heap is by increasing the size of the data segment (i.e., the section that contains global and static variables) with the sbrk()
system call.
The above diagram illustrates how this functional system call works. sbrk()
takes an integer parameter that will be added to the pointer indicating the end of the data segment. After that, sbrk()
returns the value of the pointer before the increment.
In a way, the behavior of sbrk()
is already sufficient for memory allocation. Our malloc()
function can simply invoke sbrk()
and return to the user the pointer to the beginning of the allocated memory block:
void *abmalloc(size_t size) {
return sbrk(size);
}
In principle, free()
doesn’t need to do anything: since in this implementation, we always use memory from the top of the heap, there is nothing we can do to reuse older memory blocks. In that sense, free()
can perfectly be a no-op:
void abfree(void *ptr) {
}
A useful operation can be done, however, if the block to be freed is the last one allocated. This means it is at the top of the stack, so we just need to move the stack pointer back with the brk()
system call. This syscall takes a pointer as a parameter and, if this pointer is a “reasonable” value (not null, does not point into the stack, does not point before the heap), it uses the pointer’s value as the new top of the heap. The result would be something like this:
void abfree(void *ptr) {
if (ptr == last_block) {
brk(last_block);
}
}
This deallocation, however, is practically useless. Consider the example below:
void *ptr1 = abmalloc(8);
void *ptr2 = abmalloc(8);
abfree(ptr2);
abfree(ptr1);
With the current version of abfree()
, we can free the memory pointed to by ptr1
, but not the one pointed to by ptr2
. To be able to free ptr2
, it would be necessary to know that, once ptr1
has been deallocated, the next last block is ptr2
. Could we create a second_last_block
variable? It wouldn’t help: we would have the same problem with the penultimate block, and so on.
We need a more powerful data structure here, and that’s what we’ll see in our next post.
(This post is a translation of Implementando malloc() e free() — primeiros passos, originally published in Suspensão de Descrença.)