Malloc.c and memory allocation in UNIX

This is an extract from a much longer annotated UNIX 6 source code post, about the specific file malloc.c, and the two functions malloc and mfree.

The purpose of these functions is to manage the memory (RAM or SWAP) available in the machine, as other programs and routines ask for it. I was actually quite surprised at how simple it is. I wasn’t expecting to be able to understand it at all. But both functions are very short and comprehensible.

The signature of the functions are int malloc(map* mp, int size) and void mfree(map* mp, int size, int aa)1 1 Note, UNIX 6 is written in very old style C. I’ve updated a lot of the source code to be more comprehensible to modern eyes. I’ve also expanded a lot of the argument and local variable names to be more descriptive. . The first challenge is figuring out what a ‘map’ is. Looking through the rest of the code, malloc is called with either coremap or swapmap as the first argument. In the headers these are declared as int coremap[100] and int swapmap[100]. So arrays of integers, one for RAM (core) and one for SWAP.

But you also, in malloc.c itself, have the map struct:

struct map {
    char* m_size;
    char* m_addr;
};

So now a map is a struct of two char pointers, not an array of integers? This was hard for me, as someone who doesn’t do much low level, memory managed programming, but these are the same thing.

First, an int is a 16bit number. A char is an 8bit number. So an int can plausibly be interpreted as two consecutive char. An ‘array of integers’ is a block of contiguous integers. But in another sense an ‘array’ is just the address of the first element of that block. So an array of integers is also a int*, or pointer to an integer. Put both of these together and you get to the conclusion that a struct of two char* is the same as a int[].

Fine, but what is a map? It took me a bit of thinking, but what it is is a table of blocks, where a block is a ‘block of available memory’ described by a size and a starting address:

el | size | address 
---|------|--------
...| ...  |   ...
 4 |  10  |   100
 5 |  50  |   500
 6 |   0  |   600 <- note size==0 acts as 'sentinal'

The table has three rules: 1. the addresses of the blocks must be ordered ascending 2. There must be no ‘empty’ entries (size=0) in the middle of the table 3. There must be no contiguous blocks, where the end address of block X is the start address of block X+1

When a program requests memory of a certain size, malloc looks at this table, finds the first block with enough space, and returns the address of that block, updating the table to reflect that the memory is no longer available. When a program says it doesn’t need that memory any more, mfree creates a new entry in the table, reflecting that this memory is now available again. There are some nuances, which we’ll get into.

malloc

When asked for a block of memory of a certain size, malloc finds the address of a piece of memory that fits the bill, and returns that address.

The function looks through the table for a block of adequate size. That is, one where the size which the caller is requesting is smaller than or equal to the size of the block available 2 2 For this reason, the allocation algorithm is known as ‘first fit’. . It either finds one, in which case it returns a pointer to that address, or it reaches the end of the maps without finding a block, and returns zero.

When a suitable block has been found, the address of the block is increased by the requested size, and the size of the block is reduced by the requested size. So if there was a request for 100 bytes, and the block was size 200 at address 10000, the new block would be size 100 at address 10100.

There is an important last step: If, by allocating the new memory, the size of the block is now zero, then the block/row in the table is ‘deleted’, and everything gets ‘shifted up’. This ensures that there are no ‘hanging’ empty blocks in the middle of the table, which would cause the algorithm to fail to allocate prematurely. An illustration of this is below the code.

int malloc(map* mem_table, int size) {
    int return_pointer;
    struct map* block;

    for (block = mem_table; block->m_size > 0; block++) { 
        if (block->m_size >= size) {
            return_pointer = block->m_addr;
            block->m_addr =+ size;
            block->m_size =- size;
            if (block->m_size == 0)
                do {
                    block++;
                    (block-1)->m_addr = block->m_addr;
                    (block-1)->m_size = block->m_size;
                } while ((block-1)->m_size > 0);

            return(return_pointer);
        }
    }
    return(0);
}

The illustration of the ‘table shifting’:

el | size | adk 
---|------|----
...| ...  | ...
 4 |  10  | 100 <- before request for 10
 5 |  50  | 500
 6 |   0  | 600 <- note size==0 acts as 'sentinal'

el | size | add 
---|------|----
...| ...  | ...
 4 |   0  | 110 <- after request for 10, size is now 0
 5 |  50  | 500
 6 |   0  | 600 

el | size | add 
---|------|----
...| ...  | ...
 4 |   0  | 110
 5 |  50  | 500 <- block now points here (block++)
 6 |   0  | 600 


el | size | add 
---|------|----
...| ...  | ...
 4 |  50  | 500 <- (block-1)->m_addr = block->m_addr, (block-1)->m_size = block->m_size
 5 |  50  | 500 <- block still points here
 6 |   0  | 600 

(block-1)->m_size != 0, so we continue

el | size | add 
---|------|----
...| ...  | ...
 4 |  50  | 500
 5 |  50  | 500
 6 |   0  | 600 <- block++ now points here

el | size | add 
---|------|----
...| ...  | ...
 4 |  50  | 500
 5 |   0  | 600 <- (block-1)->m_addr = block->m_addr, (block-1)->m_size = block->m_size
 6 |   0  | 600 <- block still points here

(block-1)->m_size == 0, so we are done

mfree

When freeing memory, the table, the size, and the address is passed in. The function then creates a new entry in the table with those values. However, doing this is a recipe for many tiny blocks of memory that can’t actually be used. We need a mechanism for merging blocks if they are contiguous.

Consider the case mfree(table, 50, 1000) in the context of the table

el | size | add 
---|------|----
...| ...  | ...
 4 |  50  |  500
 5 |  50  |  950
 6 |  50  | 1050

el | size | add 
---|------|----
...| ...  | ...
 4 |  50  |  500
 5 |  50  |  950
 x |  50  | 1000 <- We want to insert here
 6 |  50  | 1050

Here, we have 3 contiguous blocks: 5, x and 6. We want to merge these blocks into one, to end up with

el | size | add 
---|------|----
...| ...  | ...
 4 |  50  |  500
 5 | 150  |  950
...| ...  | ...

The algorithm is for freeing a block X is: 1. Find the first block in the table where the address is greater than the start address of X. 2. Look at the block before this. Call this block A. If the block X is contiguous with A (A.addr + A.size == X.addr), then instead of inserting a new block for X, we want to extend block A. 3. Next we look at the block we got in step 1. Call this block B. If X is contiguous with A and B, we need to merge our newly extended A with B. This is the same ‘shift up’ operation we saw in malloc 4. If X is not contiguous with A, but is contiguous with B, just extend B backwards 5. Is X is not contiguous with A or B, we need to ‘insert’ a row into the table.

void mfree(map* mem_table, int size, int addr_to_free) {
    struct map* block;
    int temp_address;

    // Step 1
    for (block = mem_table; block->m_addr<=addr_to_free && block->m_size!=0; block++);

    // Step 2
    if (block > mem_table && (block-1)->m_addr+(block-1)->m_size == addr_to_free) {
        (block-1)->m_size =+ size;
        // Step 3
        if (addr_to_free+size == block->m_addr) {
           (block-1)->m_size =+ block->m_size;
           while (block->m_size > 0) {
              block++;
              (block-1)->m_addr = block->m_addr;
              (block-1)->m_size = block->m_size;
           }
        }
    } else {
        // Step 4
        if (addr_to_free+size == block->m_addr && block->m_size > 0) {
            block->m_addr =- size;
            block->m_size =+ size;
        } else if (size > 0) {
            // Step 5
            do {
                temp_address = block->m_addr;
                block->m_addr = addr_to_free;
                addr_to_free = temp_address;
                temp_address = block->m_size;
                block->m_size = size;
                block++;
            } while (size = temp_address);
        }
    }
}