glibc’s heap manager functions (malloc, free, etc.) use “arenas” to coordinate heap usage. There are two types of arenas: main and non-main (for threads). The main arena manages the process’s “main” heap, the usual kind that is set up at program start-up. A process may also have zero or more non-main arenas owned by non-main threads, whose heaps are allocated through mmap. This improves multithreaded performance, since threads no longer contend for the same global mutex to allocate memory and can instead use their own arenas/heaps. The main arena’s state is stored in a global variable, whereas the state of each non-main arena is stored inside its own mmap-ed heap.
Number of arenas per process
The number of arenas in a process is by default capped at 2 × (number of CPU cores) for 32-bit systems and 8 × (number of CPU cores) for 64-bit systems (source), though this limit is configurable. As a result, even though each thread is supposed to get its own arena, a thread may end up sharing an existing arena if there are too many threads and the program reaches the cap.
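As a rough sketch of where the default cap comes from: glibc's arena.c derives it from the core count with the NARENAS_FROM_NCORES macro, which multiplies by 2 when long is 4 bytes and by 8 otherwise. The helper names below (default_arena_cap, default_arena_cap_here) are hypothetical, and counting cores with sysconf is an assumption made for illustration; glibc obtains the core count through its own internal routines.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Mirrors the arithmetic of glibc's NARENAS_FROM_NCORES(n): 2 arenas per
   core when long is 4 bytes (32-bit), 8 arenas per core otherwise.
   The function name is hypothetical. */
static size_t default_arena_cap(size_t ncores, size_t sizeof_long)
{
    assert(ncores > 0);
    return ncores * (sizeof_long == 4 ? 2 : 8);
}

/* Estimate the default cap for the current machine. Using sysconf to
   count online cores is an assumption of this sketch. */
static size_t default_arena_cap_here(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)
        n = 2; /* core count unavailable: assume two cores (sketch choice) */
    return default_arena_cap((size_t)n, sizeof(long));
}
```

For instance, default_arena_cap(4, 8) models a 4-core 64-bit machine and default_arena_cap(4, 4) a 4-core 32-bit one. The cap can be overridden at run time, for example with mallopt(M_ARENA_MAX, n) or the MALLOC_ARENA_MAX environment variable.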
NOTE
By default, INTERNAL_SIZE_T mentioned below has the same size as size_t (i.e., pointer size).
struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};
We can see at the top that concurrent accesses to the arena are serialized via a mutex, since an arena (and the heaps it manages) may be shared by multiple threads.
So how can we use the arena to allocate chunks? Naively, we can just create a new chunk using unallocated heap memory, but we can also recycle free chunks. Within an arena, free chunks are categorized and tracked via an array of bins, which are (doubly) linked lists of free chunks (this is why free chunks hold forward and backward pointers whereas allocated chunks just hold user data).
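To make the bin idea concrete, here is a minimal sketch of a bin as a circular doubly linked list of free chunks. It is a toy model, not glibc's code: the names toy_chunk, bin_init, bin_insert, bin_unlink, and bin_take are hypothetical, and the chunk layout is reduced to a size plus the forward (fd) and backward (bk) pointers. Inserting at the front and taking from the back gives first-in, first-out recycling, which is an assumption chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified free chunk: only the fields needed for the list are modeled.
   Real chunks carry more metadata. */
struct toy_chunk {
    size_t size;
    struct toy_chunk *fd; /* next free chunk in the bin */
    struct toy_chunk *bk; /* previous free chunk in the bin */
};

/* A bin is a circular doubly linked list with a sentinel head. */
static void bin_init(struct toy_chunk *bin)
{
    bin->size = 0;
    bin->fd = bin;
    bin->bk = bin;
}

/* Put a freed chunk at the front of the bin. */
static void bin_insert(struct toy_chunk *bin, struct toy_chunk *c)
{
    c->fd = bin->fd;
    c->bk = bin;
    bin->fd->bk = c;
    bin->fd = c;
}

/* Remove a chunk from the bin it currently sits in. */
static void bin_unlink(struct toy_chunk *c)
{
    assert(c->fd->bk == c && c->bk->fd == c); /* neighbors must agree */
    c->fd->bk = c->bk;
    c->bk->fd = c->fd;
    c->fd = c->bk = NULL;
}

/* Recycle the oldest chunk (taken from the back), or NULL if the bin is empty. */
static struct toy_chunk *bin_take(struct toy_chunk *bin)
{
    struct toy_chunk *victim = bin->bk;
    if (victim == bin)
        return NULL;
    bin_unlink(victim);
    return victim;
}
```

Because the pointers live inside the free chunk itself, the list costs no extra memory: the space that held user data while the chunk was allocated is reused for fd and bk once it is freed.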
There are four types of bins in an arena: fast, unsorted, small, and large bins. In addition, each thread has its own thread-local tcache.
There are also special chunks directly managed by the arena:
- The last remainder chunk is the unused chunk left over from the most recent split. When the heap has no exactly sized free chunk for a small allocation request, a larger free chunk is split to accommodate the request, and the leftover becomes the last remainder chunk, which improves locality (and thus cache hit rate) for consecutive small requests. It is stored in the unsorted bin.
- The top chunk is a large region of unallocated memory at the top of the heap (i.e., at the high-address end, since malloc allocates from low addresses upward). This is the last resort that malloc uses to allocate data, i.e., when no existing free chunk satisfies the requested size. If the top chunk is still too small, the runtime invokes sbrk() to expand the heap, or simply returns NULL when memory is exhausted.
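The top-chunk fallback can be sketched as a simple carve from the low end of an unallocated region. This is a toy model under stated assumptions: toy_arena, toy_align, and toy_alloc_from_top are hypothetical names, the 16-byte alignment is assumed, chunk headers are omitted, and the sketch returns NULL where a real allocator would first try to grow the heap.

```c
#include <assert.h>
#include <stddef.h>

/* Toy arena: only the top chunk is modeled, as the byte range [top, end). */
struct toy_arena {
    unsigned char *top; /* start of the unallocated top region */
    unsigned char *end; /* current end of the heap */
};

/* Round a request up to a multiple of 16 bytes (assumed alignment). */
static size_t toy_align(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Carve n bytes from the low end of the top chunk; what is left stays
   as the new top. Returns NULL when the top chunk is too small, which
   is the point where a real allocator would try to expand the heap. */
static void *toy_alloc_from_top(struct toy_arena *a, size_t n)
{
    assert(a->top <= a->end);
    size_t need = toy_align(n);
    if (need < n || need > (size_t)(a->end - a->top))
        return NULL; /* rounding overflowed, or not enough room left */
    void *p = a->top;
    a->top += need;
    return p;
}
```

Each successful call moves top toward higher addresses, which matches the picture of the top chunk shrinking from its low end as the heap fills up.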