
Slice Internals in Go: How the Runtime Expands Slices Efficiently

2025-02-20

The deeper you delve into Go’s internals, the more evident it becomes that its creators carefully engineered the language to strike a precise balance between performance and flexibility. This delicate equilibrium influences many of Go’s core features, including its approach to memory management and data structures. One standout example of this thoughtful design is the implementation of slice growth. Through this approach, Go ensures that slices expand seamlessly, optimizing both performance and memory usage without compromising ease of use.

Slice Internals

In Go, a slice is a lightweight data structure that serves as a window into a contiguous block of memory where elements of a specific type are stored. At its core, a slice doesn’t directly contain the data itself but instead holds a pointer to an underlying array (known as the backing array).

s := make([]byte, 0, 10)

When the Go runtime creates a slice, as in this example, it constructs a small struct under the hood, defined in the runtime as slice:

type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}

The slice type has the following fields:

  - array: a pointer to the backing array that holds the actual elements
  - len: the number of elements the slice currently contains
  - cap: the capacity of the backing array, i.e., how many elements it can hold before a new allocation is needed

The unsafe.Pointer type used for the array field is a generic pointer type that bypasses Go's type safety rules. Since the size of an array in Go is part of the type, the runtime uses unsafe.Pointer so it can replace the array with a larger one when needed.

The code in the example above creates a slice that has zero elements with a backing array that can hold 10 elements of type byte. We can later fill that array with elements by using the append function:

s = append(s, 1)

Here, we append a single byte, the unsigned 8-bit integer 1, to the empty slice.
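
To make this concrete, here is a small, illustration-only sketch that reinterprets a slice as a struct with the same layout so we can peek at its fields. It assumes the runtime's internal representation stays exactly as shown above, so it is something to play with, not production code:

package main

import (
	"fmt"
	"unsafe"
)

// header mirrors the layout of the runtime's slice struct. This is
// an internal detail of the runtime, used here purely for demonstration.
type header struct {
	array unsafe.Pointer
	len   int
	cap   int
}

func main() {
	s := make([]byte, 0, 10)
	s = append(s, 1)

	// Reinterpret the slice value as its underlying header.
	h := (*header)(unsafe.Pointer(&s))
	fmt.Println(h.len, h.cap) // 1 10
}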

Growing the Array

When appending elements to a slice, the Go runtime first checks whether the backing array has enough capacity to accommodate the new elements. If it does, the elements are simply added to the existing array. However, if the current array lacks sufficient space, the runtime allocates a larger backing array, copies the existing elements into it, and then appends the new elements.

s := make([]byte, 0, 1)

s = append(s, 1) // no growth
s = append(s, 1) // growth
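
We can observe the reallocation directly by printing the backing array's address before and after the growing append. This sketch assumes Go 1.20 or later for unsafe.SliceData, which returns a pointer to a slice's backing array:

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	s := make([]byte, 0, 1)

	s = append(s, 1)
	fmt.Printf("%p cap=%d\n", unsafe.SliceData(s), cap(s)) // no growth: original backing array

	s = append(s, 1)
	fmt.Printf("%p cap=%d\n", unsafe.SliceData(s), cap(s)) // growth: a new, larger backing array
}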

The capacity of the newly allocated array is determined by several factors, including the current array’s capacity, the type of elements it holds, and the number of new elements being appended. These factors influence how much the array grows, ensuring efficient memory usage while minimizing the need for frequent reallocations.

The runtime begins by attempting to double the existing capacity as the first step in determining the new array size:

newCapacity := oldCapacity * 2

If the total number of existing and newly appended elements exceeds the doubled capacity, the runtime sets the new capacity to match the required number of elements:

if newLength > newCapacity {
    newCapacity = newLength
}

This ensures that the new capacity is greater than or equal to the number of elements after the append operation.

s := make([]int, 0, 1)

s = append(s, 1, 2, 3)

fmt.Println(cap(s)) // 3

In this example, the integer slice initially had a capacity of 1. After adding three elements, the runtime allocated a new backing array with a capacity of 3. This happened because doubling the original capacity (1 * 2) was insufficient to accommodate the new elements, prompting the runtime to adjust the capacity accordingly.

Growth Factor

If doubling the capacity is sufficient, the runtime further evaluates whether allocating such a large array is efficient or merely a waste of memory.

For small slices (capacity less than 256), the runtime employs a simple doubling strategy (e.g., 2 to 4, 4 to 8, 8 to 16). This makes sense for small workloads: doubling ensures plenty of headroom for future appends. However, once the slice’s capacity climbs past 256, doubling becomes less practical. Doubling a capacity of, say, 10,000 to 20,000 allocates an extra 10,000 elements’ worth of memory (potentially tens or hundreds of kilobytes, depending on the element size), which might sit unused for a long time.

To address this, the runtime adjusts its growth strategy for larger slices by reducing the growth factor gradually until it reaches 1.25. This slower growth means that if a slice already has a capacity of, say, 512, adding a few elements doesn’t balloon it to 1024; it might rise to 832 instead (a 62.5% increase). The key insight is that a larger slice can absorb more appends before hitting its capacity limit. For instance, a slice with a capacity of 512 has room for 512 more elements if empty, compared to just 8 for a capacity of 8. This naturally delays the need for reallocation.
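
Putting the earlier rules together with this growth factor, the capacity calculation can be sketched roughly as follows. This is a simplified adaptation of the logic in the runtime's growslice (runtime/slice.go); the exact code differs between Go versions, and the result is still subject to the rounding rules covered in the next sections:

// nextCap sketches how the runtime picks a new capacity for a slice
// that currently holds oldCap elements and must grow to hold at least
// newLen. Size-class and alignment rounding are ignored here.
func nextCap(oldCap, newLen int) int {
	doubleCap := oldCap * 2
	if newLen > doubleCap {
		// Doubling isn't enough: jump straight to the required length.
		return newLen
	}

	const threshold = 256
	if oldCap < threshold {
		// Small slices simply double.
		return doubleCap
	}

	// Larger slices transition smoothly from 2x growth toward 1.25x.
	newCap := oldCap
	for newCap < newLen {
		newCap += (newCap + 3*threshold) / 4
	}
	return newCap
}

For example, nextCap(512, 513) returns 832, matching the 62.5% increase described above.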

This conservative approach aims to curb excessive memory usage. By growing incrementally rather than exponentially, the runtime avoids reserving vast swaths of memory that might remain idle, which is critical in applications handling large datasets or with limited resources (e.g., embedded systems). However, there’s a flip side: smaller growth steps mean the slice fills up sooner, triggering reallocation more often. Each reallocation involves CPU work (allocating memory, copying the existing elements, and updating the slice’s pointer) which can add up if appends are frequent.

The runtime’s strategy thus balances these two forces: memory footprint versus CPU overhead. It leans toward saving memory at the cost of potentially more frequent (but smaller) reallocations, betting that the trade-off pays off in most real-world scenarios where slices don’t grow indefinitely.

The Type of Elements

When determining how a slice should grow, the Go runtime takes into account the type of elements stored in the array, as this directly impacts memory allocation. On 64-bit systems, memory is generally allocated in chunks of 8 bytes. Any allocation that does not align with this rule is rounded up to the nearest multiple of 8 to ensure efficient memory usage and alignment.

Let's say we create a slice of bytes with capacity zero and then append an element to it:

s := make([]byte, 0, 0)

s = append(s, 1)

fmt.Println(cap(s)) // 8

After the growth, the capacity of the slice becomes 8 (8 bytes). If the element type were a 64-bit integer instead, the growth would increase the capacity to 1 (1 * 64 bits = 8 bytes):

s := make([]int64, 0, 0)

s = append(s, 1)

fmt.Println(cap(s)) // 1

If the element type were a 32-bit integer, the growth would increase the capacity to 2 (2 * 32 bits = 8 bytes):

s := make([]int32, 0, 0)

s = append(s, 1)

fmt.Println(cap(s)) // 2

The reason is that modern CPUs, particularly on 64-bit systems, operate most efficiently when data is aligned to their word size (the amount of data they can process in one cycle). On a 64-bit system, the word size is 64 bits, or 8 bytes. If the runtime allocates, say, 5 bytes, the CPU still fetches 8 bytes and masks off the unused portion. That means allocating in 8-byte multiples ensures the entire chunk is usable without waste or extra work.

In addition, CPU caches fetch memory in 64-byte lines (8 words of 8 bytes each). Multiples of 8 bytes fit neatly into these lines, reducing cache misses and improving locality when accessing sequential data, like a slice’s backing array.

Allocation Classes

In addition to the 8-byte chunk allocation rule, the Go runtime maintains a table of predefined constants to guide its memory allocation decisions. This table categorizes memory allocations into specific size classes, helping the runtime minimize fragmentation and efficiently reuse freed memory blocks.

The table looks like this:

+---------+-------+
| Class   | Value |
+---------+-------+
| Class 1 |  8    |
| Class 2 |  16   |
| Class 3 |  24   |
| Class 4 |  32   |
| Class 5 |  48   |
| Class 6 |  64   |
| Class 7 |  80   |
| Class 8 |  96   |
| Class 9 | 112   |
| ...     | ...   |
+---------+-------+

This size class allocation table functions as an efficient lookup mechanism for managing memory allocation and deallocation. When a memory block belonging to a specific size class is freed, the runtime stores it in the table rather than immediately returning it to the operating system. Later, if a request is made for a memory block of the same size class, the system can quickly retrieve and reuse the previously freed block instead of performing an extensive search through physical memory to find a suitable allocation.

+---------------------------------+
|  Freed memory for size class X  |
+---------------------------------+
        │
        ▼
+-------------------+
| Memory Block 1    |  <-- freed block (stored in table)
+-------------------+
| Memory Block 2    |  <-- freed block (stored in table)
+-------------------+
| Memory Block 3    |  <-- freed block (stored in table)
+-------------------+
        │
        ▼
+-------------------------------------+
| Incoming Memory Allocation Request  |
+-------------------------------------+
        │  (lookup in table)
        ▼
+-------------------------------+
| Matching Freed Block Found    |
+-------------------------------+
        │  (reused instead of a new allocation)
        ▼
+----------------------------+
|  Allocated to Application  |
+----------------------------+
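
As a toy illustration of this reuse pattern (and only that; the runtime's real machinery lives in its internal allocator structures), imagine a free list keyed by size class:

package main

import "fmt"

// sizeClasses mirrors the first few entries of the runtime's table.
var sizeClasses = []int{8, 16, 24, 32, 48, 64, 80, 96, 112}

// roundUp returns the smallest size class that fits n bytes; a toy
// stand-in for the runtime's rounding. Larger allocations are handled
// separately by the real runtime.
func roundUp(n int) int {
	for _, c := range sizeClasses {
		if n <= c {
			return c
		}
	}
	return n
}

// toyAllocator keeps freed blocks grouped by size class so a later
// request for the same class can reuse one instead of allocating anew.
type toyAllocator struct {
	free map[int][][]byte // size class -> freed blocks
}

func (a *toyAllocator) alloc(n int) []byte {
	class := roundUp(n)
	if blocks := a.free[class]; len(blocks) > 0 {
		// Reuse the most recently freed block of this class.
		b := blocks[len(blocks)-1]
		a.free[class] = blocks[:len(blocks)-1]
		fmt.Printf("reused a %d-byte block for a %d-byte request\n", class, n)
		return b[:n]
	}
	fmt.Printf("allocated a fresh %d-byte block for a %d-byte request\n", class, n)
	return make([]byte, n, class)
}

func (a *toyAllocator) release(b []byte) {
	a.free[cap(b)] = append(a.free[cap(b)], b[:0])
}

func main() {
	a := &toyAllocator{free: map[int][][]byte{}}

	b := a.alloc(40) // rounds up to the 48-byte class
	a.release(b)
	a.alloc(45) // also the 48-byte class: reuses the freed block
}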

With that in mind, the runtime not only rounds up to the nearest 8-byte boundary but also rounds up to the nearest size class in the allocation table.

Consider this slice operation:

s := make([]int64, 0, 0)

s = append(s, 1, 2, 3, 4, 5)

fmt.Println(cap(s)) // 6

Here, the slice starts with zero capacity and we add 5 elements of type int64. Without considering the size class allocation table, the runtime would allocate 40 bytes for the new backing array:

5 * 64 bits = 320 bits, and 320 bits / 8 = 40 bytes

Instead, the runtime consults the size class allocation table and rounds up to the nearest match (48 in this case). As a result, it allocates a backing array with a capacity of 6 (48 bytes / 8 bytes per element), even though the new array only needs to hold 5 elements (requiring only 40 bytes).
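
The same rounding is easy to observe with a byte slice. Appending 33 bytes to an empty slice requires 33 bytes; doubling the zero capacity is insufficient, so the runtime takes the required length of 33 and then rounds it up to the 48-byte size class:

s := make([]byte, 0, 0)

s = append(s, make([]byte, 33)...)

fmt.Println(cap(s)) // 48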

This approach significantly improves performance by reducing external fragmentation (where free memory is scattered in small, non-contiguous blocks, making larger allocations difficult). It also minimizes allocation overhead and speeds up memory access by eliminating the need to repeatedly request new memory from the operating system.

Wrapping Up

In summary, the Go runtime takes several key factors into account when growing a slice's backing array (the demo after this list ties them together):

  1. The Growth Factor: It starts with a doubling factor (2x) and then gradually winds down to 1.25x.
  2. CPU Word Size: The array is rounded up to the nearest 8-byte boundary.
  3. The Size Class Allocation Table: The runtime rounds up to the nearest available class in the table.
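
A quick way to watch these factors interact is to record every capacity change while appending elements one at a time. The exact numbers depend on the Go version, the element type, and the size class table, so this demo simply prints whatever the running toolchain decides:

package main

import "fmt"

func main() {
	var s []int64
	prevCap := -1

	// Append one element at a time and log each reallocation. Early
	// capacities roughly double; later jumps shrink toward 1.25x,
	// nudged up to the nearest size class.
	for i := 0; i < 1000; i++ {
		s = append(s, int64(i))
		if cap(s) != prevCap {
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}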

I really admire the thoughtful work the Go team has put into making the language both efficient and flexible. It's clear that a lot of careful consideration went into optimizing performance while maintaining flexibility for developers.
