Go Map Internals and Concurrent Safety

2025-02-16

"PHP spoiled me," I thought, as I realize that concurrent operations on maps in Go require special handling. This was back in early 2023 when I first started writing Go programs after over a decade as a PHP developer. Although I didn't study computer science formally, I’ve always been fascinated by computing since I was 13. I even remember reading about hash maps in an old book my father constantly talked about—one I had to read with basic English skills.

A hash map, also known as a dictionary, is a data structure that stores key/value pairs and enables efficient data retrieval. Each key is unique and linked to a specific value. The structure employs a hashing function to generate a hash code, which determines where the key-value pair is stored in memory.

Consider the following map:

people := map[string]int{
    "Alice":   25,
    "Bob":     30,
    "Charlie": 35,
    "Diana":   40,
    "Eve":     45,
}

This map contains five key/value pairs, but it has the potential to store many more. Without any indexing, looking up a key's value would require scanning through all the keys one by one until the right match is found. As the map grows, this process becomes increasingly inefficient. That’s where a hash map proves useful.

Map Internals in Go 1.23

Instead of storing all key/value pairs in a contiguous memory block and searching through each entry, a hash map distributes them across multiple buckets. Each bucket holds information about the keys it contains, enabling the runtime to use the key to quickly locate the appropriate bucket. From there, it only needs to search within that bucket to find the desired value, significantly improving lookup efficiency.

In Go, maps are implemented as hash tables with an array of buckets, and each bucket contains a small, fixed number of key/value pairs. The runtime distributes keys into these buckets using a hashing function. When we insert a key/value pair into a map, the key is first passed through the hash function, which produces a hash code. This code is then used to determine the bucket index where the key/value pair will be stored.

Given the map in our example above, the buckets in the map may look like this:

+-----------+-----------+----------------+-----------+
| Bucket 0  | Bucket 1  | Bucket 2       | Bucket 3  |
+-----------+-----------+----------------+-----------+
| "Alice:25"| "Bob:30"  | "Charlie:35"   | "Diana:40"|
|           | "Eve:45"  |                |           |
+-----------+-----------+----------------+-----------+

Adding a value for the key Zain, for example, would instruct the runtime to hash the key, hash("Zain"), and use the result to identify the bucket in which that pair should be stored. When later looking up Zain in the map, the key goes through the same hash function, and the same algorithm identifies the bucket where the pair is stored. Once the bucket is located, the runtime scans the bucket’s stored keys until it finds a match.

Mapping to Buckets

Imagine the internal structure of a map where bucket indices and the generated hashes are represented using 8-bit integers. Go’s map implementation doesn’t actually use 8-bit integers, but we’ll use them here for simplicity. Now, suppose the map has 8 buckets, and the runtime needs to determine which bucket should store a hash value of 255. How does it decide where to place it?

In binary representation, 8 is 00001000, while 255 is 11111111. To map the hash 255 to a bucket, the runtime uses the bitwise AND operation, which compares the corresponding bits of the two numbers and produces a new number where each bit is set to 1 only if both operands have a 1 in that position, and 0 otherwise. The formula used to find the index is:

bucketIndex = HASH & (NUM_BUCKETS - 1)

NUM_BUCKETS - 1 in our case is 7, which is 00000111 in 8-bit binary representation. The bitwise AND operation will then look like this:

  11111111
& 00000111
----------
  00000111

Here, in the last three bits, both numbers have 1s, so the result is 1 for each bit. The other bits are 0 because at least one of the bits in those positions is 0. The result is 00000111 in binary, which is the integer value 7.

What if the hash is 250 instead of 255? In that case, the binary representation will be 11111010 and the operation will look like this:

  11111010
& 00000111
----------
  00000010

The result 00000010 corresponds to the integer 2. That means a hash of value 250 maps to the bucket at index 2.
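
To make the calculation concrete, here is a small runnable sketch that mirrors the two examples above, using 8-bit integers purely for illustration:

package main

import "fmt"

func main() {
	const numBuckets = 8 // must be a power of two for the mask trick to work

	// Hypothetical 8-bit hashes from the examples above.
	hashes := []uint8{255, 250}

	for _, hash := range hashes {
		bucketIndex := hash & (numBuckets - 1) // same as hash % numBuckets for powers of two
		fmt.Printf("hash %08b -> bucket %d\n", hash, bucketIndex)
	}
}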

Bucket Overflow

Each bucket can store up to 8 key/value pairs. When the runtime notices that more than 8 keys hash to the same bucket, it creates an overflow bucket and attaches it to the bucket that is full:

+--------------+---------------------+
| Bucket 0     | Overflow Bucket 1   |
+--------------+---------------------+
| "Alice:25"   | "Isaac:65"          |
| "Bob:30"     | "Jack:70"           |
| "Charlie:35" |                     |
| "Diana:40"   |                     |
| "Eve:45"     |                     |
| "Frank:50"   |                     |
| "Grace:55"   |                     |
| "Hank:60"    |                     |
+--------------+---------------------+

In the scenario above, more than 8 keys hashed to Bucket 0, and the runtime created Overflow Bucket 1 to store the extra key/value pairs. When looking up the key Jack in our program, the runtime will scan all the pairs in Bucket 0 and then move to scanning Overflow Bucket 1.
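
To picture how lookups traverse overflow buckets, here is a simplified, hypothetical model of a bucket chain. The runtime's real bucket layout is more compact than this, but the chain-walking idea is the same:

const bucketSize = 8

type bucket struct {
	keys     [bucketSize]string
	values   [bucketSize]int
	count    int     // number of slots currently in use
	overflow *bucket // nil until more than 8 keys land here
}

func (b *bucket) lookup(key string) (int, bool) {
	// Walk the chain: the main bucket first, then any overflow buckets.
	for cur := b; cur != nil; cur = cur.overflow {
		for i := 0; i < cur.count; i++ {
			if cur.keys[i] == key {
				return cur.values[i], true
			}
		}
	}
	return 0, false
}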

As more keys hash to the same bucket, more overflow buckets may be created, which slows down lookups because the runtime must search through multiple buckets. To prevent performance degradation, the runtime implements an automatic resizing strategy: a new hash map is created with double the number of buckets, and the keys are rehashed to distribute them evenly across the new buckets. Once all the keys are redistributed in the new hash map, the old map is discarded, freeing up memory. Another strategy, used when many overflow buckets have accumulated but the buckets aren't actually full, is to redistribute the key/value pairs into a new hash map of the same size.
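
While the resizing itself is out of our hands, we can reduce how often it happens: the built-in make function accepts an optional capacity hint as its second argument, which lets the runtime allocate enough buckets up front:

// Hint that we expect roughly 10,000 entries, so the runtime can
// size the bucket array up front instead of growing and rehashing
// repeatedly as keys are inserted.
counts := make(map[string]int, 10000)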

Maps in Go 1.24

In Go 1.24, the internal structure of maps has undergone a significant redesign to enhance performance and scalability. Previously, all keys in a map were stored within a single hash table. However, in the updated implementation, keys are distributed across multiple hash tables, each of which is further divided into multiple groups of eight key/value pairs.

When a key is accessed, its hash code determines which table it belongs to, as well as the specific group within that table where it might be stored. If the key is not found within the identified group, the runtime proceeds to scan the subsequent group in search of a match. This process continues until either the key is located or the search reaches the end of the table.

The same method is applied during insertion. First, the appropriate group is identified based on the key’s hash. If the group already contains eight key/value pairs (i.e., it is full), the runtime scans the next group to find an available slot.
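
Here is a heavily simplified, hypothetical sketch of that probing loop; the names groupSize, slot, and hashTable are invented for illustration, and the real runtime code uses per-group control words and is far more sophisticated, but the scanning order is the same idea:

const groupSize = 8

type slot struct {
	used  bool
	key   string
	value int
}

type hashTable struct {
	groups [][groupSize]slot // stored in one contiguous block of memory
}

func (t *hashTable) lookup(key string, hash uint64) (int, bool) {
	// The hash picks the starting group; from there we scan forward.
	g := int(hash % uint64(len(t.groups)))
	for probed := 0; probed < len(t.groups); probed++ {
		for _, s := range t.groups[g] {
			if s.used && s.key == key {
				return s.value, true
			}
		}
		g = (g + 1) % len(t.groups) // move on to the next group
	}
	return 0, false
}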

A key advantage of this new design is its ability to scale dynamically as each table can grow independently. As more keys are added, the number of groups within a table can grow up to a predefined limit. When this limit is reached, rather than further expanding the existing table, the runtime splits it into two separate tables and rehashes the keys accordingly.

Another significant advantage of this new design is its ability to enhance both lookup and insertion performance. In previous implementations, when a map's primary bucket became full, overflow buckets were allocated in separate memory locations. This led to scattered memory access patterns, increasing cache misses and slowing down lookups and insertions.

With the updated approach, each table maintains references to its groups in a contiguous block of memory. This layout enables more efficient scanning of multiple groups in sequence, leveraging spatial locality to improve cache efficiency.

Spatial Locality

Spatial locality refers to the principle that data items stored close to each other in memory are more likely to be accessed together in a short period. Modern processors take advantage of this by loading chunks of adjacent memory into the CPU cache in a single operation. Since the hash table groups are stored contiguously, scanning them benefits from fewer cache misses, as multiple key/value pairs can be loaded into the cache at once.

Pointer Chasing and Its Impact

In older implementations, when a bucket was full, the runtime allocated an overflow bucket elsewhere in memory and linked it to the original bucket using a pointer. This resulted in pointer chasing, a process where the CPU must follow pointers scattered across different memory locations to retrieve necessary data. Pointer chasing is expensive because each pointer dereference can lead to a cache miss, requiring the processor to fetch data from main memory rather than the much faster CPU cache.

By contrast, in the new design, storing groups in contiguous memory minimizes the need for pointer chasing. Instead of jumping between arbitrary memory locations, the CPU can scan through the groups sequentially, leading to more predictable memory access patterns. This significantly reduces cache misses, improves data locality, and enhances overall lookup and insertion efficiency, particularly for large maps with frequent key accesses.

Concurrent Safety

Understanding how a hash map handles key insertions, key lookups, and growth makes it clear why concurrent write operations can lead to data corruption. If multiple goroutines modify the same bucket or trigger a map resize simultaneously, the map's internal state may end up corrupted. For this reason, the Go runtime detects unsynchronized map writes and terminates the program with a fatal error rather than continuing with a corrupted map.

On the other hand, concurrent reads are generally safe since the runtime doesn’t alter the map’s structure during read operations. However, if at least one write operation occurs alongside reads, race conditions can still arise.

In summary, multiple concurrent reads are safe only if no writes are happening. But multiple concurrent writes are never safe without proper synchronization.
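
A quick way to see this for yourself is a small program in which two goroutines write to the same map with no synchronization; running it will almost certainly end with fatal error: concurrent map writes:

package main

import "sync"

func main() {
	counts := make(map[int]int)

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Unsynchronized writes from two goroutines: the runtime
			// detects this and crashes rather than corrupting the map.
			for j := 0; j < 1000; j++ {
				counts[j] = j
			}
		}()
	}
	wg.Wait()
}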

Since PHP is single-threaded, these race conditions don't occur. We read and write associative arrays without worrying about what's going on behind the scenes. The PHP runtime handles the internal hash map during each operation with no chance of interference from another operation. That's clearly not the case in Go.

Maps in Go are convenient in-memory cache stores in which we can share values between multiple goroutines. Think of a rate limiter, for example, in which we store the number of times an IP address has hit our application:

var visitorCounts = make(map[string]int)

func rateLimitMiddleware(w http.ResponseWriter, r *http.Request) {
	ip := strings.Split(r.RemoteAddr, ":")[0]

	if visitorCounts[ip] > 100 {
		http.Error(w, "Rate limit exceeded.", http.StatusTooManyRequests)
		return
	}

	visitorCounts[ip]++
}

In this example, we extract the IP address from the request object and check the visitorCounts map for the number of times this IP address has visited our application. If the value exceeds 100, we respond with an error. Otherwise, we increment the counter.

Since Go's web server handles every request in a separate goroutine, the rateLimitMiddleware function may be executed concurrently, which means our access to the map requires protection against race conditions.

Luckily, Go ships with various memory synchronization primitives. One of them is sync.RWMutex, which allows us to acquire a lock before entering a critical section and release it once the critical section has finished executing.

The sync.RWMutex type offers two kinds of locks:

  1. Read-only locks: These allow multiple read operations to proceed concurrently as long as no write operation is in progress.
  2. Read-write locks: These grant exclusive access, blocking all other read and write operations until released.

To apply sync.RWMutex in our code, we may do something like this:

var visitorCounts = make(map[string]int)
var mu sync.RWMutex

func rateLimitMiddleware(w http.ResponseWriter, r *http.Request) {
	ip := strings.Split(r.RemoteAddr, ":")[0]

	mu.RLock() // Lock for reading
	if visitorCounts[ip] > 100 {
		mu.RUnlock() // Release the read lock

		http.Error(w, "Rate limit exceeded.", http.StatusTooManyRequests)
		return
	}
	mu.RUnlock() // Release the read lock

	mu.Lock() // Lock for writing
	visitorCounts[ip]++
	mu.Unlock() // Release the write lock
}

Here, the variable mu holds a sync.RWMutex instance. We call mu.RLock() to acquire a read lock before reading from the visitorCounts map, and release the lock using mu.RUnlock() once the read is complete. We also use mu.Lock() to acquire a read-write lock before incrementing the visits counter, and release it with mu.Unlock() afterward.

With this setup, multiple goroutines may execute the visitorCounts[ip] read instruction concurrently as long as there are no read-write locks, and no goroutine may acquire any kind of lock if a read-write lock is already in place.

It's important to note that this isn't a true rate limiter. In a real implementation, the check and counter increment would need to happen as a single atomic operation, and we'd also need a mechanism to reset the counters after a cooldown period to allow more requests.
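
As a minimal sketch of the reset part, assuming the package-level mu and visitorCounts from the example above and a hypothetical startResetLoop helper (using the standard library's time package), a background goroutine could simply wipe the counters every minute; a production limiter would more likely use a sliding window or token bucket instead of a hard reset:

func startResetLoop() {
	go func() {
		// Every minute, drop all counters so each IP gets a fresh allowance.
		for range time.Tick(time.Minute) {
			mu.Lock()
			visitorCounts = make(map[string]int)
			mu.Unlock()
		}
	}()
}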

To ensure the check and the counter increment happen as a single atomic operation, we may use the sync.Mutex primitive instead. This type has a single pair of Lock and Unlock methods that guard all operations:

var visitorCounts = make(map[string]int)
var mu sync.Mutex

func rateLimitMiddleware(w http.ResponseWriter, r *http.Request) {
	ip := strings.Split(r.RemoteAddr, ":")[0]

	mu.Lock() // Lock for reading & writing
	if visitorCounts[ip] > 100 {
		mu.Unlock() // Release the lock

		http.Error(w, "Rate limit exceeded.", http.StatusTooManyRequests)
		return
	}

	visitorCounts[ip]++
	mu.Unlock() // Release the lock
}

With this change, we acquire a lock before accessing the visitorCounts map, ensuring that no other operations interfere while we check the IP’s request count. If the IP has exceeded the rate limit, we release the lock immediately. Otherwise, we safely update the counter before unlocking. This guarantees that no read or write operations occur between checking and modifying the counter, preventing race conditions.

What we've implemented with the changes in this article is called concurrent safety: ensuring the program can handle multiple tasks happening at the same time without errors or inconsistent results. I never had to worry about that in PHP, at least outside Redis and database operations, which shows the beauty of PHP for rapid development. All the dragons are locked and the path is clear.

If you're a PHP developer looking to dive into Go, I've written a book just for you. PHP to Go breaks down Go concepts in a way that makes sense to someone coming from a PHP background. It’s the guide I wish I had when I first started learning the language.

A More Structured Approach

In a large application, memory access synchronization is often needed in multiple places. Keeping shared variables and their corresponding mutexes in the global scope can quickly lead to disorganized and hard-to-maintain code. As the application grows, tracking which parts of the code modify shared data becomes increasingly complex.

To address this, a common pattern in Go is to encapsulate the map and its mutex within a custom type. This approach not only improves organization but also enforces better concurrency safety by exposing well-defined methods for interacting with the data. By centralizing synchronization logic within the type, we reduce the risk of accidental race conditions and make the codebase cleaner, more modular, and easier to maintain.

Given our rate limiter example, we may declare a RateLimiter type that looks like this:

type RateLimiter struct {
	visitorCounts map[string]int
	mu            sync.Mutex
	Limit         int
}

Our RateLimiter type includes the visitorCounts map, the mutex, and the limit. We may then add a method that we would call to check if a request is allowed:

func (rl *RateLimiter) AllowRequest(ip string) bool {
	rl.mu.Lock()

	if rl.visitorCounts[ip] > rl.Limit {
		rl.mu.Unlock()

		return false
	}

	rl.visitorCounts[ip]++
	rl.mu.Unlock()

	return true
}

We can now define our middleware as follows:

var limiter = RateLimiter{
	visitorCounts: make(map[string]int), // initialize the map; writing to a nil map panics
	Limit:         100,
}

func rateLimitMiddleware(w http.ResponseWriter, r *http.Request) {
	ip := strings.Split(r.RemoteAddr, ":")[0]

	if !limiter.AllowRequest(ip) {
		http.Error(w, "Rate limit exceeded.", http.StatusTooManyRequests)
		return
	}
}

This change not only makes our code more organized but also provides the flexibility to refine the rate limiter in the future without modifying the middleware logic. For instance, if our application starts handling a high volume of concurrent traffic, we can distribute the counters across multiple maps to reduce contention on a single lock. This technique, known as sharding, helps minimize the performance overhead caused by multiple goroutines waiting to acquire the lock, ultimately improving response times and scalability.
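
As a rough sketch of what sharding could look like, assuming the standard library's sync and hash/fnv packages and the hypothetical names below, each IP is hashed to one of a fixed number of shards, and each shard carries its own mutex. Note how picking a shard reuses the same hash-and-mask trick we saw in the map internals earlier:

const shardCount = 16 // must be a power of two for the mask below

type shard struct {
	mu     sync.Mutex
	counts map[string]int
}

type ShardedRateLimiter struct {
	shards [shardCount]*shard
	Limit  int
}

func NewShardedRateLimiter(limit int) *ShardedRateLimiter {
	rl := &ShardedRateLimiter{Limit: limit}
	for i := range rl.shards {
		rl.shards[i] = &shard{counts: make(map[string]int)}
	}
	return rl
}

func (rl *ShardedRateLimiter) AllowRequest(ip string) bool {
	// Hash the IP and mask the result to pick a shard.
	h := fnv.New32a()
	h.Write([]byte(ip))
	s := rl.shards[h.Sum32()&(shardCount-1)]

	s.mu.Lock()
	defer s.mu.Unlock()

	if s.counts[ip] > rl.Limit {
		return false
	}
	s.counts[ip]++
	return true
}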

Hi, I'm Mohamed Said.

I'm a software engineer, writer, and open-source contributor. You can reach me on Twitter/X @themsaid. Or send me an email at [email protected].