Common Pitfalls With Go Concurrency

Sep 27, 2024    #golang   #async  

This came out of a conversation I had with some people during Techbash 2024 about common pitfalls when using Go's concurrency in the real world.

Go's concurrency is simple and powerful, but that simplicity makes it easy to make some common mistakes. These pitfalls are not exclusive to Go, but they are common in it because of the lack of enforced rigor around spawning a goroutine.

I've seen these patterns cause bugs in the real world.

Starting a Goroutine Without Termination Or Cancellation

Starting a goroutine for a background process with no way to stop it means there is no way to exit cleanly. Most important work that we do in the background (workers) requires cleanup for graceful shutdown.

package main

import (
    "time"
)

func main() {
    go doLongWork(0)
    time.Sleep(5 * time.Second)
    // main returns here; doLongWork is abandoned mid-loop
    // and its deferred cleanup never runs
}

// This could be gracefully terminating connections
func importantCleanupWork() {
    println("Important cleanup work")
}
func doLongWork(startCount int) {
    defer importantCleanupWork()

    count := startCount
    for {
        count = count + 1
        println("Current value of count:", count)
        time.Sleep(1 * time.Second)
    }
}

Output:

Current value of count: 1
Current value of count: 2
Current value of count: 3
Current value of count: 4
Current value of count: 5

Recommendation: Pass Contexts And Check For Cancellation In Goroutines

By passing a cancelable context and checking for ctx.Done(), we can cleanly return from the spawned goroutine.

package main

import (
	"context"
	"time"
)

func main() {
	ctx, cancelFun := context.WithCancel(context.Background())
	go doLongWork(ctx, 0)
	time.Sleep(5 * time.Second)
	cancelFun()
	// give the goroutine a moment to observe the cancellation and clean up
	time.Sleep(2 * time.Second)
}

// This could be gracefully terminating connections
func importantCleanupWork() {
	println("Important cleanup work")
}

func doLongWork(ctx context.Context, startCount int) {
	defer importantCleanupWork()

	count := startCount
	for {
		select {
		case <-ctx.Done():
			return
		default:
			count = count + 1
			println("Current value of count:", count)
			time.Sleep(1 * time.Second)
		}
	}
}

Output:

Current value of count: 1
Current value of count: 2
Current value of count: 3
Current value of count: 4
Current value of count: 5
Important cleanup work

Sharing Variables Between Goroutines

This seems obvious in concurrency, but I have seen it as a common mistake, especially when starting a goroutine with an anonymous function. The memory shared between the goroutines is usually not so obvious and can cause bugs like this:

package main
import "time"

func main() {
	name := "Dan"
	
	// name is accessible from the goroutine, but can be mutated below
	go func() {
		// sleep to highlight the situation
		time.Sleep(600 * time.Millisecond)
		println("Hello: " + name)
	}()
	// sleep to highlight the situation
	time.Sleep(500 * time.Millisecond)
	name = "Not Dan"
	println("Goodbye: " + name)
	time.Sleep(500 * time.Millisecond)

}

Output:

Goodbye: Not Dan
Hello: Not Dan

Recommendation: Avoid Starting Goroutines With Anonymous Functions

By extracting the function and passing all the values as (non-pointer) parameters, the values are copied. Because the values have been copied, we avoid the shared-memory bug.

package main
import "time"
func main() {
	name := "Dan"
	go sayHello(name)
	time.Sleep(500 * time.Millisecond)
	name = "Not Dan"
	println("Goodbye: " + name)
	time.Sleep(500 * time.Millisecond)
}

func sayHello(name string)  {
	time.Sleep(600 * time.Millisecond)
	println("Hello: " + name)
}

Output:

Goodbye: Not Dan
Hello: Dan
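
If you would rather keep the anonymous function, the same copy semantics can be had by passing the variable as an argument to the closure. A sketch of the same scenario:

```go
package main

import "time"

func main() {
	name := "Dan"

	// name is evaluated and copied at the moment the goroutine is launched,
	// so the mutation below does not affect it
	go func(name string) {
		time.Sleep(600 * time.Millisecond)
		println("Hello: " + name)
	}(name)

	time.Sleep(500 * time.Millisecond)
	name = "Not Dan"
	println("Goodbye: " + name)
	time.Sleep(500 * time.Millisecond)
}
```

This prints "Hello: Dan" just like the extracted-function version. It is also worth noting that the original closure-capture version is a data race, and running it with `go run -race` will flag it.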

Unconstrained Fan-Outs Accessing Shared Resources

Concurrency can be good, but when multiple goroutines need to use a shared resource, we can run into contention. The example below uses a simple semaphore, but in the real world, contention like this often happens with connection pools.

In the real world, this can surface as spikes of latency when API goroutines need to acquire a connection, but the connections are all consumed by background goroutines.

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/sync/semaphore"
)

func main() {
    ctx, cancelFun := context.WithCancel(context.Background())

    // Simulate an api with consistent usage of the resource pool
    go ConsistentUsageFromHttp(ctx)
    time.Sleep(1 * time.Second)

    // launching 6 goroutines that use up the resources from the pool temporarily
    for i := 0; i <= 5; i++ {
        go DoWorkWithLock(1 * time.Second)
    }

    //wait and then shut down
    time.Sleep(3 * time.Second)
    cancelFun()
    time.Sleep(2 * time.Second)
}

func ConsistentUsageFromHttp(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        default:
            before := time.Now()
            DoWorkWithLock(500 * time.Millisecond)
            waittime := time.Since(before).Milliseconds()
            println(fmt.Sprintf("Api Waited %d ms", waittime))
        }
    }
}

var sem = semaphore.NewWeighted(3)

// Simulates a shared resource pool
func DoWorkWithLock(duration time.Duration) {
    _ = sem.Acquire(context.Background(), 1)
    defer sem.Release(1)
    time.Sleep(duration)
}

We can see that the spawned goroutines cause contention on the shared semaphore: the API's calls have to wait because the 6 spawned goroutines consume all 3 permits for two consecutive 1-second rounds, roughly 2000 ms.

Output:

Api Waited 500 ms
Api Waited 500 ms
Api Waited 2500 ms
Api Waited 500 ms

Recommendation: Limit Concurrency When Using Shared Resources

By spawning fewer goroutines, the calls in the API simulation goroutine do not experience contention, and the work on the main goroutine does not get blocked.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	ctx, cancelFun := context.WithCancel(context.Background())

	// Simulate an api with consistent usage of the resource pool
	go ConsistentUsageFromHttp(ctx)
	time.Sleep(1 * time.Second)

	// launch a single goroutine that works through the jobs sequentially,
	// consuming at most one permit at a time
	go DoWithLimit()

	//wait and then shut down
	time.Sleep(6 * time.Second)
	cancelFun()
	time.Sleep(2 * time.Second)
}
func DoWithLimit() {
	for i := 0; i <= 5; i++ {
		DoWorkWithLock(1 * time.Second)
	}
}

func ConsistentUsageFromHttp(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			before := time.Now()
			DoWorkWithLock(500 * time.Millisecond)
			after := time.Now()
			waittime := after.Sub(before).Milliseconds()
			println(fmt.Sprintf("Api Waited %d ms", waittime))
		}
	}

}

var sem = semaphore.NewWeighted(3)

// Simulates a shared resource pool
func DoWorkWithLock(duration time.Duration) {
	_ = sem.Acquire(context.Background(), 1)
	defer sem.Release(1)
	time.Sleep(duration)
}

Because the background goroutine consumes at most one permit at a time, we do not see the simulated spike in API latency.

Output:

Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms