This post came out of a conversation I had with some people at Techbash 2024 about common pitfalls when using Go's concurrency in the real world.
Go's concurrency is simple and powerful, but that simplicity makes it easy to make mistakes. These pitfalls are not exclusive to Go, but they are common in Go because of the lack of enforced rigor when spawning a goroutine.
I've seen each of these patterns cause bugs in the real world.
Starting a Goroutine Without Termination Or Cancellation
Starting a goroutine for a background process with no way to stop it means there is no way to exit cleanly. Most important background work (workers) requires cleanup for a graceful shutdown.
package main

import (
	"time"
)

func main() {
	go doLongWork(0)
	time.Sleep(5 * time.Second)
}

// This could be gracefully terminating connections
func importantCleanupWork() {
	println("Important cleanup work")
}

func doLongWork(startCount int) {
	defer importantCleanupWork()
	count := startCount
	for {
		count = count + 1
		println("Current value of count:", count)
		time.Sleep(1 * time.Second)
	}
}
Output (note that "Important cleanup work" never prints: when main returns, the goroutine is killed and its deferred cleanup never runs):
Current value of count: 1
Current value of count: 2
Current value of count: 3
Current value of count: 4
Current value of count: 5
Recommendation: Pass Contexts And Check For Cancellation In Goroutines
By passing a cancelable context and checking for ctx.Done(), we can cleanly return from the spawned goroutine, which allows its deferred cleanup to run.
package main

import (
	"context"
	"time"
)

func main() {
	ctx, cancelFun := context.WithCancel(context.Background())
	go doLongWork(ctx, 0)
	time.Sleep(5 * time.Second)
	cancelFun()
	time.Sleep(2 * time.Second)
}

// This could be gracefully terminating connections
func importantCleanupWork() {
	println("Important cleanup work")
}

func doLongWork(ctx context.Context, startCount int) {
	defer importantCleanupWork()
	count := startCount
	for {
		select {
		case <-ctx.Done():
			return
		default:
			count = count + 1
			println("Current value of count:", count)
			time.Sleep(1 * time.Second)
		}
	}
}
Output:
Current value of count: 1
Current value of count: 2
Current value of count: 3
Current value of count: 4
Current value of count: 5
Important cleanup work
Sharing Variables Between Goroutines
This seems obvious in concurrency, but I have seen it as a common mistake, especially when starting a goroutine with an anonymous function. The memory shared between the goroutines is often not obvious and can cause:
- logical bugs due to non-deterministic execution order
- panics due to concurrent read/write access
package main

import "time"

func main() {
	name := "Dan"
	// name is accessible from the goroutine, but can be mutated below
	go func() {
		// sleep to highlight the situation
		time.Sleep(600 * time.Millisecond)
		println("Hello: " + name)
	}()
	// sleep to highlight the situation
	time.Sleep(500 * time.Millisecond)
	name = "Not Dan"
	println("Goodbye: " + name)
	time.Sleep(500 * time.Millisecond)
}
Output:
Goodbye: Not Dan
Hello: Not Dan
Recommendation: Avoid Starting Goroutines With Anonymous Functions
By extracting the function and passing all the values as (non-pointer) parameters, the values are copied when the goroutine starts. Because the values are copied, we avoid the shared-memory bug.
package main

import "time"

func main() {
	name := "Dan"
	go sayHello(name)
	time.Sleep(500 * time.Millisecond)
	name = "Not Dan"
	println("Goodbye: " + name)
	time.Sleep(500 * time.Millisecond)
}

func sayHello(name string) {
	time.Sleep(600 * time.Millisecond)
	println("Hello: " + name)
}
Output:
Goodbye: Not Dan
Hello: Dan
Unconstrained Fanouts Accessing Resources
Concurrency can be good, but when multiple goroutines need a shared resource, we can run into contention. The example below uses a simple semaphore, but in the real world, contention often happens on connection pools.
In practice, this can surface as latency spikes when API goroutines need to acquire a connection but all the connections are held by background goroutines.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	ctx, cancelFun := context.WithCancel(context.Background())
	// Simulate an API with consistent usage of the resource pool
	go ConsistentUsageFromHttp(ctx)
	time.Sleep(1 * time.Second)
	// Launch 6 goroutines that temporarily use up the resources from the pool
	for i := 0; i <= 5; i++ {
		go DoWorkWithLock(1 * time.Second)
	}
	// Wait and then shut down
	time.Sleep(3 * time.Second)
	cancelFun()
	time.Sleep(2 * time.Second)
}

func ConsistentUsageFromHttp(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			before := time.Now()
			DoWorkWithLock(500 * time.Millisecond)
			after := time.Now()
			waittime := after.Sub(before).Milliseconds()
			println(fmt.Sprintf("Api Waited %d ms", waittime))
		}
	}
}

var sem = semaphore.NewWeighted(3)

// Simulates a shared resource pool
func DoWorkWithLock(duration time.Duration) {
	_ = sem.Acquire(context.Background(), 1)
	defer sem.Release(1)
	time.Sleep(duration)
}
We can see that the spawned goroutines cause contention on the shared semaphore: the API's calls have to wait because the 6 spawned goroutines consume all the shared resources for roughly 2000ms.
Output:
Api Waited 500 ms
Api Waited 500 ms
Api Waited 2500 ms
Api Waited 500 ms
Recommendation: Limit Concurrency When Using Shared Resources
By spawning fewer goroutines, the calls in the API simulation no longer experience contention, and the background work still gets done.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	ctx, cancelFun := context.WithCancel(context.Background())
	// Simulate an API with consistent usage of the resource pool
	go ConsistentUsageFromHttp(ctx)
	time.Sleep(1 * time.Second)
	// Launch 1 goroutine so the pool is not temporarily used up
	go DoWithLimit()
	// Wait and then shut down
	time.Sleep(6 * time.Second)
	cancelFun()
	time.Sleep(2 * time.Second)
}

func DoWithLimit() {
	for i := 0; i <= 5; i++ {
		DoWorkWithLock(1 * time.Second)
	}
}

func ConsistentUsageFromHttp(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
			before := time.Now()
			DoWorkWithLock(500 * time.Millisecond)
			after := time.Now()
			waittime := after.Sub(before).Milliseconds()
			println(fmt.Sprintf("Api Waited %d ms", waittime))
		}
	}
}

var sem = semaphore.NewWeighted(3)

// Simulates a shared resource pool
func DoWorkWithLock(duration time.Duration) {
	_ = sem.Acquire(context.Background(), 1)
	defer sem.Release(1)
	time.Sleep(duration)
}
Because the background goroutine only holds one semaphore slot at a time, we do not see the simulated spike in API latency.
Output:
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms
Api Waited 500 ms