Posts about go(1)

Page 1 of 1
Cool pic

A Generic Set

Generics, concurrency and data structures in Go

In computer science, a set is an abstract data type that can store distinct values, without any particular order. In this article we’ll take a look at how to implement this basic data structure using generics and a bit of concurrency.

Before Generics

Before generics were a thing in Go, people used maps to create custom sets when needed:

existingTitles := map[string]bool{}
existingTitles["Getting Started with Go"] = true
existingTitles["Understanding Goroutines"] = true
existingTitles["Getting Started with Go"] = true
if existingTitles["Getting Started with Go"] {
fmt.Println("Title already exists.")
}

The type of the value isn’t important, since we only care about the keys. Using bool is convenient because it lets us check for membership directly in a conditional:

if validCategories[category] {
// whatevs...
}

When you access a map with a key that doesn’t exist, Go returns the zero value for the map’s value type, i.e. false. That way the condition evaluates to false whenever category isn’t in the set.

TIP

If memory usage is a concern, you can use an empty struct as the map value type:

existingTitles := map[string]struct{}{}

Note that I’ve also initialized the map with {}. Without it, the declaration is invalid. If that’s too many braces for you, use:

existingTitles := make(map[string]struct{})

And if you’re only declaring the variable, you could instead write:

var existingTitles map[string]struct{}

Using struct{} for set values is the idiomatic approach in Go because it occupies zero bytes.

This pattern works well and is widely used in Go, but it’s really just using a map to simulate a set. Before generics, you’d need a separate set implementation for every element type — one for strings, one for ints, and so on. With generics, we write it once and let the type parameter do the rest.

A Generic Set

The general syntax for a generic type declaration in Go is:

type TypeName[T Constraint] UnderlyingType

Where:

  • TypeName is the name of the generic type.
  • T is the type parameter (a placeholder for a concrete type).
  • Constraint specifies which types T is allowed to be.
  • UnderlyingType is the actual type definition that uses T.

Our generic Set will take a single type parameter, E, which represents the type of elements stored in the set:

type Set[E comparable] map[E]struct{}

Let’s break it down:

  • E is the type parameter. You can think of it as a placeholder for the actual element type, such as string, int, or User.
  • comparable is the type constraint: It restricts E to types that can be compared with == and !=.

NOTE

Go maps use equality operators to determine whether a key already exists. When you look up a key or insert one, the map compares it with existing keys. Because of this, map keys must be comparable.

A Generic Function

Just like types, generic functions can also have type parameters. The general syntax is:

func FunctionName[T Constraint](parameters) ReturnType {
// ...
}

Where:

  • FunctionName is, well, you guessed it.
  • T is the type parameter.
  • Constraint specifies which types T is allowed to be.

The type parameter can be used in the function’s parameters, return type, or both.

Since we can’t use a map until it’s initialised, let’s provide a constructor function, which will be generic as well:

func NewSet[E comparable]() Set[E] {
return Set[E]{}
}

When we call this function, we need to specify the concrete element type. For example, to create a Set of strings:

names := NewSet[string]()

Note that a generic set is not a complete type until we instantiate it with a concrete type. Each instantiation gives us a different set type; we write the set implementation once, but each set still keeps its own element type.

Adding Elements

Now let’s write a generic method to add values to the set. We’ll make it variadic, so that it can add one or more elements at a time:

func (s Set[E]) Add(values ...E) {
for _, v := range values {
s[v] = struct{}{}
}
}

Since our set is backed by a map, adding an element means assigning that value as a key. The map value is struct{}{} because we don’t need to store any extra data; the presence of the key is enough. For example:

Making Add variadic gives us a more flexible API. We can still add a single element, but we can also add multiple elements in one call:

names := NewSet[string]()
names.Add("Alice")
names.Add("Bob", "Charlie", "Frankie")
fmt.Println(names) // map[Alice:{} Bob:{} Frankie: {} Charlie: {}]

Note how the values are empty structs, {}. That doesn’t look nice, let’s fix it in the next section.

Listing Set Members

Printing the set reveals its underlying map representation, which isn’t ideal if we simply want to inspect its contents. Let’s add a method that returns all the elements in the set as a slice. Because sets are inherently unordered, the order of the elements is undefined.

func (s Set[E]) Members() []E {
result := make([]E, 0, len(s))
for v := range s {
result = append(result, v)
}
return result
}

Let’s take a look now:

names := NewSet[string]()
names.Add("Alice")
names.Add("Bob")
names.Add("Frankie")
fmt.Println(names.Members()) // [Alice Bob Charlie]

WARNING

The order of the elements is not guaranteed and may vary between runs!

String Method

Let’s also implement the String method so that our set prints more nicely:

func (s Set[E]) String() string {
return fmt.Sprintf("%v", s.Members())
}

With this method in place, we no longer need to call Members explicitly when printing a set:

fmt.Println(s) // [2 3 1]

You may notice that the elements appear in a different order each time you run the program, and that’s perfectly fine. By definition, a set is an unordered collection of distinct values. Since our implementation is backed by a map, it naturally inherits the map’s lack of ordering guarantees.

Checking for Membership

A set is primarily useful because it lets us answer a simple question efficiently:

Does this value belong to the set?

For sets, we don’t care about the value at all—we only care whether the key is present:

func (s Set[E]) Contains(v E) bool {
_, ok := s[v]
return ok
}

NOTE

Since our Set is backed by a map, we can take advantage of Go’s comma ok idiom. A map index expression can return two values:

  • The value associated with the key.
  • A boolean indicating whether the key exists in the map.

The blank identifier (_) discards the map value, while ok tells us whether v is a member of the set. For example:

names := NewSet[string]()
names.Add("Alice")
names.Add("Bob")
fmt.Println(names.Contains("Alice")) // true
fmt.Println(names.Contains("Charlie")) // false

Because map lookups are highly optimized, checking for membership in a set is typically an O(1) operation.

Union of Two Sets

A common operation on sets is the union of two sets: a new set containing all the elements that appear in either set.

union set

One way to implement this is to create a new set containing the members of the first set, and then add the members of the second set:

func (s Set[E]) Union(s2 Set[E]) Set[E] {
result := NewSet(s.Members()...)
result.Add(s2.Members()...)
return result
}

As you can see, we’re making use of our existing API to implement the Union:

  • NewSet to create the new set.
  • Add variadic to merge the second set without the need of a loop.

Let’s try it with two small sets of integers:

s1 := NewSet(1, 2, 3)
s2 := NewSet(3, 4, 5)
fmt.Println(s1.Union(s2)) // [1 2 3 4 5]

The resulting set contains every element from both sets, but only once. Even though both s1 and s2 contain 3, it appears only once in the union because sets cannot contain duplicate elements.

Intersection of Two Sets

Let’s also implement intersection. The intersection of two sets is a new set containing only the elements that both sets have in common.

union set

Unlike Union, we can’t simply combine the members of both sets. Instead, we need to examine each element of one set and check whether it also exists in the other:

func (s Set[E]) Intersection(s2 Set[E]) Set[E] {
result := NewSet[E]()
for _, v := range s.Members() {
if s2.Contains(v) {
result.Add(v)
}
}
return result
}

In other words, we iterate over the members of s, and whenever an element is also present in s2, we add it to the result. Let’s try it:

s1 := NewSet(1, 2, 3)
s2 := NewSet(3, 4, 5)
fmt.Println(s1.Intersection(s2)) // [3]

As expected, the only element shared by both sets is 3, so the intersection contains just that value.

Concurrency Safety

Maps in Go are not safe for concurrent use. If multiple goroutines read from and write to the same map simultaneously, the program will panic. To make our Set concurrency-safe, we can embed a sync.RWMutex and lock appropriately on every operation.

A Read-Write Mutex

Let’s update our Set type with an RWMutex:

type Set[E comparable] struct {
data map[E]struct{}
mutex *sync.RWMutex
}

A sync.RWMutex distinguishes between reads and writes:

  • Any number of goroutines can hold a read lock simultaneously. — Write locks are exclusive: when a writer holds it, no readers or other writers can proceed.

This makes it a good fit for a set, where reads (like Contains) are frequent and writes (like Add) are less so.

IMPORTANT

We’ll store a pointer to the mutex rather than the mutex value itself. If we stored it by value, copying the Set would copy the mutex too — and two independent mutexes no longer protect the same data. Since it never makes sense to copy a mutex, defining the mutex field as a pointer prevents this from happening accidentally.

Before generics, duplicating data structures across types made sophistication impractical — nobody wants to maintain N versions of the same complex code. So corners got cut. Generics remove that disincentive, making it worthwhile to build data structures that are actually good.

Constructor

The constructor looks much the same as before, with two additions: it initializes the mutex with &sync.RWMutex{} and the underlying map explicitly, then returns a pointer to the Set struct rather than the struct itself — consistent with storing the mutex by pointer.

func NewSet[E comparable](vals ...E) *Set[E] {
s := Set[E]{
mutex: &sync.RWMutex{},
data: map[E]struct{}{},
}
for _, v := range vals {
s.data[v] = struct{}{}
}
return &s
}

Add

The behaviour of Add is the same as before, with one important difference: it must acquire a write lock before touching the map. This prevents any other goroutine from reading or writing while the update is in progress.

func (s *Set[E]) Add(vals ...E) {
s.mutex.Lock()
defer s.mutex.Unlock()
for _, v := range vals {
s.data[v] = struct{}{}
}
}

defer s.mutex.Unlock() ensures the lock is always released when Add returns, even if something panics mid-loop.

Members

Members doesn’t modify the map, so a write lock would be overkill. A read lock is enough — it allows other goroutines to read concurrently, while still blocking any writer that tries to modify the map at the same time.

func (s *Set[E]) Members() []E {
s.mutex.RLock()
defer s.mutex.RUnlock()
result := make([]E, 0, len(s.data))
for v := range s.data {
result = append(result, v)
}
return result
}

Contains

Contains is also read-only, so the same strategy applies.

func (s *Set[E]) Contains(v E) bool {
s.mutex.RLock()
defer s.mutex.RUnlock()
_, ok := s.data[v]
return ok
}

Putting It to the Test

To verify that our is concurrently safe, we need to exercise the Set concurrently — at least one goroutine writing, another reading, and enough repetitions that a data race has a fair chance to surface. A single read and write might get lucky; a thousand of each is harder to fluke.

s := NewSet(1, 2, 3)
var wg sync.WaitGroup
wg.Add(1)
go func() {
for i := 0; i < 1000; i++ {
s.Add(i)
}
wg.Done()
}()
for i := 0; i < 1000; i++ {
_ = s.Members()
}
wg.Wait()
fmt.Println("Huzzah!") // Huzzah!

If the locking is correct, our program won’t panic, and we should see the output:

Huzzah!

Let’s see what happens if we comment out the locking code in Add and run the same test:

func (s *Set[E]) Add(vals ...E) {
// s.mutex.Lock()
// defer s.mutex.Unlock()
// ...
}

If you run the program, you’d most probably see something like this:

fatal error: concurrent map iteration and map write
goroutine 1 [running]:
...

The Go runtime detected a concurrent read in Members and a write in Add, and since it can no longer guarantee the integrity of the data, it terminates the program immediately.

Intersection: Taming Mutex Churn

We could implement Intersection the same way as before — reusing Members, Contains, and Add — and it would be concurrency-safe without any extra work, since those methods already handle their own locking.

But there’s an efficiency concern. Since we call s2.Contains — which locks and unlocks the mutex — for every member of s, large sets will produce a lot of mutex churn. Mutexes are fast, but they aren’t free, and repeatedly acquiring and releasing a lock in a tight loop adds up.

A better approach is to lock s2 once, do all the membership checks while the lock is held, then release it:

func (s *Set[E]) Intersection(s2 *Set[E]) *Set[E] {
result := NewSet[E]()
s2.mutex.RLock()
defer s2.mutex.RUnlock()
for _, v := range s.Members() {
if _, ok := s2.data[v]; ok {
result.Add(v)
}
}
return result
}

One read lock on s2 for the entire operation, rather than one per element.

NOTE

It’s a trade-off, though. Holding a read lock on s2 for the entire operation avoids mutex churn, but it also means blocking any writer that wants to modify s2 for the full duration of the intersection. For large sets, that could be a long time.

Depending on the application, more granular locking — acquiring and releasing the lock for each individual Contains check — might actually be preferable. Less contention per operation, at the cost of more locking overhead overall. It just depends on whether your bottleneck is mutex churn or lock contention.

Unsafe Set

Not every application needs concurrency safety. If your set is only ever accessed from a single goroutine, the mutex overhead is pure waste. In that case, we can go back to the simpler implementation from the start of this article — same methods, no locking.

Rather than maintaining two entirely separate types, we can offer both through two constructors:

  • NewSet for the concurrency safe default.
  • NewUnsafeSet for when performance matters more than safety.

Users get the right tool for the job, and we only write the implementation once.

Types

The idea is to split the two concerns into two types:

  • UnsafeSet is the raw, lock-free implementation — just a map, same as we started with.
  • Set wraps it, adding a mutex to make it safe for concurrent use:
type UnsafeSet[E comparable] map[E]struct{}
type Set[E comparable] struct {
data UnsafeSet[E]
mutex *sync.RWMutex
}

Constructors

Each type gets its own constructor:

  • NewUnsafeSet just initialises the map and populates it.
  • NewSet builds on top of it, wrapping the UnsafeSet in a Set and initialising the mutex:
func NewUnsafeSet[E comparable](vals ...E) UnsafeSet[E] {
s := UnsafeSet[E]{}
for _, v := range vals {
s[v] = struct{}{}
}
return s
}
func NewSet[E comparable](vals ...E) *Set[E] {
return &Set[E]{
data: NewUnsafeSet[E](vals...),
mutex: &sync.RWMutex{},
}
}

Adding Items

UnsafeSet.Add is the same as before — iterate and assign. Set.Add wraps it with a write lock, ensuring no other goroutine can read or write the underlying map while the update is in progress:

func (s UnsafeSet[E]) Add(vals ...E) {
for _, v := range vals {
s[v] = struct{}{}
}
}
func (s *Set[E]) Add(vals ...E) {
s.mutex.Lock()
defer s.mutex.Unlock()
s.data.Add(vals...)
}

Listing Members

UnsafeSet.Members is the same implementation we wrote earlier: create a slice, iterate over the map keys, and append each element. Set.Members adds a read lock around that same logic:

func (s UnsafeSet[E]) Members() []E {
result := make([]E, 0, len(s))
for v := range s {
result = append(result, v)
}
return result
}
func (s *Set[E]) Members() []E {
s.mutex.RLock()
defer s.mutex.RUnlock()
return s.data.Members()
}

Since Members only reads from the map, the safe version uses RLock instead of Lock.

Checking for Membership

UnsafeSet.Contains uses the comma-ok idiom directly on the map. Set.Contains wraps the same check in a read lock:

func (s UnsafeSet[E]) Contains(v E) bool {
_, ok := s[v]
return ok
}
func (s *Set[E]) Contains(v E) bool {
s.mutex.RLock()
defer s.mutex.RUnlock()
return s.data.Contains(v)
}

Since membership checks only read from the map, the safe version uses RLock.

String Method

UnsafeSet.String formats the set by reusing Members. Set.String does the same through the safe Members method:

func (s UnsafeSet[E]) String() string {
return fmt.Sprintf("%v", s.Members())
}
func (s *Set[E]) String() string {
return fmt.Sprintf("%v", s.Members())
}

Both versions print the set as a slice, and the element order is still undefined.

Union of Two Sets

UnsafeSet.Union creates a new unsafe set, copies the members of the first set, and then adds the members of the second set. Set.Union does the same through safe snapshots of both sets:

func (s UnsafeSet[E]) Union(s2 UnsafeSet[E]) UnsafeSet[E] {
result := NewUnsafeSet(s.Members()...)
result.Add(s2.Members()...)
return result
}
func (s *Set[E]) Union(s2 *Set[E]) *Set[E] {
result := NewSet(s.Members()...)
result.Add(s2.Members()...)
return result
}

The safe version uses Members on both sets, so each read is protected by its own read lock.

Intersection of Two Sets

UnsafeSet.Intersection creates a new unsafe set and adds only the values that also exist in the second set. Set.Intersection takes safe snapshots before doing the same work:

func (s UnsafeSet[E]) Intersection(s2 UnsafeSet[E]) UnsafeSet[E] {
result := NewUnsafeSet[E]()
for _, v := range s.Members() {
if s2.Contains(v) {
result.Add(v)
}
}
return result
}
func (s *Set[E]) Intersection(s2 *Set[E]) *Set[E] {
result := NewSet[E]()
for _, v := range s.Members() {
if s2.Contains(v) {
result.Add(v)
}
}
return result
}

The safe version delegates reads through Members and Contains, so each map access is protected by a read lock.

Deleting Items

UnsafeSet.Delete removes values directly from the map. Set.Delete wraps that same operation with a write lock:

func (s UnsafeSet[E]) Delete(vals ...E) {
for _, v := range vals {
delete(s, v)
}
}
func (s *Set[E]) Delete(vals ...E) {
s.mutex.Lock()
defer s.mutex.Unlock()
s.data.Delete(vals...)
}

Deleting a value that is not in the set is safe: Go’s built-in delete function does nothing when the key does not exist.

Set Length

UnsafeSet.Len returns the number of elements in the map. Set.Len wraps the same operation with a read lock:

func (s UnsafeSet[E]) Len() int {
return len(s)
}
func (s *Set[E]) Len() int {
s.mutex.RLock()
defer s.mutex.RUnlock()
return s.data.Len()
}

Since Len only reads the map, the safe version uses RLock.

Closing Thoughts

This article is not really about building the perfect set implementation. The set is just a useful example: small enough to understand, but rich enough to show how generics and concurrency fit together in Go.

NOTE

That’s not to say the implementation is complete. There’s plenty more we could add:

  • An Equals method
  • An IsSubset method
  • A Subtract method that removes all members that belong to some other set
  • A Difference function to find the members that two sets don’t have in common
  • A Make function to pre-allocate memory
  • Map, Reduce, and Filter operations on set elements

These are left as an exercise for the reader.

If you need a production-ready set package, two good options are: