joshrosso

Time-based KV Store in Go (Interview Question)

I recently got the Time Based Key-Value Store problem in a technical interview. It’s a fun exercise, and I believe there might a interesting spin one could put on it. What follows is not a recommendation for an interview question, more-so just an interesting modification to the problem for us to ponder and solve.

Content in video form if that's more of your thing:

The original question breaks down to the following:

The following is assumed:

The standard solution to the problem is to use a map of keys that hold a list of values. Since the values can be assumed to be retrieved chronologically, you can use binary search to find the first key larger than the time stamp asked for in the get call. Then you return all elements lower than it. This solution has a time complexity of:

$$ set = O(1) \newline get = O(log\ n) $$

Enhancements

Some of these enhancements are inspired by the specific questions asked of me, while others are ideas I came up with (I'm trying not to give too much away about the specific interview).

In this new question, we’re going to keep most properties the same. Here are the changes:

Extra considerations:

With these changes in mind, we need to enhance the data structure to optimize around get. Specifically, we need to set something up that can get us a constant lookup time regardless of whether the user asks for a specific time or not (ie gets the latest).

Solution: Design

First, let’s consider the data structure we want to build. In order to facilitate O(1) on get, we can create something like:

TimeMap
TimeMap
key:
"dog"
key:...
timeStore
timeStore
values
values
timeIndex
timeIndex
key:
Jan 2nd, 2023
key:...
key:
Jan 1st, 2023
key:...
key:
Jan 3rd, 2023
key:...
"woof"
"woof"
"bark"
"bark"
"howl"
"howl"
Text is not SVG - cannot display

TimeMap is the top-level map that contains the keys. Each key points to a unique timeStore, which holds the values list along with a timeIndex using the time stamp of a value as a key and pointing to the value in values.

With this data-structure in place, we can return both the latest element and specific (time-based) element in constant time. When the last element is desired, we simply return the last element in the values. When a specific timestamp is provided, we can lookup the timestamp since they’re store in the timeIndex map as a key. If a key is present, we can return the value it points to.

There are a few downsides to consider:

  1. While constant time, we are potentially doing 2 extra traversals via pointers. This can mean:
    1. Memory of this data structure will grow faster as we maintain the map of timeIndex.
    2. This pointer-based model means even less contiguous memory, thus our data structure has less chance (especially as it grows) to land in CPU cache.
    3. Insertion and deletion is more complex as we need to ensure we manage timeIndex correctly.

Regarding the getBefore implementation, this remains the same as the original question. Thus our complexity ends up at:

$$ get = O(1) \newline set = O(1) \newline getRange = O(log\ n) $$

Solution Code

In this section, I’ll be showing and explaining the code chunk by chunk, to see the full file, visit the GitHub.

To begin, let’s create the data structure described above.

// TimeMap holds a key and all values added over time for that key.
type TimeMap struct {
	times map[string]*timeStore
}

// timeStore is the underlying store for each key.
type timeStore struct {
	timeIndex map[time.Time]*value // mapping to each value based on time
	values    []*value             // underlying data store
}

// value represents the object stored.
type value struct {
	stamp time.Time // timestamp of insertion
	val   string    // string used for simplicity, but imagine a larger struct
}

Next, we’ll create some helper functions.

// New returns a new [TimeMap].
func New() TimeMap {
	return TimeMap{
		times: map[string]*timeStore{},
	}
}

// newTimeStore is used when a new key is introduced. It intializes and returns
// the pointer to the new key's store.
func newTimeStore() *timeStore {
	return &timeStore{
		timeIndex: map[time.Time]*value{},
		values:    []*value{},
	}
}

func (tm *TimeMap) getTimeStore(key string) (*timeStore, error) {
	if ts, ok := tm.times[key]; ok {
		return ts, nil
	}
	return nil, fmt.Errorf("key [%s] does not exist", key)
}

Note that our need for newTimeStore is to have an easy way to create a new store when a new key is created in the TimeMap. Additionally, getTimeStore is a helper function to quickly lookup a key’s existence and return an error if not. This will be reused in our our get* implementations.

The first function to implement is Set:

// Set adds a value to a given key. When the value is added, its time stamp is
// recorded.
func (tm *TimeMap) Set(key, data string) {
	// when the key exists, insert the value into the store
	if timeStore, ok := tm.times[key]; ok {
		stamp := time.Now()
		v := &value{stamp: stamp, val: data}
		timeStore.values = append(timeStore.values, v)
		timeStore.timeIndex[stamp] = v
		return
	}
	// when the key is new, create a new store for the key and recall this
	// method
	tm.times[key] = newTimeStore()
	tm.Set(key, data)
}

Set focuses on 2 outcomes, whether the key exists or not. When the key exists, it’s simply a matter of recording the timestamp and inserting the value into the existing timeStore. When the key does not exist, we need to create a new timeStore associated with that key. Once that timeStore exists, recursively calling this function guarantees the value-insertion (condition 1) logic is run. There is a case to be made that the recursive call is unnecessarily ‘clever’, however I feel it’s fine.

With set in place, we can do some simple testing in a main() function:

tm := New()
kvs := map[string][]string{
	"dog": {"woof", "bark", "sigh", "growl", "wimper"},
	"cat": {"hiss", "screech", "crash", "meow"},
}
for k, sounds := range kvs {
	for _, sound := range sounds {
		tm.Set(k, sound)
	}
}
spew.Dump(tm)

spew can be retrieved for your project via go get [github.com/davecgh/go-spew/spew](http://github.com/davecgh/go-spew/spew), it’ll enable us to view the end state of the data structure:

(main.TimeMap) {
 times: (map[string]*main.timeStore) (len=2) {
  (string) (len=3) "cat": (*main.timeStore)(0x1400007c0c0)({
   timeIndex: (map[time.Time]*main.value) (len=4) {
    (time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293: (*main.value)(0x140000643f0)({
     stamp: (time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293,
     val: (string) (len=4) "hiss"
    }),
    (time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918: (*main.value)(0x14000180000)({
     stamp: (time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918,
     val: (string) (len=7) "screech"
    }),
    (time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168: (*main.value)(0x14000064420)({
     stamp: (time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168,
     val: (string) (len=5) "crash"
    }),
    (time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585: (*main.value)(0x14000180030)({
     stamp: (time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585,
     val: (string) (len=4) "meow"
    })
   },
   values: ([]*main.value) (len=4 cap=4) {
    (*main.value)(0x140000643f0)({
     stamp: (time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293,
     val: (string) (len=4) "hiss"
    }),
    (*main.value)(0x14000180000)({
     stamp: (time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918,
     val: (string) (len=7) "screech"
    }),
    (*main.value)(0x14000064420)({
     stamp: (time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168,
     val: (string) (len=5) "crash"
    }),
    (*main.value)(0x14000180030)({
     stamp: (time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585,
     val: (string) (len=4) "meow"
    })
   }
  }),
  (string) (len=3) "dog": (*main.timeStore)(0x1400007c100)({
   timeIndex: (map[time.Time]*main.value) (len=5) {
    (time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210: (*main.value)(0x140000644e0)({
     stamp: (time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210,
     val: (string) (len=6) "wimper"
    }),
    (time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918: (*main.value)(0x14000064480)({
     stamp: (time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918,
     val: (string) (len=4) "woof"
    }),
    (time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793: (*main.value)(0x14000180060)({
     stamp: (time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793,
     val: (string) (len=4) "bark"
    }),
    (time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460: (*main.value)(0x140000644b0)({
     stamp: (time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460,
     val: (string) (len=4) "sigh"
    }),
    (time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293: (*main.value)(0x14000180090)({
     stamp: (time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293,
     val: (string) (len=5) "growl"
    })
   },
   values: ([]*main.value) (len=5 cap=8) {
    (*main.value)(0x14000064480)({
     stamp: (time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918,
     val: (string) (len=4) "woof"
    }),
    (*main.value)(0x14000180060)({
     stamp: (time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793,
     val: (string) (len=4) "bark"
    }),
    (*main.value)(0x140000644b0)({
     stamp: (time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460,
     val: (string) (len=4) "sigh"
    }),
    (*main.value)(0x14000180090)({
     stamp: (time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293,
     val: (string) (len=5) "growl"
    }),
    (*main.value)(0x140000644e0)({
     stamp: (time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210,
     val: (string) (len=6) "wimper"
    })
   }
  })
 }
} 

While that’s a lot of output, it does give us a solid visual of the data structure. Some key notes are:

If your lost, consider reviewing this output against the diagram in the Solution Design above.

Going forward, I won’t print the spew.Dump output, but if you’re following along you should continue testing with it.

Next we can implement the Get function:

// Get returns a value for key. If stamp is provided, the value with that
// timestamp is returned. If stamp is not provided, the last element inserted
// under key is returned.
func (tm *TimeMap) Get(key string, stamp ...time.Time) (*value, error) {
	ts, err := tm.getTimeStore(key)
	if err != nil {
		return nil, err
	}

	// return latest
	if len(stamp) < 1 {
		return ts.values[len(ts.values)-1], nil
	}
	lookup := stamp[0]
	if v, ok := ts.timeIndex[lookup]; ok {
		return v, nil
	}
	return nil, fmt.Errorf("key [%s] had no timestamp [%s]", key, lookup)
}

Get uses the variadic argument stamp to make it optional. This has some trade-offs but is a common approach in Go. When no stamp is provided, we return the last element within values as this will be the last element added. When the stamp is provided, we look it up in the timeIndex. By using time.Time, i’m requiring precision down to the nanosecond, which might not be desired, so you can work with the time precision on your insert / lookups. For the sake of testing, you can easily capture some known time stamps and test Get, heres how I’ve done it in main:

// validate Get for latest
spew.Dump(tm.Get("dog"))

// validate get for each stamp
for stamp := range tm.times["dog"].timeIndex {
	noise, err := tm.Get("dog", stamp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("At %s: %s\n", stamp, noise.val)
}

Next let’s implement GetBefore:

// GetBefore returns all values stored for key, where their insertion time is
// equal to or before stamp.
func (tm *TimeMap) GetBefore(key string, stamp time.Time) ([]*value, error) {
	ts, err := tm.getTimeStore(key)
	if err != nil {
		return nil, err
	}

	// locate the lowest element with a timestamp lower than stamp
	idx := sort.Search(len(ts.values), func(i int) bool {
		return ts.values[i].stamp.After(stamp)
	})
	// return the list in for of [0:n). When n is 0, meaning the first element
	// is after stamp, an empty list is returned.
	return ts.values[:idx], nil
}

This uses binary search via the sort.Search function in the standard library. The nice thing about sort.Search is that it finds the lowest element that matches the condition. The index returned, idx, can be used to return a subset of the slice with the [:idx] notation. Note that in this notation idx is not inclusive, thus it works perfect for us in all cases. This can also be tested in main using the following logic:

// collect a list of all stamps for dog
collectedStamps := []time.Time{}
for _, v := range tm.times["dog"].values {
	collectedStamps = append(collectedStamps, v.stamp)
}
stampIdxToTest := 2
fmt.Printf("searching before %s\n", collectedStamps[stampIdxToTest])

// Uncomment this and pass testTime below to test a time before all elements
//testTime, err := time.Parse("2006-01-02", "2006-01-02")
// if err != nil {
// 	panic(err)
// }

before, err := tm.GetBefore("dog", collectedStamps[stampIdxToTest])
if err != nil {
	panic(err)
}
spew.Dump(before)

And that’s it! Now we have a fully functioning TimeMap is Set, Get, and GetBefore implemented.

Wrapping up, here are a few things to consider about this solution:

Closing

While the original interview question is interesting, it may be too complex for a standard coding interview. However, the exercise does offer valuable insight into the process of designing efficient data structures and algorithms, in contrast to a more straightforward binary search implementation. As a final note, I should disclose that I’m not a fan of live coding interviews like this. Trust me when I say that the pressure causes me to seem like I’m programming for the first time in my life 🙂. While I do think orgs that can afford it should consider a more thoughtful/practical interview process, these kinds of exercises are really fun!

Contents