Time-based KV Store in Go (Interview Question)
I recently got the Time Based Key-Value Store problem in a technical interview. It’s a fun exercise, and I believe there might a interesting spin one could put on it. What follows is not a recommendation for an interview question, more-so just an interesting modification to the problem for us to ponder and solve.
Content in video form if that's more of your thing:
The original question breaks down to the following:
- Create a new data-structure named
TimeMap
. - Implement
set(String key, String value, int timestamp)
- Implement
String get(String key, int timestamp)
- Based on
timestamp
, all values that are less than or equal to this value should be returned.
- Based on
The following is assumed:
- You may use the standard library.
- Keys are always
set
chronologically.
The standard solution to the problem is to use a map
of keys that hold a list
of values. Since the values can be assumed to be retrieved chronologically, you
can use binary search to find the first key larger than the time stamp asked for
in the get
call. Then you return all elements lower than it. This solution has
a time complexity of:
$$ set = O(1) \newline get = O(log\ n) $$
Enhancements
Some of these enhancements are inspired by the specific questions asked of me, while others are ideas I came up with (I'm trying not to give too much away about the specific interview).
In this new question, we’re going to keep most properties the same. Here are the changes:
set
no longer takes a timestamp; it’s calculated at time of call.- Time precision should be to the nanosecond.
get
accepts the timestamp optionally.- When
get
receives no timestamp return the latest value for the key. - When
get
receives a timestamp, return a single element at that timestamp. - Implement
String getBefore(String key, int timestamp)
, which effectively does what the originalget
asked for.
Extra considerations:
- Attempt to keep time complexity of
get
as low as possible.- Ideally
O(1)
. - Try to be efficient with memory, but lookup time is more important.
- Ideally
- String is used as the value for simplicity, but pretend the value can be a sizable object.
With these changes in mind, we need to enhance the data structure to optimize
around get
. Specifically, we need to set something up that can get us a
constant lookup time regardless of whether the user asks for a specific time or
not (ie gets the latest).
Solution: Design
First, let’s consider the data structure we want to build. In order to
facilitate O(1)
on get
, we can create something like:
TimeMap
is the top-level map that contains the keys. Each key points to a
unique timeStore
, which holds the values
list along with a timeIndex
using
the time stamp of a value as a key and pointing to the value in values
.
With this data-structure in place, we can return both the latest element and
specific (time-based) element in constant time. When the last element is
desired, we simply return the last element in the values
. When a specific
timestamp is provided, we can lookup the timestamp since they’re store in the
timeIndex
map as a key. If a key is present, we can return the value it points
to.
There are a few downsides to consider:
- While constant time, we are potentially doing 2 extra traversals via
pointers. This can mean:
- Memory of this data structure will grow faster as we maintain the map of
timeIndex
. - This pointer-based model means even less contiguous memory, thus our data structure has less chance (especially as it grows) to land in CPU cache.
- Insertion and deletion is more complex as we need to ensure we manage
timeIndex
correctly.
- Memory of this data structure will grow faster as we maintain the map of
Regarding the getBefore
implementation, this remains the same as the original
question. Thus our complexity ends up at:
$$ get = O(1) \newline set = O(1) \newline getRange = O(log\ n) $$
Solution Code
In this section, I’ll be showing and explaining the code chunk by chunk, to see the full file, visit the GitHub.
To begin, let’s create the data structure described above.
// TimeMap holds a key and all values added over time for that key.
type TimeMap struct {
times map[string]*timeStore
}
// timeStore is the underlying store for each key.
type timeStore struct {
timeIndex map[time.Time]*value // mapping to each value based on time
values []*value // underlying data store
}
// value represents the object stored.
type value struct {
stamp time.Time // timestamp of insertion
val string // string used for simplicity, but imagine a larger struct
}
Next, we’ll create some helper functions.
// New returns a new [TimeMap].
func New() TimeMap {
return TimeMap{
times: map[string]*timeStore{},
}
}
// newTimeStore is used when a new key is introduced. It intializes and returns
// the pointer to the new key's store.
func newTimeStore() *timeStore {
return &timeStore{
timeIndex: map[time.Time]*value{},
values: []*value{},
}
}
func (tm *TimeMap) getTimeStore(key string) (*timeStore, error) {
if ts, ok := tm.times[key]; ok {
return ts, nil
}
return nil, fmt.Errorf("key [%s] does not exist", key)
}
Note that our need for newTimeStore
is to have an easy way to create a new
store when a new key is created in the TimeMap
. Additionally, getTimeStore
is a helper function to quickly lookup a key’s existence and return an error if
not. This will be reused in our our get*
implementations.
The first function to implement is Set
:
// Set adds a value to a given key. When the value is added, its time stamp is
// recorded.
func (tm *TimeMap) Set(key, data string) {
// when the key exists, insert the value into the store
if timeStore, ok := tm.times[key]; ok {
stamp := time.Now()
v := &value{stamp: stamp, val: data}
timeStore.values = append(timeStore.values, v)
timeStore.timeIndex[stamp] = v
return
}
// when the key is new, create a new store for the key and recall this
// method
tm.times[key] = newTimeStore()
tm.Set(key, data)
}
Set
focuses on 2 outcomes, whether the key exists or not. When the key exists,
it’s simply a matter of recording the timestamp and inserting the value into the
existing timeStore
. When the key does not exist, we need to create a new
timeStore
associated with that key. Once that timeStore
exists, recursively
calling this function guarantees the value-insertion (condition 1) logic is run.
There is a case to be made that the recursive call is unnecessarily ‘clever’,
however I feel it’s fine.
With set in place, we can do some simple testing in a main()
function:
tm := New()
kvs := map[string][]string{
"dog": {"woof", "bark", "sigh", "growl", "wimper"},
"cat": {"hiss", "screech", "crash", "meow"},
}
for k, sounds := range kvs {
for _, sound := range sounds {
tm.Set(k, sound)
}
}
spew.Dump(tm)
spew
can be retrieved for your project via go get [github.com/davecgh/go-spew/spew](http://github.com/davecgh/go-spew/spew)
,
it’ll enable us to view the end state of the data structure:
(main.TimeMap) {
times: (map[string]*main.timeStore) (len=2) {
(string) (len=3) "cat": (*main.timeStore)(0x1400007c0c0)({
timeIndex: (map[time.Time]*main.value) (len=4) {
(time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293: (*main.value)(0x140000643f0)({
stamp: (time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293,
val: (string) (len=4) "hiss"
}),
(time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918: (*main.value)(0x14000180000)({
stamp: (time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918,
val: (string) (len=7) "screech"
}),
(time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168: (*main.value)(0x14000064420)({
stamp: (time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168,
val: (string) (len=5) "crash"
}),
(time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585: (*main.value)(0x14000180030)({
stamp: (time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585,
val: (string) (len=4) "meow"
})
},
values: ([]*main.value) (len=4 cap=4) {
(*main.value)(0x140000643f0)({
stamp: (time.Time) 2023-03-17 18:58:21.721497 -0600 MDT m=+0.000864293,
val: (string) (len=4) "hiss"
}),
(*main.value)(0x14000180000)({
stamp: (time.Time) 2023-03-17 18:58:21.742538 -0600 MDT m=+0.021904918,
val: (string) (len=7) "screech"
}),
(*main.value)(0x14000064420)({
stamp: (time.Time) 2023-03-17 18:58:21.763635 -0600 MDT m=+0.043003168,
val: (string) (len=5) "crash"
}),
(*main.value)(0x14000180030)({
stamp: (time.Time) 2023-03-17 18:58:21.783989 -0600 MDT m=+0.063356585,
val: (string) (len=4) "meow"
})
}
}),
(string) (len=3) "dog": (*main.timeStore)(0x1400007c100)({
timeIndex: (map[time.Time]*main.value) (len=5) {
(time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210: (*main.value)(0x140000644e0)({
stamp: (time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210,
val: (string) (len=6) "wimper"
}),
(time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918: (*main.value)(0x14000064480)({
stamp: (time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918,
val: (string) (len=4) "woof"
}),
(time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793: (*main.value)(0x14000180060)({
stamp: (time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793,
val: (string) (len=4) "bark"
}),
(time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460: (*main.value)(0x140000644b0)({
stamp: (time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460,
val: (string) (len=4) "sigh"
}),
(time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293: (*main.value)(0x14000180090)({
stamp: (time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293,
val: (string) (len=5) "growl"
})
},
values: ([]*main.value) (len=5 cap=8) {
(*main.value)(0x14000064480)({
stamp: (time.Time) 2023-03-17 18:58:21.80502 -0600 MDT m=+0.084387918,
val: (string) (len=4) "woof"
}),
(*main.value)(0x14000180060)({
stamp: (time.Time) 2023-03-17 18:58:21.826097 -0600 MDT m=+0.105464793,
val: (string) (len=4) "bark"
}),
(*main.value)(0x140000644b0)({
stamp: (time.Time) 2023-03-17 18:58:21.846546 -0600 MDT m=+0.125914460,
val: (string) (len=4) "sigh"
}),
(*main.value)(0x14000180090)({
stamp: (time.Time) 2023-03-17 18:58:21.867033 -0600 MDT m=+0.146401293,
val: (string) (len=5) "growl"
}),
(*main.value)(0x140000644e0)({
stamp: (time.Time) 2023-03-17 18:58:21.887096 -0600 MDT m=+0.166465210,
val: (string) (len=6) "wimper"
})
}
})
}
}
While that’s a lot of output, it does give us a solid visual of the data structure. Some key notes are:
TimeMap
has 2 keys:dog
andcat
.- Each key has its own
timeStore
. - Every
timeStore
holds the values invalues
and atimeIndex
, which uses the insertion time as the key and hold a pointer to the value.- for example, in the key
dog
, you can see thevalues
slice andtimeIndex
map hold the same value,0x14000064480
, which is the memory address ofwoof
.
- for example, in the key
If your lost, consider reviewing this output against the diagram in the
Solution Design
above.
Going forward, I won’t print the spew.Dump
output, but if you’re following
along you should continue testing with it.
Next we can implement the Get
function:
// Get returns a value for key. If stamp is provided, the value with that
// timestamp is returned. If stamp is not provided, the last element inserted
// under key is returned.
func (tm *TimeMap) Get(key string, stamp ...time.Time) (*value, error) {
ts, err := tm.getTimeStore(key)
if err != nil {
return nil, err
}
// return latest
if len(stamp) < 1 {
return ts.values[len(ts.values)-1], nil
}
lookup := stamp[0]
if v, ok := ts.timeIndex[lookup]; ok {
return v, nil
}
return nil, fmt.Errorf("key [%s] had no timestamp [%s]", key, lookup)
}
Get
uses the variadic argument stamp
to make it optional. This has some
trade-offs but is a common approach in
Go. When
no stamp is provided, we return the last element within values
as this will be
the last element added. When the stamp is provided, we look it up in the
timeIndex
. By using time.Time
, i’m requiring precision down to the
nanosecond, which might not be desired, so you can work with the time precision
on your insert / lookups. For the sake of testing, you can easily capture some
known time stamps and test Get
, heres how I’ve done it in main
:
// validate Get for latest
spew.Dump(tm.Get("dog"))
// validate get for each stamp
for stamp := range tm.times["dog"].timeIndex {
noise, err := tm.Get("dog", stamp)
if err != nil {
panic(err)
}
fmt.Printf("At %s: %s\n", stamp, noise.val)
}
Next let’s implement GetBefore
:
// GetBefore returns all values stored for key, where their insertion time is
// equal to or before stamp.
func (tm *TimeMap) GetBefore(key string, stamp time.Time) ([]*value, error) {
ts, err := tm.getTimeStore(key)
if err != nil {
return nil, err
}
// locate the lowest element with a timestamp lower than stamp
idx := sort.Search(len(ts.values), func(i int) bool {
return ts.values[i].stamp.After(stamp)
})
// return the list in for of [0:n). When n is 0, meaning the first element
// is after stamp, an empty list is returned.
return ts.values[:idx], nil
}
This uses binary
search
via the sort.Search function in the standard library.
The nice thing about sort.Search
is that it finds the
lowest element that matches
the condition. The index returned, idx
, can be used to return a subset of the
slice with the [:idx]
notation. Note that in this notation idx
is not
inclusive, thus it works perfect for us in all cases. This can also be tested in
main
using the following logic:
// collect a list of all stamps for dog
collectedStamps := []time.Time{}
for _, v := range tm.times["dog"].values {
collectedStamps = append(collectedStamps, v.stamp)
}
stampIdxToTest := 2
fmt.Printf("searching before %s\n", collectedStamps[stampIdxToTest])
// Uncomment this and pass testTime below to test a time before all elements
//testTime, err := time.Parse("2006-01-02", "2006-01-02")
// if err != nil {
// panic(err)
// }
before, err := tm.GetBefore("dog", collectedStamps[stampIdxToTest])
if err != nil {
panic(err)
}
spew.Dump(before)
And that’s it! Now we have a fully functioning TimeMap
is Set
, Get
, and
GetBefore
implemented.
Wrapping up, here are a few things to consider about this solution:
- All variants of get return a pointer to a value, this would allow the caller to mutate the value, so returning a copy might be desirable.
TimeMap
is not thread safe, thread safety would require locks/mutexs.- There are many things this code is simplifying for brevity, e.g.
panic(err)
. After all, this is a coding interview, so it’s not meant to be practical 😆.
Closing
While the original interview question is interesting, it may be too complex for a standard coding interview. However, the exercise does offer valuable insight into the process of designing efficient data structures and algorithms, in contrast to a more straightforward binary search implementation. As a final note, I should disclose that I’m not a fan of live coding interviews like this. Trust me when I say that the pressure causes me to seem like I’m programming for the first time in my life 🙂. While I do think orgs that can afford it should consider a more thoughtful/practical interview process, these kinds of exercises are really fun!