Khanlou | The Flat Cache

October 3, 2017

The Flat Cache

The model layer of a client is a tough nut to crack. Because it’s not the canonical representation of the data (that representation lives on the server), the data must live a fundamentally transient form. The version of the data that the app has is essentially a cache of what lives on the network (whether that cache is in memory or on-disk), and if there’s anything that’s true about caches, it’s that they will always end up with stale data. The truth is multiplied by the number of caches you have in your app. Reducing the number of cached versions of any given object decreases the likelihood that it will be out of date.

Core Data has an internal feature that ensures that there is never more than one instance of an object for a given identifier (within the same managed object context). They call this feature “uniquing”, but it is known more broadly as the identity map pattern. I want to steal it without adopting the rest of baggage of Core Data.

I think of this concept as a “flat cache”. A flat cache is basically just a big dictionary. The keys are a composite of an object’s type name and the object’s ID, and the value is the object. A flat cache normalizes the data in it, like a relational database, and all object-object relationships go through the flat cache. A flat cache confers several interesting benefits.

Normalizing the data means that you’ll use less bandwidth while the data is in-flight, and less memory while the data is at rest.
Because data is normalized, modifying or updating a resource in one place modifies it everywhere.
Because relationships to other entities go through the flat cache, back references with structs are now possible. Back references with classes don’t have to be weak.
With a flat cache of structs, any mutation deep in a nested struct only requires the object in question to change, instead of the object and all of its parents.

In this post, we’ll discuss how to make this pattern work in Swift. First, you’ll need the composite key.

struct FlatCacheKey: Equatable, Hashable {

    let typeName: String
    let id: String
    
    static func == (lhs: FlatCacheKey, rhs: FlatCacheKey) -> Bool {
	    return lhs.typeName == rhs.typeName && lhs.id == rhs.id
    }
		    
	var hashValue: Int {
		return typeName.hashValue ^ id.hashValue
	}
}

We can use a protocol to make the generation of flat cache keys easier:

protocol Identifiable {
    var id: String { get }	
}

protocol Cachable: Identifiable { }

extension Cachable {
    static var typeName: String {
        return String(describing: self)
    }
    
    var flatCacheKey: FlatCacheKey {
        return FlatCacheKey(typeName: Self.typeName, id: id)
    }
}

All of our model objects already have IDs so they can trivially conform to Cachable and Identifiable.

Next, let’s get to the basic flat cache.

class FlatCache {
    
    static let shared = FlatCache()
    
    private var storage: [FlatCacheKey: Any] = [:]
    
    func set<T: Cachable>(value: T) {
        storage[value.flatCacheKey] = value
    }
    
    func get<T: Cachable>(id: String) -> T? {
        let key = FlatCacheKey(typeName: T.typeName, id: id)
        return storage[key] as? T
    }
    	    
    func clearCache() {
        storage = [:]
    }
    
}

Not too much to say here, just a private dictionary and get and set methods. Notably, the set method only takes one parameter, since the key for the flat cache can be derived from the value.

Here’s where the interesting stuff happens. Let’s say you have a Post with an Author. Typically, the Author would be a child of the Post:

struct Author {
	let name: String
}

struct Post {
	let author: Author
	let content: String
}

However, if you wanted back references (so that you could get all the posts by an author, let’s say), this isn’t possible with value types. This kind of relationship would cause a reference cycle, which can never happen with Swift structs. If you gave the Author a list of Post objects, each Post would be a full copy, including the Author which would have to include the author’s posts, et cetera. You could switch to classes, but the back reference would cause a retain cycle, so the reference would need to be weak. You’d have to manage these weak relationships back to the parent manually.

Neither of these solutions is ideal. A flat cache treats relationships a little differently. With a flat cache, each relationship is fetched from the centralized identity map. In this case, the Post has an authorID and the Author would have a list of postIDs:

struct Author: Identifiable, Cachable {
	let id: String
	let name: String
	let postIDs: [String]
}

struct Post: Identifiable, Cachable {
	let id: String
	let authorID: String
	let content: String
}

Now, you still have to do some work to fetch the object itself. To get the author for a post, you would write something like:

FlatCache.shared.get(id: post.authorID) as Author

You could put this into an extension on the Post to make it a little cleaner:

extension Post {
	var author: Author? {
		return FlatCache.shared.get(id: authorID)
	}
}

But this is pretty painful to do for every single relationship in your model layer. Fortunately, it’s something that can be generated! By adding an annotation to the ID property, you can tell a tool like Sourcery to generate the computed accessors for you. I won’t belabor the explanation of the template, but you can find it here. If you have trouble reading it or understanding how it works, you can read the Sourcery in Practice post.

It will let you write Swift code like this:

struct Author {
	let name: String
	// sourcery: relationshipType = Post
	let postIDs: [String]
}

struct Post {
	// sourcery: relationshipType = Author
	let authorID: String
	let content: String
}

Which will generate a file that looks like this:

// Generated using Sourcery 0.8.0 — https://github.com/krzysztofzablocki/Sourcery
// DO NOT EDIT
	
extension Author {
	var posts: [Post] {
		return postIDs.flatMap({ id -> Post? in
			return FlatCache.shared.get(id: id)
		})
	}
}

extension Post {	
	var author: Author? {
		return FlatCache.shared.get(id: authorID)
	}	
}

This is the bulk of the pattern. However, there a few considerations to examine.

JSON

Building this structure from a tree of nested JSON is messy and tough. The system works a lot better if you use a structure of the JSON looks like the structure of the flat cache. All the objects exist in a big dictionary at the top level (one key for each type of object), and the relationships are defined by IDs. When a new JSON payload comes in, you can iterate over this top level, create all your local objects, and store them in the flat cache. Inform the requester of the JSON that the new objects have been downloaded, and then it can fetch relevant objects directly from the flat cache. The ideal structure of the JSON looks a lot like JSON API, although I’m not surpassingly familiar with JSON API.

Missing Values

One big difference between managing relationships directly and managing them through the flat cache is that with the flat cache, there is a (small) chance that the relationship won’t be there. This might happen because of a bug on the server side, or it might happen because of a consistency error when mutating the data in the flat cache (we’ll discuss mutation more in a moment). There are a few ways to handle this:

Return an Optional. What we chose to do for this app is return an optional. There are a lot of ways of handling missing values with optionals, including optional chaining, force-unwrapping, if let, and flatmapping, so it isn’t too painful to have to deal with an optional, and there aren’t any seriously deleterious effects to your app if a value is missing.
Force unwrap. You could choose to force-unwrap the relationship. That’s putting a lot of trust in the source of the data (JSON in our case). If a relationship is missing becuase of a bug on your server, your app will crash. This is really bad, but on the bright side, you’ll get a crash report for missing relationships, and you can fix it on the server-side quickly.
Return a Promise. While a promise is the most complex of these three solutions to deal with at the call site, the benefit is that if the relationship doesn’t exist, you can fetch it fresh from the server and fulfill the promise a few seconds later.

Each choice has its downsides. However, one benefit to code generation is that you can support more than one option. You can synthesize both a promise and an optional getter for each relationship, and use whichever one you want at the call site.

Mutability

So far I’ve only really discussed immutable relationships and read-only data. The app where we’re using the pattern has entirely immutable data in its flat cache. All the data comes down in one giant blob of JSON, and then the flat cache is fully hydrated. We never write back up to the server, and all user-generated data is stored locally on the device.

If you want the same pattern to work with mutable data, a few things change, depending on if your model objects are classes or structs.

If they’re classes, they’ll need to be thread safe, since each instance will be shared across the whole app. If they’re classes, mutating any one reference to an entity will mutate them all, since they’re all the same instance.

If they’re structs, your flat cache will need to be thread safe. The Sourcery template will have to synthesize a setter as well as a getter. In addition, anything long-lived (like a VC) that relies on the data being updated regularly should draw its value directly from the flat cache.

final class PostVC {
    var postID: String
    
    init(postID: String) {
        self.postID = postID
    }
    
    var post: Post? {
		get {
			return cache.get(id: postID)
		}
		set {
			guard let newValue = newValue else { return }
			cache.set(value: newValue)
		}
	}
}

To make this even better, we can pull a page from Chris Eidhof’s most recent blog post about struct references, and roll this into a type.

class Cached<T: Cachable> {
	
	var cache = FlatCache.shared
	
	let id: String
	
	init(id: String) {
		self.id = id
	}
	
	var value: T? {
		get {
			return cache.get(id: id)
		}
		set {
			guard let newValue = newValue else { return}
			cache.set(value: newValue)
		}
	}
}

And in your VC, the post property would be replaced with something like:

lazy var cachedPost: Cached<Post>(id: postID)

Lastly, if your system has mutations, you need a way to inform objects if a mutation occurs. NSNotificationCenter could be a good system for this, or some kind of reactive implementation where you filter out irrelevant notifications and subscribe only to the ones you care about (a specific post with a given post ID, for example).

Singletons and Testing

This pattern relies on a singleton to fetch the relationships. This has a chance of hampering testability. There are a few different options to handle this, including injecting the flat cache into the various objects that use it, and having the flat cache be a weak var property on each model object instead of a static let. At that point, any flat cache would be responsible for ensuring the integrity of its child objects references to the flat cache. This is definitely added complexity, and it’s a tradeoff that comes along with this pattern.

References and other systems

Redux recommends a pattern similar to this. They have a post describing how to structure your data called Normalizing State Shape.
They discuss it a bit in this post as well:

The solution to caching GraphQL is to normalize the hierarchical response into a flat collection of records. Relay implements this cache as a map from IDs to records. Each record is a map from field names to field values. Records may also link to other records (allowing it to describe a cyclic graph), and these links are stored as a special value type that references back into the top-level map. With this approach each server record is stored once regardless of how it is fetched.