Beacon is built with Swift on the server. Since we have all of the niceties of Swift in this new environment, we can use our knowledge and experience from building iOS apps to build efficient server applications. Today, we’ll look at two examples of working with sequences on the server to achieve efficiency and performance.

Over the network

For its social graph, Beacon needs to find your mutual Twitter followers — that is, the people you follow that follow you back. There’s no Twitter API for this, so we have to get the list of follower IDs and the list of following IDs, and intersect them. The Twitter API batches these IDs into groups of 5,000. While people rarely follow more than 5,000 people, some users on Beacon have a few hundred thousand Twitter followers, so those requests will have to be batched. Because of these constraints, this problem provides a pretty interesting case study for advanced sequence usage.

We do this on the server instead of the client, because there will be a lot of requests to the Twitter API, and it doesn’t make much sense to perform those on a user’s precarious cellular connection. For our backend, we use the Vapor framework, and Vapor’s request handling is completely synchronous. Because of this, there’s no sense in using completion blocks for network requests. You can just return the result of the network request as the result of your function (and throw if anything goes wrong). For an example, let’s fetch the IDs of the first 5,000 people that someone follows:

let following = try client.send(request: TwitterFollowingRequest())

To perform the batching, the Twitter API uses the concept of cursors. To get the first batch, you can leave off the cursor, or pass -1. Each request returns a new next_cursor, which you give back to Twitter when you want the next batch. This concept of cursors fits nicely into Swift’s free function sequence(state:next:). Let’s examine this function’s signature:

func sequence<T, State>(state: State, next: @escaping (inout State) -> T?) -> UnfoldSequence<T, State>

This function is generic over two types: T and State. We can tell from the signature that we need to provide an initial State as a parameter, and we also provide a closure that takes an inout State and returns an optional T. inout means we can mutate the state, so this is how we update the state for the next iteration of the sequence. The T that we return each time will form our sequence. Returning nil instead of some T ends the sequence.

Because the Fibonacci sequence is the gold standard for stateful sequences, let’s take a look at using sequence(state:next:) to create a Fibonacci sequence:

let fibonacci = sequence(state: (1, 1), next: { (state: inout (Int, Int)) -> Int? in
    let next = state.0 + state.1
    state = (state.1, next)
    return next
})

The state in this case has type (Int, Int) and represents the last two numbers in the sequence. First, we figure out the next number by adding the two elements in the tuple together; then, we update the state variable with the new last two values; finally, we return the next element in the sequence.

(Note that this sequence never returns nil, so it never terminates. It is lazy, however, so none of this code is actually evaluated until you ask for some elements. You can use .prefix(n) to limit to the first n values.)
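For example, taking the first five elements evaluates the closure only five times. (Note that, as written, the sequence starts at 2: it returns sums only, skipping the two seed values in the initial state.)

```swift
// Same Fibonacci sequence as above, limited to its first five elements.
let fibonacci = sequence(state: (1, 1), next: { (state: inout (Int, Int)) -> Int? in
    let next = state.0 + state.1
    state = (state.1, next)
    return next
})

let firstFive = Array(fibonacci.prefix(5))
// firstFive == [2, 3, 5, 8, 13]
```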

To build our sequence of Twitter IDs, we start with the state "-1", and build our sequence from there.

let lazyFollowerIDs = sequence(state: "-1", next: { (state) -> [Int]? in

})

We need to send the request in this block, and return the IDs from the result of the request. The request itself looks a lot like the TwitterFollowingRequest from above, except it’s now for followers instead.

let lazyFollowerIDs = sequence(state: "-1", next: { (state) -> [Int]? in
    let result = try? self.client.send(request: TwitterFollowersRequest(cursor: state))
    return result?.ids
})

Right now, this closure never updates its state, so it fetches the same page over and over again. Let’s fix that.

let lazyFollowerIDs = sequence(state: "-1", next: { (state) -> [Int]? in
    let result = try? self.client.send(request: TwitterFollowersRequest(cursor: state))
    state = result?.nextCursor ?? "0"
    return result?.ids
})

For the last page, Twitter will return "0" for the next_cursor, so we can use that for our default value if the request fails. (If the request fails, result?.ids will also be nil, so the sequence will end anyway.)

Lastly, let’s put a guard in place to catch the case when Twitter has shown us the last page.

let lazyFollowerIDs = sequence(state: "-1", next: { (state) -> [Int]? in
    guard state != "0" else { return nil }
    let result = try? self.client.send(request: TwitterFollowersRequest(cursor: state))
    state = result?.nextCursor ?? "0"
    return result?.ids
})

(If we added a little more error handling here, it would look almost identical to the actual code that Beacon uses.)

This sequence is getting close. It’s already lazy, like our Fibonacci sequence, so it won’t fetch the second batch of 5,000 items until the 5,001st element is requested. One big thing is still missing: it’s not actually a sequence of IDs yet, but a sequence of arrays of IDs. We need to flatten it into one big sequence. For this, Swift has a function called joined() that flattens a sequence of sequences into a single sequence. This function (mercifully) preserves laziness, so if the sequence was lazy before, it’ll stay lazy. All we have to do is add .joined() to the end of our expression.
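As a quick sanity check of joined()’s behavior, here it is flattening a plain array of arrays:

```swift
// joined() flattens a sequence of sequences into a single sequence
// of elements, pulling from each inner sequence in order.
let batches = [[1, 2, 3], [4, 5]]
let flattened = Array(batches.joined())
// flattened == [1, 2, 3, 4, 5]
```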

To get our mutual follows from this lazyFollowerIDs sequence, we need something to intersect the followers and the following. To make this operation efficient, let’s turn the following IDs into a set. This makes contains lookups constant-time rather than linear:

let followingIDSet = Set(following.ids)

We make sure to filter over lazyFollowerIDs, since that sequence is lazy and we’d like to iterate over it only once.

let mutuals = lazyFollowerIDs.filter({ id in followingIDSet.contains(id) })

This reads “keep only the elements from lazyFollowerIDs that can be found in followingIDSet”. Apply a little syntactic sugar magic to this, and you end up with a pretty terse statement:

let mutuals = lazyFollowerIDs.filter(followingIDSet.contains)
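Putting the pieces together, here’s a self-contained sketch of the whole pipeline, with a local dictionary of pages standing in for the Twitter API. The page contents and cursor values are made up for illustration; in the real code, the closure would send a TwitterFollowersRequest instead of reading a dictionary.

```swift
// Hypothetical stand-in for paginated Twitter responses:
// each cursor maps to a page of IDs and the cursor for the next page.
let pages: [String: (ids: [Int], nextCursor: String)] = [
    "-1": (ids: [1, 2, 3], nextCursor: "17"),
    "17": (ids: [4, 5], nextCursor: "0"),
]

// Walk the cursors until the sentinel "0", flattening pages into one sequence.
let lazyFollowerIDs = sequence(state: "-1", next: { (state: inout String) -> [Int]? in
    guard state != "0" else { return nil }
    let page = pages[state]
    state = page?.nextCursor ?? "0"
    return page?.ids
}).joined()

// Intersect against the set of people we follow.
let followingIDSet: Set<Int> = [2, 4, 9]
let mutuals = lazyFollowerIDs.filter(followingIDSet.contains)
// mutuals == [2, 4]
```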

Off the disk

A similar technique can be used for handling batches of items from the database.

Vapor’s ORM is called Fluent. In Fluent, all queries go through the Query type, which is parameterized on T, your entity type (e.g., User). Queries are chainable objects, and you can call methods like filter and sort on them to refine them. When you’re done refining them, you can call methods like first(), all(), or count() to actually execute the Query.

While Fluent doesn’t have the ability to fetch in batches, its interface allows us to build this functionality easily, and Swift’s lazy sequence mechanics let us build it efficiently.

We know we’ll need a function on every Query. We don’t know what kind of Sequence we’ll be returning, but we’ll use Sequence<T> as a placeholder for now.

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		
	}
}

First, we need to know how many items match our query, so we can tell how many batches we’ll be fetching. Because the object we’re inside already represents the query that we’re going to be fetching with, and it already has all the relevant filters and joins, we can just call count() on self, and get the number of objects that match the query.

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		let count = try self.count()
		
	}
}

Once we have the count, we can use Swift’s stride(from:to:by:) to build a sequence that will step from 0 to our count with a stride of our batchSize.
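To illustrate how the stride divides the work, here it is with a hypothetical count of 50 and a batch size of 20:

```swift
// Each element of the stride is the starting offset of one batch.
// stride(from:to:by:) excludes the upper bound, so the last, partial
// batch still gets an offset.
let offsets = Array(stride(from: 0, to: 50, by: 20))
// offsets == [0, 20, 40]
```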

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		let count = try self.count()
		stride(from: 0, to: count, by: batchSize)
		
	}
}

Next, we want to transform each step of this stride (which represents one batch) into a set of the objects in question.

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		let count = try self.count()
		stride(from: 0, to: count, by: batchSize)
			.map({ offset in
				return (try? self.limit(batchSize, withOffset: offset).all()) ?? []
			})
	}
}

Because .all() is a throwing function, we need to handle its error somehow. This will be a lazy sequence, so the map block will be stored and executed later; it is @escaping. This means we can’t just throw, because we can’t guarantee that we’d be in a position to catch that error when the block eventually runs. Therefore, we discard the error and return an empty array if the fetch fails.

If we try to execute this as-is, the map will run instantly and fetch all of our batches at once. Not ideal. We have to add a .lazy to our chain to ensure that each fetch doesn’t happen until an item from that batch is requested.

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		let count = try self.count()
		stride(from: 0, to: count, by: batchSize)
			.lazy
			.map({ offset in
				return (try? self.limit(batchSize, withOffset: offset).all()) ?? []
			})
	}
}

The last step here, like the Twitter example, is to call .joined() to turn our lazy sequence of arrays into one big lazy sequence.

extension Query {
	func inBatches(of batchSize: Int) throws -> Sequence<T> {
		let count = try self.count()
		return stride(from: 0, to: count, by: batchSize)
			.lazy
			.map({ offset in
				return (try? self.limit(batchSize, withOffset: offset).all()) ?? []
			})
			.joined()
	}
}

When we run this code, we see that our big Sequence chain returns a LazySequence<FlattenSequence<LazyMapSequence<StrideTo<Int>, [T]>>>. This type is absurd. We can see all the components of our sequence chain in there, but we don’t actually care about those implementation details. It would be great if we could erase the type and be left with something simple. This technique is called type erasure, and it hides exactly these details. AnySequence is a type eraser that the Swift standard library provides for this purpose, and it will also become our return type.
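As a small illustration, any sequence of Int can be wrapped this way, no matter how elaborate the chain that produced it:

```swift
// AnySequence hides the concrete sequence type behind a simple interface.
// The underlying type here is a LazyMapSequence, but callers only see
// AnySequence<Int>.
let doubled = AnySequence([1, 2, 3].lazy.map({ $0 * 2 }))
let values = Array(doubled)
// values == [2, 4, 6]
```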

extension Query {
    func inBatches(of batchSize: Int) throws -> AnySequence<T> {
		let count = try self.count()
        return AnySequence(stride(from: 0, to: count, by: batchSize)
            .lazy
            .map({ (offset) -> [T] in
                return (try? self.limit(batchSize, withOffset: offset).all()) ?? []
            })
            .joined())
    }
}

We can now write the code we want at the callsite:

try User.query().sort("id", .ascending)
	.inBatches(of: 20)
	.forEach({ user in
		//do something with user
	})

This is reminiscent of Ruby’s find_in_batches or the property fetchBatchSize on NSFetchRequest, which returns a very similar lazy NSArray using the NSArray class cluster.

This is not the first time I’ve said this, but Swift’s sequence handling is exceptionally robust and fun to work with. Understanding the basics of Swift’s sequences enables you to compose them into solutions for bigger and more interesting problems.