Khanlou | Infinite Collections

April 30, 2018

Infinite Collections

There’s quite a bit of conversation on Swift Evolution about the potential future of Sequence and Collection hierarchy. There’s a number of problems that the community is trying to solve and a variety of approaches that have been presented to solve them. Several of these approaches include the ability to create infinite collections, and every time anyone brings this up, someone invariably asks how they would work. Nothing in the type system prevents us from making an infinite collection — it’s purely a semantic constraint — meaning that while we can write infinite collections in Swift today, we’re not supposed to. It violates the assumptions that other people’s code will make about our object. In this post, I’ll take an infinite sequence and demonstrate how we could turn it into an infinite collection.

The Fibonacci numbers are a great example of an infinite sequence. For a quick refresher, each element in that sequence is the sum of the previous two numbers.

Swift has pretty good facility for defining new sequences quickly. Two useful functions are sequence(state:next:) and sequence(first:next). Here, we’ll use the state version:

let fib = sequence(state: (0, 1), next: { state -> Int? in
    let next = state.0 + state.1
    state = (state.1, next)
    return next
})

Calling fib.prefix(10) will give you the first 10 Fibonacci numbers.

Note that this is an infinite sequence! If you try to map over this sequence (which the type system fully allows!), your program will just spin infinitely until it runs out of integers or the heat death of the universe, whichever comes first.

To convert this into an infinite collection, we need to think about a few things. First, how will we represent an index to a value? For an array, this is a number that probably represents a memory offset. For a string, this might represents a byte offset that that character starts at. It needs to be some value that lets us retrieve the value at that index at constant time, or O(1).

One of the interesting things about iterators is that their current state represents the data that you need to get to the next value in constant time. So any deterministic sequence can be represented as a collection, by using its iterator’s state as an index. Our state before is (Int, Int), so we can start off by using that type for our Index. These are the four methods we need to define a Collection:

struct FibonacciCollection: Collection {

	typealias Index = (Int, Int)
	
	var startIndex: Index {
		return (0, 1)
	}
	
	func index(after previous: Index) -> Index {
		return (previous.1, previous.0 + previous.1)
	}
	
	var endIndex: Index {
		fatalError("This collection has no endIndex, because it is infinite")
	}
	
	subscript(index: Index) -> Int {
		return index.0 + index.1
	}
}

We can use (Int, Int) (our state from the fibonacci numbers when they were a sequence) as the index, and use (0, 1) as our starting value (just like our sequence).

But there are a few problems with this approach. To iterate over an arbitrary collection, you can imagine constructing a big while loop:

var index = collection.startIndex
while index < endIndex {
	let value = collection[index]
	// use `value` somehow
	index = collection.index(after: index)
}

(This isn’t how we’d use our collection, since we know it’s infinite, but this is how normal collections work.) Two things stand out. First, we need the Index type to be Comparable — so we can’t use a tuple — and we need endIndex to be defined so that we can check against it — so we can’t fatalError (ding!).

So we need a new index type: something that’s Comparable and something that’s unreachable. The unreachable value should compute to being “greater” than all the reachable values (since it goes at the “end” of the collection). Our first attack is to model this as a struct of two integers, to solve the comparability problem:

struct FibIndex {
	let first: Int
	let second: Int
}

extension FibIndex: Comparable {
	static func == (lhs: FibIndex, rhs: FibIndex) -> Bool {
		return lhs.first == rhs.first && lhs.second == rhs.second
	}
	
	static func < (lhs: FibIndex, rhs: FibIndex) -> Bool {
		return lhs.first < rhs.first
	}
}

This solves the comparability problem, but not the unreachability problem. We need a way to represent one more value, a value that’s separate from all the others. Let’s start over, but with an enum this time:

enum FibIndex {
	case reachable(Int, Int)
	case unreachable
}

This could solve both of our problems. Unfortunately, it makes writing == and < pretty messy:

extension FibIndex: Comparable {
	static func == (lhs: FibIndex, rhs: FibIndex) -> Bool {
		switch (lhs, rhs) {
		case (.unreachable, .unreachable):
			return true
		case let (.reachable(l1, l2), .reachable(r1, r2)):
			return l1 == r1 && l2 == r2
		case (.reachable, .unreachable):
			return false
		case (.unreachable, .reachable):
			return false
		}
	}
	
	static func < (lhs: FibIndex, rhs: FibIndex) -> Bool {
		switch (lhs, rhs) {
		case (.unreachable, .unreachable):
			return false
		case let (.reachable(l1, l2), .reachable(r1, r2)):
			return l1 < r1
		case (.reachable, .unreachable):
			return true
		case (.unreachable, .reachable):
			return false
		}
	}
}

(There are some ways to clean this up with better pattern matching, but I’m going with the explicitness of the full patterns here.)

Let’s add some helpers:

extension FibIndex {
	var sum: Int {
		switch self {
		case .unreachable:
			fatalError("Can't get the sum at an unreachable index")
		case let .reachable(first, second): 
			return first + second
		}
	}
	
	var next: FibIndex {
		switch self {
		case .unreachable:
			fatalError("Can't get the next value of an unreachable index")
		case let .reachable(first, second): 
			return .reachable(second, first + second)
		}
	}
}

And now we can complete our FibonacciCollection:

struct FibonacciCollection: Collection {

	typealias Index = FibIndex
	
	var startIndex: Index {
		return .reachable(0, 1)
	}
	
	func index(after previous: Index) -> Index {
		return previous.next
	}
	
	var endIndex: Index {
		return .unreachable
	}
	
	subscript(index: Index) -> Int {
		return index.sum
	}
}

This is pretty good! We get all of the methods from the Sequence and Collection protocols for free. Unfortunately, this is still an infinite collection, and just like the infinite sequence, we can’t map over it, or we’ll risk looping forever. There are new things on the Collection protocol that we also can’t do — like get the count of this collection. That will also spin forever.

There is one more tweak I’d like to show here, which is extracting the Fibonacci-specific parts of this and making a generic InfiniteCollectionIndex. To do this, we’d keep our basic enum’s structure, but instead of the .reachable case having type (Int, Int), we’d need to put a generic placeholder there. How would that look?

Let’s call the inner index type WrappedIndex. We need to make sure that the wrapped Index is comparable:

enum InfiniteCollectionIndex<WrappedIndex: Comparable>: Comparable {
	case reachable(WrappedIndex)
	case unreachable
}

The Equatable and Comparable implementation stay mostly the same, but instead of comparing integers, they just forward to WrappedIndex:

extension InfiniteCollectionIndex: Comparable {
	static func == (lhs: InfiniteCollectionIndex, rhs: InfiniteCollectionIndex) -> Bool {
		switch (lhs, rhs) {
		case (.unreachable, .unreachable):
		    return true
		case let (.reachable(left), .reachable(right)):
		    return left == right
		case (.reachable, .unreachable):
		    return false
		case (.unreachable, .reachable):
		    return false
		}
	}
	
	static func < (lhs: InfiniteCollectionIndex, rhs: InfiniteCollectionIndex) -> Bool {
		switch (lhs, rhs) {
		case (.unreachable, .unreachable):
			return false
		case let (.reachable(left), .reachable(right)):
			return left < right
		case (.reachable, .unreachable):
			return true
		case (.unreachable, .reachable):
			return false
		}
	}
}

And one more helper, because there will be a lot of situations where we want to assume that that the .unreachable value is, well, unreachable:

extension InfiniteCollectionIndex {
	func assumingReachable<T>(_ block: (WrappedIndex) -> T) -> T {
		switch self {
		case .unreachable:
			fatalError("You can't operate on an unreachable index")
		case let .reachable(wrapped):
			return block(wrapped)
		}
	}
	
	func makeNextReachableIndex(_ block: (WrappedIndex) -> WrappedIndex) -> InfiniteCollectionIndex {
		return .reachable(assumingReachable(block))
	}
}

This function takes a block that assumes that the value is .reachable, and returns whatever that block returns. If the value is .unreachable, it crashes.

We can now use the struct form of FibIndex from above:

struct FibIndex {
	let first: Int
	let second: Int
}

extension FibIndex: Comparable {
	static func == (lhs: FibIndex, rhs: FibIndex) -> Bool {
		return lhs.first == rhs.first && lhs.second == rhs.second
	}
	
	static func < (lhs: FibIndex, rhs: FibIndex) -> Bool {
		return lhs.first < rhs.first
	}
}

I won’t go into too much of the details here, but this allows us to define simple Comparable index types, and make them into infinite indexes easily:

struct FibonacciCollection: Collection {

	typealias Index = InfiniteCollectionIndex<FibIndex>
	
	var startIndex: Index {
		return .reachable(FibIndex(first: 0, second: 1))
	}
	
	func index(after previous: Index) -> Index {
		return previous.makeNextReachableIndex({
			FibIndex(first: $0.second, second: $0.first + $0.second)
		})
	}
	
	var endIndex: Index {
		return .unreachable
	}
	
	subscript(index: Index) -> Int {
		return index.assumingReachable({ $0.first + $0.second })
	}
}

It’s not clear which way the tides will ultimately shift with Swift’s Collection model, but if and when infinite collections become legal in Swift, this technique will help you define them easily.