Khanlou | Stealing

February 8, 2018

Stealing

If you want some extreme bikeshedding on a very simple topic, this is the post for you.

For the folks who have been at any of the recent conferences I’ve presented at, you’ve probably seen my talk, You Deserve Nice Things. The talk is about Apple’s reticence to provide conveniences for its various libraries. It draws a lot on an old post of mine, Categorical, as well as bringing in a lot of new Swift stuff.

One part of the standard library that’s eminently extendable is Sequence and friends. Because of the number of different operations we want to perform on these components, even the large amount of affordances that the standard library does provide (standard stuff like map, filter, as well as more abstruse stuff like partition(by:) and lexicographicallyPrecedes), it still doesn’t cover the breadth of operations that are useful in our day to day programming life.

In the talk, I propose a few extensions that I think are useful, including any, all, none, and count(where:), eachPair, chunking, and a few others. None of these ideas are original to me. They’re ideas lifted wholesale from other languages, primary Ruby and its Enumerable module. Enumerable is a useful case study because anything that’s even marginally useful gets added to this module, and Ruby developers reap the benefits.

Swift’s standard library is a bit more conservative, but its ability to extend types and specifically protocols makes the point mostly moot. You can bring your own methods as needed. (In Categorical, I make the case that this is the responsibility of the vendor, but until they’re willing to step up, we’re on our own.)

I have an app where I need to break up a gallery of images in groups based on date. The groups could be based on individual days, but we thought grouping into “sessions” would be better. Each session is defined by starting more than an hour after the last session ended. More simply, if there’s a gap between two photos of more than hour, that should signal the start of a new session.

Since we’re splitting a Sequence, the first thought is to use something built in: there’s a function called split, which takes a bunch of parameters (maxSplits, omittingEmptySubsequences and a block called isSeparator to determine if an element defines a split). However, we can already see from the type signature that this function isn’t quite going to do what we want. The isSeparator function yields only one element, which makes it hard to tell if there should be a split between two consecutive elements, which is what we’re actually trying to do. In point of fact, this function is consumes the separators, because it’s a more generic version of the String function split, which is useful in its own right.

No, we need something different.

In her post, Erica Sadun has some answers for us: she’s got a function that works sort of like what we want. In this case, the block takes the current element and the current partition, and you can determine if the current element fits in the current partition.

extension Sequence {
    public func partitioned(at predicate: (_ element: Iterator.Element, _ currentPartition: [Iterator.Element]) -> Bool ) -> [[Iterator.Element]] {
        var current: [Iterator.Element] = []
        var results: [[Iterator.Element]] = []
        for element in self {
            guard !current.isEmpty else { current = [element]; continue }
            switch predicate(element, current) {
            case true: results.append(current); current = [element]
            case false: current.append(element)
            }
        }
        guard !current.isEmpty else { return results }
        results.append(current)
        return results
    }
}

I’ve got a few small gripes with this function as implemented:

The name. Partitioning in the standard library currently refers to changing the order of a MutableCollection and returning a pivot where the collection switches from one partition to another. I wouldn’t mind calling it something with split in the name, but as mentioned before, that usually consumes the elements. I think calling this slicing is the best thing to do. Slicing is also a concept that already exists in the standard library (taking a small part of an existing Collection) but in a lot of ways, that’s actually what we’re doing in this case. We’re just generating more than one slice.

The name again. Specifically, the at in partition(at:) doesn’t seem exactly right. Since we’re not consuming elements anymore, the new partition has to happen either before the given element, or after it. The function’s name doesn’t tell us which. If we change the signature of the function to return a consecutive pair of elements, we can change its name to slice(between:).

This has the added benefit of no longer requiring a force unwrap in usage. Erica’s whole problem with this code is that passing back a collection that is known to be non-empty (*COUGH*), means that she has to force unwrap access to the last element:

testArray.partitioned(at: { $0 != $1.last! })

If we change the function to slice(between:), we can simplify this way down, to:

testArray.slice(between: !=)

Way nicer.

Now that we’ve beaten the name into the ground, let’s move onto discussion about the implementation. Specifically, there is some duplication in the code that I find problematic. Two expressions in particular:

First,

current = [element]

Second,

results.append(current)

This code may not look like much — only a few characters, you protest! — but it represents a duplication in concept. If these expressions needed to expand to do more, they’d have to be expanded in two places. This was excruciatingly highlighted when porting this code over to my gallery app (which is Objective-C) I wanted to wrap the image collections in a class called a PhotoSection. Creating the PhotoSection in two places made it painfully obvious that this algorithm duplicates code.

This code is ugly and frustrating: the important part of the code, the actual logic of what’s happening is tucked away in the middle there.

[photosForCurrentSection.lastObject.date timeIntervalSinceDate:photo.date] > 60*60

The entire rest of the code could be abstracted away. Part of this is Objective-C’s fault, but this version of the code really does highlight the duplication that happens when modifying the code that creates a new section.

The last piece of the puzzle here is how Ruby’s Enumerable comes into play. When exploring that module, you can see three functions that look like they could be related: slice_when, slice_before, and slice_after. What is the relationship between these functions?

It seems obvious now that I know, but I did have to do some searching around for blog posts to explain it. Essentially, slice_when is like our slice(between:). It gives you two elements and you tell it if there should be a division there. slice(before:) and slice(after:) each yield only one element in their block, and slice before or after that element. This clears up the confusion with Erica’s original function — we now know if it’ll slice before or after our element, based on the name of the function.

As for implementation, you could use a for loop and the slight duplication of Erica’s code, but I’ve recently been trying to express more code in terms of chains of functions that express a whole operation that happens to the collection all at once. I find this kind of code more elegant, easier to read, and less prone to error, even though it does pass over the relevant collection multiple times.

To write our slice(between:) function in that style, we first need two helpers:

extension Collection {
	func eachPair() -> Zip2Sequence<Self, Self.SubSequence> {
		return zip(self, self.dropFirst())
	}

	func indexed() -> [(index: Index, element: Element)] {
		return zip(indices, self).map({ (index: $0, element: $1) })
	}
}

I’m a big fan of composing these operations together, from smaller, simpler concepts into bigger, more complex ones. There are slightly more performant ways to write both of those functions, but these simple implementations will do for now.

extension Collection {
	func slice(between predicate: (Element, Element) -> Bool) -> [SubSequence] {
		let innerSlicingPoints = self
			.indexed()
			.eachPair()
			.map({ (indexAfter: $0.1.index, shouldSlice: predicate($0.0.element, $0.1.element)) })
			.filter({ $0.shouldSlice })
			.map({ $0.indexAfter })

		let slicingPoints = [self.startIndex] + innerSlicingPoints + [self.endIndex]

		return slicingPoints
			.eachPair()
			.map({ self[$0..<$1] })
	}
}

This function is broken up into 3 main sections:

Find the indexes after any point where the caller wants to divide the sequence.
Add the startIndex and the endIndex.
Create slices of for each consecutive pair of those indexes.

Again, this isn’t the most performant implementation. Namely, it does quite a few iterations, and requires Collection where Erica’s solution required just a Sequence. But I do think it’s easier to read and understand. As I discussed in the post about refactoring, distilling an algorithm into forms straightforward, simple forms, operating at the highest possible level of abstraction, enables you to see the structure of our algorithm and potential transforms that it can undergo to create other algorithms.

Erica’s original example is a lot simpler now:

let groups = [1, 1, 2, 3, 3, 2, 3, 3, 4].slice(between: !=)

To close this post out, the implementations for slice(before:) and slice(after:) practically fall out of the ether, now that we have slice(between:):

extension Collection {
	func slice(before predicate: (Element) -> Bool) -> [SubSequence] {
		return self.slice(between: { left, right in
			return predicate(right)
		})
	}

	func slice(after predicate: (Element) -> Bool) -> [SubSequence] {
		return self.slice(between: { left, right in
			return predicate(left)
		})
	}
}

This enables weird things to become easy, like splitting a CamelCased type name:

let components = "GalleryDataSource"
	.slice(before: { ("A"..."Z").contains($0) })
	.map({ String($0) })

(Don’t try it with type name with a initialism in it, like URLSession.)