Khanlou | Refactoring Reveals Truths

July 11, 2017

Refactoring Reveals Truths

This is a post I’ve been trying to write for a long time — literally years — and have struggled for want of the perfect example. I think I’ve finally found the one, courtesy of David James, Tim Vermeulen, Dave DeLong, and Erica Sadun.

Once upon a time, Erica came up with a way for constraints to install themselves. This code was eventually obviated by isActive in UIKit, but the code moved from Objective-C to Swift. It wasn’t perfect or particularly efficient, but it got the job done.

The following code comes from a rote Swift migration. It calculates the nearest common ancestor between two items in a tree of views. This was an early stab at this concept, abandoned after isActive was added.

Back when I worked at Rap Genius, we would often say the first cut is the deepest. Your first attempt at something, while it might not be the cleanest or most polished, involves the most work because it provides the superstructure for what you’re building. Erica’s version is that superstructure. It’s a solution and it works, but it’s ripe for cleaning up.

public extension UIView {

	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		
		// Two equal views are each other's NCA
		guard self != otherView else { return self }
		
		// Compute superviews
		let mySuperviews = sequence(first: self.superview, next: { $0?.superview }).flatMap({ $0 })
		let theirSuperviews = sequence(first: otherView.superview, next: { $0?.superview }).flatMap({ $0 })	 
		
		// Check for direct ancestry
		guard !mySuperviews.contains(otherView)
			else { return otherView }
		guard !theirSuperviews.contains(self)
			else { return self }
		
		// Check for indirect ancestry
		for view in mySuperviews {
			guard !theirSuperviews.contains(view)
				else { return view }
		}
		
		// No shared ancestry
		return nil
	}
}

There’s a lot wrong with this code. It’s complex. There’s lots of cases to think about. It’s a simple piece of functionality, and yet there’s four guards and three different lookups. Simplifying this code will make it easier to read, understand, and maintain.

Perhaps you’re fine with this code in your codebase. The old saying goes, “if it ain’t broke, don’t fix it”. However, my experience has shown me that when there’s an inelegant algorithm like this, there’s a pearl in the center of it that wants to come out. Even a function this long is too hard to keep in your brain all at once. If you can’t understand it all, things slip through the cracks. I’m not confident that there aren’t any bugs in the above code; as a friend said, “every branch is a place for bugs to hide”. (This concept is known more academically as cyclomatic complexity.) And bugs or no bugs, with the power of retrospection, I can now see a few performance enhancements hiding in there, obscured by the current state of the code.

The refactoring process helps eliminate these potential bugs and expose these enhancements by iteratively driving the complex towards the simple. Reducing the algorithm down to its barest form also helps you see how it’s similar to other algorithms in your code base. These are all second-order effects, to be sure, but second-order effects pay off.

To kick off our refactoring, let’s look at the sequence(first:next:) function. Erica’s version started with self.superview, which is an optional value. This creates a sequence of optionals, which then forced Erica to flatMap them out. If we can remove this optionality from the sequence, we can remove the flatMap too. We changed the sequence to start from self instead (which isn’t optional), and added dropFirst() to remove that self:

let mySuperviews = sequence(first: self, next: { $0.superview }).flatMap({ $0 }).dropFirst()

Next, we killed the flatMap({ $0 }), because there are no nils to remove any more:

sequence(first: self, next: { $0.superview }).dropFirst()

This change led to this intermediate state, with plenty of code still left to trim:

public extension UIView {

	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		
		// Two equal views are each other's NCA
		guard self != otherView else { return self }
		
		// Compute superviews
		let mySuperviews = sequence(first: self, next: { $0.superview }).dropFirst()
		let theirSuperviews = sequence(first: otherView, next: { $0.superview }).dropFirst()

		// Check for direct ancestry
		guard !mySuperviews.contains(otherView)
			else { return otherView }
		guard !theirSuperviews.contains(self)
			else { return self }

		// Check for indirect ancestry
		for view in mySuperviews {
			guard !theirSuperviews.contains(view)
				else { return view }
		}
		
		// No shared ancestry
		return nil
	}
}

At this point, we looked at the indirect ancestry component.

// Check for indirect ancestry
for view in mySuperviews {
	guard !theirSuperviews.contains(view)
		else { return view }
}

A for loop with an embedded test is a signal to use first(where:). The code simplified down to this, removing the loop and test:

if let view = mySuperviews.first(where: { theirSuperviews.contains($0) }) { return view }

Function references make this more elegant, readable, and clear:

if let view = mySuperviews.first(where: theirSuperviews.contains) { return view }

Our function now looks like this:

public extension UIView {

	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		
		// Two equal views are each other's NCA
		guard self != otherView else { return self }

		// Compute superviews
		let mySuperviews = sequence(first: self, next: { $0.superview }).dropFirst()
		let theirSuperviews = sequence(first: otherView, next: { $0.superview }).dropFirst()
		
		// Check for direct ancestry
		guard !mySuperviews.contains(otherView)
			else { return otherView }
		guard !theirSuperviews.contains(self)
			else { return self }

		// Check for indirect ancestry
		if let view = mySuperviews.first(where: theirSuperviews.contains) { return view }
		if let view = theirSuperviews.first(where: mySuperviews.contains) { return view }
		
		// No shared ancestry
		return nil
	}
}

After this point, we stepped back and looked at the algorithm as a whole. We realized that if we include self and otherView in their respective superview sequences, the “direct ancestry” check and the “two equal views” check at the top would be completely subsumed by the first(where:) “indirect ancestry” checks. To perform this step, we first dropped the dropFirst():

sequence(first: self, next: { $0.superview })

And then we could kill the “direct ancestry” check:

// Check for direct ancestry
guard !mySuperviews.contains(otherView)
	else { return otherView }
guard !theirSuperviews.contains(self)
	else { return self }

And finally we could remove the first guard as well:

guard self != otherView else { return self }

After deleting them both, the function now looked like this:

public extension UIView {

	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		
		// Compute superviews
		let mySuperviews = sequence(first: self, next: { $0.superview })
		let theirSuperviews = sequence(first: otherView, next: { $0.superview })
		
		if let view = mySuperviews.first(where: theirSuperviews.contains) { return view }
		if let view = theirSuperviews.first(where: mySuperviews.contains) { return view }
		
		// No shared ancestry
		return nil
	}
}

That was a major turning point in our understanding of the function. At this point, this code was starting to reveal its own internal structure. Each step clarifies the next potential refactoring to perform, to get closer to the heart of the function. For the next refactoring, Tim realized we could simplify the tail end of the function by applying a nil-coalescing operator:

return mySuperviews.first(where: theirSuperviews.contains) ?? theirSuperviews.first(where: mySuperviews.contains)

But here the first test before the nil-coalescing operator has covered all the views in both hierarchies. Because we’re looking for the first intersection between mySuperviews and theirSuperviews, there’s no reason to test both it and its opposite. We can drop everything after the ??:

public extension UIView {
	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		let mySuperviews = sequence(first: self, next: { $0.superview })
		let theirSuperviews = sequence(first: otherView, next: { $0.superview })
		
		return mySuperviews.first(where: theirSuperviews.contains)
	}
}

The algorithm has revealed its beautiful internal symmetry now. Very clear intent, very clear algorithm, and each component is simple. It’s now more obvious how to tweak and modify this algorithm. For example,

If you don’t want the views self and otherView to be included in the calculation of ancestry, you can restore dropFirst() to the superview sequences.
If you want to know if the views have a common ancestor (rather than caring about which ancestor it is), you can replace the first(where:) with a contains(where:).
If you want to know all the common ancestors, you could replace the first(where:) with a filter(_:).

With the code in its original state, I couldn’t see before that these kinds of transformations were possible; now, they’re practically trivial.

From here, there are two potential routes.

First, there’s a UIView API for determining if one view is a descendant of another, which makes for a super readable solution:

extension UIView {
	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with other: View) -> View? {
		return sequence(first: self, next: { $0.superview })
			.first(where: { other.isDescendant(of: $0) })
	}
}

The second option is to explore performance. We noticed that theirSuperviews was only used for a contains check. If we wrap that sequence in a Set, existence lookup becomes O(1), and this whole algorithm gets blisteringly fast.

public extension UIView {
	// Return nearest common ancestor between two views
	public func nearestCommonAncestor(with otherView: UIView) -> UIView? {
		let mySuperviews = sequence(first: self, next: { $0.superview })
		let theirSuperviews = Set(sequence(first: otherView, next: { $0.superview }))
		return mySuperviews.first(where: theirSuperviews.contains)
	}
}

For view hierarchies that are pathologically deep (10,000 or so levels), this solution leaves the other one in the dust. Almost no view hierarchies contain that many layers, so this isn’t really a necessary optimization. However, if it were necessary, it would have been very hard to find without this refactoring process. Once we performed it, it become obvious what to tweak to speed things up.

Thomas Aquinas writes:

Properly speaking, truth resides in the intellect composing and dividing; and not in the senses; nor in the intellect knowing “what a thing is.”

This quote reflects the process of refactoring. If you’re doing it right, you don’t need to understand what the original code actually does. In the best of cases, you won’t even need to compile the code. You can operate on the code, composing and dividing, through a series of transformations that always leave the code in a correctly working state.

Perhaps you could have written the final version of this code from the very start. Perhaps it was obvious to you that this combination of APIs would yield the correct behavior in all cases. I don’t think I could have predicted that the original code would end up as an elegant one-line solution that handles all edge cases gracefully. I definitely couldn’t have predicted that there was a big performance optimization that changes this algorithm from O(n²) to O(n). Refactoring is an iterative process, and continual refinement reveals the code’s true essence.