[Dev Diaries] Force Directed Layout

Sometimes, the worlds of computer graphics and physics collide in the most beautiful ways.

Everybody loves UICollectionViews, but after a while, those grids tend to be a bit restrictive. The "flow" layout is fine and all, but what if we wanted to have the cells arranged in an "organic" manner?

Let's talk about forces

In nature, almost everything is about forces. Gravity is probably the one everyone is the most familiar with, and it goes straight down at roughly 9.8 m/s/s. What that means is that at the beginning, your object goes at 0 m/s, then after a second it goes at 9.8 m/s, then after another second, it goes at 19.6 m/s, etc. When an object is on the ground, the gravity still applies, but it's compensated by the floor pushing up with a force equal to the gravity.

Good ole Newton says that what matters is the sum of all the forces that apply to an object. From that you can deduce the acceleration the object is subjected to (in m/s/s), and by derivation, the speed of an object at a given time. Speed is the distance moved by unit of time, so from the speed, we can determine the position of an object on which the force applies.

F = ma \implies a = F/m

For the rest of this post, we will assume everything has a mass of 1, because it simplifies a bit the operations.

Exemple: gravity on a slope

In terms of code, I decided to have a struct representing forces in a way that will give me the operations I will need later:

  • adding two forces together
  • getting the angle of the force in a plane
  • getting the magnitude (length of the vector)
struct Force {
    let dx: CGFloat
    let dy: CGFloat
    static func +(lhs: Force, rhs: Force) -> Force {
        return Force(lhs.dx+rhs.dx,lhs.dy+rhs.dy)
    }
    
    static func /(lhs: Force, rhs: CGFloat) -> Force {
        return Force(lhs.dx/rhs, lhs.dy/rhs)
    }
    
    init(_ x: CGFloat, _ y: CGFloat) {
        dx = x
        dy = y
    }
    
    var angle : CGFloat {
        return CGFloat(atan2(Double(dy), Double(dx)))
    }
    
    var magnitude : CGFloat {
        return sqrt(dx*dx+dy*dy)
    }
}
Attraction

We'll need our cells to be attracted to a certain point in the plane. Ideally, if there's only one cell, it'll sit at that point, and if there are two, they will be as close as possible to there, while not overlapping (more on that later).

Gravity has flair, but its exponential nature is a bit too powerful. For force directed graphs (like the ones dot produces), we tend to go with its linear cousin: the spring. It also is easier to wrap your head around: you attach all the cells to the center with a spring, and they will try to get back there.

Spring force is expressed with Hooke's law

F = -k•∂
  • k is the stiffness of the spring
  • ∂ is how far we are from the "ideal" distance to the other end of the spring

You can see that compressing a spring produces an "extension" force (it pushes the two ends away because ∂ is negative), and the opposite "attraction" force (when ∂ is positive).

Repulsion

We could use the spring for repulsion as well, but it'd mean "wiring" all the cells with their nearest neighbours and always keeping track of which cells are attached to which other cells. It's too complex to manage, but fortunately, we have another physics law that repulses and looks quite natural: magnets.

Magnets attract obviously, but they also repulse quite heavily. And because we want that repulsion to be extremely high when the objects are super close and negligible when they are far away from each other, this time we kind of need an inverse square law. Coulomb's law to the rescue!

F = k \frac{Q_1 * Q_2}{r^2}
  • k isn't that important for us, but its value is 9×109 N⋅m2⋅C−2
  • The two Qs are the electric charges of the objects
  • r is the distance between them

You can see that objects with the same electric charge repulse each other, while objects with opposite charges attract each other.

So why didn't we use this for our attraction law? The big problem with using that kind of force is that it's very computationally expensive: every cell will affect every other cell. Calculating the resulting force is proportional to the square of the number of cells. Simple spring mechanics is a lot simpler to manage for attraction, but we won't escape the complexity of repulsion.

Sidenote: what's up with that k value? 😳
That's physics for you, I'm afraid. We need to match the formula to the real world, and that's the number that fits what we see.
For our purposes, we have made-up identical "charges" on our cells anyways, so we will have k(Q²/r²). Might as well ignore k altogether and go with Q³/r²
Of course, that value of Q has no "physical" value, it's just convenient.
Let's create a universe

So, we have decided on our forces, now we need something to apply them to. Meet Node, the structure which will handle the physics side of our layout. It needs to hold 3 informations:

  • the position the cell is at : x and y
  • some way to differentiate it from other cells (because of the repulsion bit) : a uuid
  • a way to link it to the data source of the UICollectionView : an IndexPath

Here's the boring first part:

struct Node : Equatable, Hashable {
    var x : CGFloat = 0
    var y : CGFloat = 0
    let uuid = UUID() // for identification purposes
    var indexPath : IndexPath
    
    init(x px: CGFloat = 0, y py: CGFloat = 0, for idx: IndexPath) {
        x = px
        y = py
        indexPath = idx
    }
    
    static func == (lhs: Node, rhs: Node) -> Bool {
        return lhs.uuid == rhs.uuid
    }
    // ...
}

The first interesting bit is the attraction function. It is extremely verbose for education purposes, you can compactify and optimize it quite a bit.

func attraction(center: CGPoint, stiffness: CGFloat) -> Force {
    // Hooke's Law: F = -k•∂ (∂ being the "ideal" distance minus the actual distance)
    let dx = x - center.x
    let dy = y - center.y
    let angle = CGFloat(atan2(Double(dy), Double(dx)))
    let delta = sqrt(dx*dx+dy*dy)
    let intensity = stiffness * delta
    let ix = abs(intensity * cos(angle))
    let fx : CGFloat
    if center.x > x { // positive force to the right
        fx = ix
    } else {
        fx = -ix
    }
    let iy = abs(intensity * sin(angle))
    let fy : CGFloat
    if center.y > y { // positive force to the bottom
        fy = iy
    } else {
        fy = -iy
    }
    return Force(fx,fy)
}

The really big one, although not that complicated from the code point of view, is the repulsion force. It has to take into account every single other node. As with the attraction, it is verbose for the sake of clarity, and can obviously be optimized. One such optimization is to return early with a 0 if the distance is "too great", but that's beyond the scope of what we are trying to do here.

func repulsion(others: [Node], charge: CGFloat) -> Force {
    var totalForce = Force(0,0)
    for n in others.filter({ (on) -> Bool in
        on.uuid != self.uuid // just in case
    }) {
        // Coulomb’s Law; F = k(Q1•Q2/r²)
        // Since we're dealing with arbitrary "charges" here, we'll simplify to F = C³/r²
        // We want repulsion (Q1=Q2) and not deal with big numbers, so that works
        var dx = x - n.x
        var dy = y - n.y
        if dx == 0 && dy == 0 { // wiggle a bit
            let room : CGFloat = 0.05
            dx += CGFloat.random(in: -room...room)
            dy += CGFloat.random(in: -room...room)
        }
        let angle = CGFloat(atan2(Double(dy), Double(dx)))
        let delta = max(0.000001,sqrt(dx*dx+dy*dy)) // do NOT divide by zero you fool
        let intensity = pow(charge,3)/(delta*delta)
        let ix = abs(intensity * cos(angle))
        let fx : CGFloat
        if n.x > x { // positive force to the left
            fx = -ix
        } else {
            fx = ix
        }
        let fy : CGFloat
        let iy = abs(intensity * sin(angle))
        if n.y > y { // positive force to the bottom
            fy = -iy
        } else {
            fy = iy
        }
        
        totalForce = totalForce + Force(fx,fy)
    }
    
    return totalForce
}

Note the check for divide-by-zero shenanigans (in case two cells overlap), that is covered in two different ways: we wriggle the cells a bit to avoid being exactly on top of each other and we don't divide by zero if the wriggling failed. The resulting force is the sum of being repulsed by every other cell.

Putting it all together, we'll just need to sum attraction and repulsion. For some reason, I have also included the possibility of adding a force to a node.

func globalForce(center: CGPoint, otherNodes: [Node], stiffness: CGFloat, charge: CGFloat) -> Force {
    let a = attraction(center: center, stiffness: stiffness)
    let r = repulsion(others: otherNodes, charge: charge)
    return a + r
}

static func +(lhs: Node, rhs: Force) -> Node {
    return Node(x: lhs.x+rhs.dx, y: lhs.y+rhs.dy, for: lhs.indexPath)
}

Because we have to compute the global force (which depends on every node) for every node, we end up with a square order complexity. But since the force calculation is independant for every node, we can spread the calculations of as many threads as we can to claw a bit of performance back. Thankfully I have a Task system handy.

Here is the function that takes all nodes, and computes where they should go next. springStiffness and electricCharge are instance variables of the layout. Its output is the pair composed of the new node positions and the total movement of all the nodes, for a reason that will become clear later.

func computeNewPositions(center: CGPoint, nodes: [Node]) -> (nodes: [Node], movement: CGFloat) {
    // if the total movement is less than threshold, will return nil
    var totalMovement : CGFloat = 0
    var newNodes : [Node] = []
    var computeTasks : [Task] = []
    let lock = NSLock()
    for n in nodes {
        let t = Task() {
            let f = n.globalForce(center: center, otherNodes: nodes, stiffness: self.springStiffness, charge: self.electricCharge)
            let nn = n + f
            lock.lock()
            newNodes.append(nn)
            totalMovement += f.magnitude
            lock.unlock()
        }
        computeTasks.append(t)
    }
    let waitSem = DispatchSemaphore(value: 0)
    let compute = Task.group(computeTasks)
    compute.perform { (outcome) in
        waitSem.signal()
    }
    waitSem.wait()
    
    return (newNodes, totalMovement)
}

Now's the time to implement a few of the overrides for our custom layout. Let's start with the obvious ones:

protocol ForceDirectedLayoutDelegate {
    func layoutDidFinish()
}
class ForceDirectedLayout : UICollectionViewLayout {
    // in case you want to know
    var delegate : ForceDirectedLayoutDelegate? = nil
    
    // maths
    var springStiffness : CGFloat = 0.02 // max(width,height)/1000 seems ok
    var electricCharge : CGFloat = 10 // max(width,height)/2 seems ok
    var cellSize : CGSize = CGSize(width: 20,height: 20) // obviously too small for real use

    // ...

    override func shouldInvalidateLayout(forBoundsChange newBounds: CGRect) -> Bool {
        return true // because the center changes...
    }
    
    override var collectionViewContentSize: CGSize {
        var totalRect = CGRect(origin: CGPoint.zero, size: self.collectionView?.frame.size ?? CGSize.zero)
        for cachedA in cachedAttributes.values {
            totalRect = totalRect.union(
                CGRect(x: cachedA.center.x-cachedA.size.width*0.5,
                       y: cachedA.center.y-cachedA.size.height*0.5,
                       width: cachedA.size.width, height: cachedA.size.height))
        }
        return totalRect.size
    }
}

The eagle eyed among you will have noticed a cachedAttributes instance variable. For some reason, UICollectionView itself doesn't cache the current position and size of a cell. If you call collectionView.layoutAttributesForItem(at: idx) it gives you a nil value, forcing you to regenerate the attributes at every stage (and there are many). I have decided to cache the cell attributes internally as a backup, and if Apple decides at some point to return a non-nil value, I will use it. But what it forces me to do is to make sure I detect cell deletion. If a cell is removed, I need to remove its cached attributes. You can skip that deletion part if you want, or code it more efficiently, it's just for completeness' sake.

/// Unfortunately, collectionView.layoutAttributesForItem doesn't seem to be caching the previous attributes
/// We do it ourselves as a backup
fileprivate var cachedAttributes = [IndexPath:UICollectionViewLayoutAttributes]()

/// Since we cache the data ourselves, we need to cleanup if elements are removed
override func prepare() {
    super.prepare()
    
    guard let collection = self.collectionView else {
        cachedAttributes.removeAll()
        return
    }
    let sectionCount = collection.dataSource?.numberOfSections?(in: collection) ?? 1
    var rowCounts = [Int:Int]()
    for s in 0..<sectionCount {
        rowCounts[s] = collection.dataSource?.collectionView(collection, numberOfItemsInSection: s) ?? 0
    }
    for removed in cachedAttributes.keys.filter({ (idx) -> Bool in
        return idx.section >= sectionCount || idx.row >= (rowCounts[idx.section] ?? 0) // hence the dictionary, no index out of bounds
    }) {
        cachedAttributes.removeValue(forKey: removed)
    }
}

And finally, the override that makes everything work: the layout function itself. Unfortunately, if a node changes position, it affects every other node, potentially, so we can't update only a partial list. We need to do them all every single time.

But wait. We have been talking about forces for 2000+ words here. Forces can help you figure out positions but they aren't positions themselves. Yep, that's right, we have an animated layout: from the current position of the cells, you calculate where the cells should go next. So... when does the animation end?

That's where the movement output of the computation function comes in. When we reach an equilibrium (like the floor force and the gravity exactly compensating each other), the movement drops to zero. Of course, in the real world, it's a neverending process but we can just decide that the layout stops animating as soon as the total movement of every cell falls below a certain threshold. I have decided arbitrarily that a third of a pixel is imperceptible enough, even on a 3x retina screen but if you want to stop the animation sooner, feel free to raise that a bit.

So, the mechanics will be as follows:

  • we take the current positions of all the cells
  • we compute their new positions
  • if they didn't move much, we stop
  • if they still are moving quite a bit, we instruct the collection view to keep going by invalidating the layout again
// unfortunately, every node affects every other node, so we can't do partial updates
override func layoutAttributesForElements(in rect: CGRect) -> [UICollectionViewLayoutAttributes]? {
    var attributes : [UICollectionViewLayoutAttributes] = []
    var nodes = [Node]()
    
    if let collectionView = self.collectionView {
        for i in 0..<(collectionView.dataSource?.collectionView(collectionView, numberOfItemsInSection: 0) ?? 0) {
            let idx = IndexPath(row: i, section: 0)
            let currentAttributes : UICollectionViewLayoutAttributes
            if let ca = (collectionView.layoutAttributesForItem(at: idx) ?? cachedAttributes[idx]) {
                currentAttributes = ca
            } else {
                // randomize start positions, just for funsies
                currentAttributes = UICollectionViewLayoutAttributes(forCellWith: idx)
                currentAttributes.center = CGPoint(
                    x: CGFloat.random(in: 0...self.collectionViewContentSize.width),
                    y: CGFloat.random(in: 0...self.collectionViewContentSize.height)
                )
                currentAttributes.size = cellSize
                cachedAttributes[idx] = currentAttributes
            }
            attributes.append(currentAttributes)
            nodes.append(Node(x: currentAttributes.center.x, y: currentAttributes.center.y, for: idx))
        }
        
        let center = self.collectionView?.absoluteCenter ?? CGPoint.zero
        let nextIteration : (nodes: [Node], movement: CGFloat)
        nextIteration = computeNewPositions(
            center: center,
            nodes: nodes)
        
        for n in nextIteration.nodes {
            if let attrsIdx = attributes.firstIndex(where: { $0.indexPath == n.indexPath }) {
                let attrs = attributes[attrsIdx]
                attrs.center = CGPoint(x: n.x, y: n.y)
                attributes[attrsIdx] = attrs
            }
        }
        
        // if it's still moving, keep going
        if nextIteration.movement > 0.3 { // subpixel animation
            DispatchQueue.main.asyncAfter(deadline: DispatchTime.now() + 0.05) {
                self.invalidateLayout()
            }
        } else {
            DispatchQueue.main.async {
                self.delegate?.layoutDidFinish()
            }
        }
        
    }
    
    return attributes
}

Please note that I position new cells randomly. They could appear in the middle of the other cells. Depending on the effect you want to go for, you can have they come from a specific point, or randomly from the borders, etc.

And that's it! I have made a quick example that adds round cells progressively to a collection view, and here's the result:

You can obviously get the whole thing on GitHub, prepared for use as a swift package for the future Xcode version.

A Little Bit Of Math #1

Aaaaaah the math tricks! When you know 'em, you love 'em, and when you don't, you pay for extra computing resources.

Today's math trick has to do with averages. Averages are easy, right? you take all the numbers in the list, you sum them, and then you divide by the count... Pft, that's no trick!

A_{list} = \frac{\sum_{i=0}^{list.size - 1} list[i]}{list.size}

Except... there is a little something called overflow. Let's take the case of integers, and let's assume we're working with UInt8 objects. What's the average of [233,212]? It is 222.5 which gets rounded to 223. But our good'ol summation doesn't work:

 1> let v1 : UInt8 = 233
 2> let v2 : UInt8 = 212
 3> let sum = v1 + v2

EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

Depending on who you ask, 233+212 either wraps around or causes an error. 255 is the maximum value, after which there is nothing. Either way, we wouldn't be happy with the go-around either: 233+212 = 190, which gives an average of 95 when divided by 2.

Musical Interlude : "Zino, dear, I don't care, I can use BIGGER numbers!"

Yes, you can, up to a point. Most languages have a maximum integer width, and sure, you can probably find unbounded implementations of integers for your language. In Swift, you can check this out, it's really nice. BUT while it's technically possible to handle arbitrary precision, you start hitting all sorts of issues with storing that data ("I love using blobs in my database"), converting it for practical use ("My users can remember 200+ digits numbers easyyyyyyyyy"), etc. Plus, you generally don't want to replace every single Int use in your code by something coming from an external dependency, with all the headache that implies, just for the sake of type safety.

Enter the Maths (royal fanfare ♫♩🎺)

Let's start with a basic observation:

v1+v2 = ({\frac{v1}{2} + \frac{v2}{2}})*2

If I divide by two, then multiply by two, I've done nothing. In the case of integers, it's not quite true, as the division will be rounded to the closest value, but for big numbers, it's not that bad. But what does that give us?

Well...

\frac{v1}{2} + \frac{v2}{2} \lt Int_{max}

The sum of the two halves will fit in an integer, because each is guaranteed to be smaller than half the maximum. Right? Then we can multiply the result by 2 to get the sum, maybe. But it might overflow. Good thing we are trying to get the average, because we were about to divide by two, which cancels out the multiplication.

A_{(233,212)} = ((\frac{233}{2}+\frac{212}{2})*2)/2 = \frac{233}{2}+\frac{212}{2}

Musical Interlude: "Mind => Blown"

Sure, we kind of lose some precision: 233/2 will be rounded to 117 so the average calculated will be 223, but it could easily have been rounded down at some point.

Anyways... Onward and upward! What can we do with a big list of numbers? We could use the same trick, and just divide wholesale. The major issue is that we would severely compound the rounding errors. Imagine we're still playing with UInt8 elements and you have 200 of them. Any of them divided by 200 would result in 0 or maybe 1. Your average wouldn't look very good.

Cue the Return of the Maths (royal fanfare ♫♩🎺)

{\sum_{i=0}^{list.size - 1} list[i]} = {\sum_{i=0}^{list.size - 2} list[i]} + list[list.size - 1]

(As in (x+y+z+t) = (x+y+z) + t)

  • Okay, and?
  • Let's divide by list.size, and we get the average
A_{list} = \frac{\sum_{i=0}^{list.size - 2} list[i] + list[list.size-1]}{list.size}

The top-left part looks familiar, it's almost as if it was the average of the list minus the last element... 😬

All we would need to do is to divide by list.size - 1... But if we multiply and divide by the same thing... 🤔

\frac{(list.size-1)*\frac{\sum_{i=0}^{list.size - 2} list[i]}{list.size-1} + list[list.size-1]}{list.size}

Which is

A_{list} = \frac{(list.size-1)*A_{list-last} + list[list.size-1]}{list.size}

Musical Interlude: Smells like recursion

So... The code will basically look like this:

  • If the list is empty (because we're good programmers and handle edge cases), the result is 0
  • If the list contains one element, the average is easy
  • If the list contains two elements, we can use the divide by two trick, the rounding error shouldn't be that bad
  • If the list contains more elements, we do the average by aggregate, and hope the rounding errors will be somewhat contained.

Side note on the rounding errors: the bigger the divider, the higher the rounding error (potentially). But by doing a rolling average, we have a rounding error that worsens as we go through the list rather than being bad at every step. It's not ideal, but it's still better.

So, let's set the stage up: I have a list of big numbers I want the average of.

2988139172152746883
4545331521850540616
5693938727954663282
5884889191787885217
3111881160526182838
8720326064806005009
8427311181199404053
7983003740783657027
2965909035096967706
1211883882534796072
5703029716464526164
8424273336993151821
774296368044414872
14130533330426236
2230589047337383318
8337015733785964014
9153431205551083918
3249272057022384528
8254667294021634003
6758234862357239854

They are all Int64 integers, which is the highest bit native signed variant available (Int128 has been coming since 2017). They come from a PostgreSQL database that stores big numbers for a very good reason I won't get into.

Now, if I plug these numbers into an unbounded calculator, the average should be 5221577691680052871.55 or so I'm told.

My recursive Swift function looks like this:

func sumMean(_ input: [Int64]) -> Int64 {
    if input.count == 0 { // uninteresting
        return 0
    }
    if input.count == 1 { // easy
        return input[0]
    }
    
    // general trick : divide by two (will introduce rounding errors)
    if input.count == 2 {
        let i1 = input[0] / 2
        let i2 = input[1] / 2
        let mean = (i1+i2) // (/2, then *2)
        return mean
    }
    
    let depth = Int64(input.count) - 1
    // rolling average formula
    let last = input.last!
    var rest = [Int64](input)
    rest = rest.dropLast()
    
    let restMean = sumMean(rest)
    // should be (depth * restMean + last) / depth+1, but overflow...
    let num = (restMean/2) + ((last/2)/depth)
    let res = (num / (depth+1)) * depth * 2
    return res
}

The reason for why num and res exist is left as an exercise.

Here's the calling code and the output:

var numbers : [Int64] = [
    2988139172152746883,
    4545331521850540616,
    5693938727954663282,
    5884889191787885217,
    3111881160526182838,
    8720326064806005009,
    8427311181199404053,
    7983003740783657027,
    2965909035096967706,
    1211883882534796072,
    5703029716464526164,
    8424273336993151821,
    774296368044414872,
    14130533330426236,
    2230589047337383318,
    8337015733785964014,
    9153431205551083918,
    3249272057022384528,
    8254667294021634003,
    6758234862357239854
]

print(sumMean(numbers))
5221577691680052740

As expected, we have rounding errors creeping in. This isn't the exact mean, but it's close enough: the difference is 131.55, which is a whopping 0.0000000000000025193534936693344360660675977627565% deviation.

As a side note, ordering matters:

  • unordered and sorted crescendo yield the same error
  • ordered reversed yields a 169.55 error margin

Given the scale, it's not a big deal, but keep in mind that this trick is only useful for fairly large numbers in a fairly large list, not for the extremes.