[ML] Swift TensorFlow (Part 1)

First part of doing RNN text prediction with TensorFlow, in Swift

Broad Strokes

For all intents and purposes, it's about statistics. The question we are trying to solve is either something along the lines of "given an input X, what is the most probable Y?", or along the lines of "given an input X, what is the probability of having Y?"

Of course, simple probability problems have somewhat simple solutions: if you take a game of chess and ask for a next move based on the current board, you can do all the possible moves and sort them based on the probability of having a piece taken off the board, for instance. If you are designing an autopilot of some kind, you have an "ideal" attitude (collection of yaw, pitch and roll angles), and you calculate the movements of the stick and pedals that will most likely get you closer to that objective. If your last shot went left of the target, chances are, you should go right. Etc etc etc.

But the most interesting problems don't have obvious causality. If you have pasta, tomatoes and ground meat in your shopping basket, maybe your next item will be onions, because you're making some kind of bolognese, maybe it will be soap, because that's what you need, maybe it will be milk, because that's the order of the shelves you are passing by.

Machine learning is about taking a whole bunch of hopefully consistent data (even if you don't know for sure that it's consistent), and using it to say "based on this past data, the probabilities for onions, soap and milk are X, Y, and Z, and therefore the most probable is onions".

The data you are basing your predictive model on is really really important. Maybe your next item is consistent with the layout of the shop. Maybe it is consistent with what other customers got. Maybe it's consistent with your particular habits. Maybe it's consistent with people who are in the same "category" as you (your friends, your socio-economic peers, your cultural peers, ... pick one or more).

So you want a lot of data to have a good prediction, but not all the data, because noise (random data) neither reveals a bias (people in the shop tend to visit the shelves in that order) nor exploits one (people following recipes want those ingredients together).

Oh yes, there are biases. Lots of them. Because ML uses past data to predict the future, if the data we use was based on bad practices, the recommendations won't be much better.

There is a branch of machine learning that starts ex nihilo and generates its data from a tournament rather than from actual facts, but it is beyond the scope of this introduction. Its principle is roughly the same, though.

So, to recap:

  • We start with a model with random probabilities, and a series of "truths" (X leads to Y)
  • We try with a Z, see what the model predicts
  • We compare it to a truth, and fix the probabilities a little bit so that it matches
  • Repeat with as many Zs as possible to be fairly confident the prediction isn't too far off

If you didn't before, now you know why ML takes a lot of resources. Depending on the number of possible Xs, Ys and the number of truths, the probability matrix is potentially humongous. Operations (like fixing the probabilities to match reality) on such enormous structures aren't cheap.
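
To make the "nudge the probabilities" recap concrete, here is a toy sketch in plain Swift (no ML framework involved, and nothing to do with the TensorFlow code later in this piece): the "model" is assumed to be a single weight w such that prediction(x) = w * x, and the truths all follow y = 2x.

// Toy illustration of the loop described above, under the assumption of the
// simplest possible model: one weight, one input, one output.
var w = Float.random(in: 0...1)                  // start with a random model
let truths: [(x: Float, y: Float)] = [(1, 2), (2, 4), (3, 6)]

for _ in 0..<1_000 {                             // repeat with as many samples as possible
    for (x, y) in truths {
        let prediction = w * x                   // try, see what the model predicts
        let error = prediction - y               // compare it to a truth
        w -= 0.01 * error * x                    // fix the model a little bit
    }
}
print(w)                                         // ends up very close to 2.0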

If you want a more detail-oriented explanation, with maths and diagrams, you can read my other attempt at explaining how it works.

Swift TensorFlow

There are a few contenders in the field of "ML SDK", but one of the best known is TensorFlow, backed by Google. It also happens to have a Swift variant (almost every other ML environment out there is either Python or R).

And of course, the documentation is really... lacking, making this article more useful than average along the way.

In their "Why Swift?" motivation piece, the architects make a good case, if a little bit technical, as to why swift makes a good candidate for ML.

The two major takeaways you have to know going in are:

  • It's a different build of Swift. You cannot use the one that shipped with Xcode (yet)
  • It uses a lot of Python interoperability to work, so some ways of doing things will be a bit alien

The performance is rather good, comparable to or better than regular Python TensorFlow for the tasks I threw at it, so there's that.

But the documentation... My oh my.

Let's take an example: Tensor is, as the name of the framework implies, the central feature of the system. Its documentation is here: https://www.tensorflow.org/swift/api_docs/Structs/Tensor

Sometimes, that page greets me in Greek... But hey, why not. There is little to no way to navigate the hierarchy, other than going on the left side, opening the section (good luck if you don't already know if it's a class, a protocol or a struct you're looking for), and if you use the search field, it will return pages about... the Python equivalents of the functions you're looking for.

Clearly, this is early in the game, and you are assumed to know how regular TensorFlow works before attempting to do anything with STF.

But fret not! I will hold your hand so that you don't need to look at the doc too much.

The tutorials are well written, but don't go very far at all. Oh, and if you use that triple Dense layer setup on more than a toy problem (flower classification based on numbers), your RAM will fill up so fast that your computer will have to be rebooted. More on that later.

And, because the center of ML is that "nudge" towards a better probability matrix (also called a Tensor), there is the whole @differentiable thing. We will talk about it later.
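
To give a taste of it right away, here is a minimal sketch of the idea. Note that depending on the toolchain version the helper is spelled gradient(at:in:) or gradient(at:of:), so treat this as illustrative rather than gospel.

import TensorFlow

// Marking a function as differentiable lets the framework compute gradients for it...
@differentiable
func square(_ x: Float) -> Float {
    return x * x
}

// ...and the gradient is precisely what tells us in which direction, and by how much,
// to nudge the parameters.
let slope = gradient(at: 3, in: square)   // 6.0, i.e. the derivative of x*x at x = 3
print(slope)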

A good thing is that Python examples (there are thousands of ML tutorials in Python) work almost out of the box, thanks to the interop.

Data Preparation

Which Learning will my Machine do?

I have always thought that text generation was such a funny toy example (if a little bit scary when you think about some of the applications): teach the machine to speak like Shakespeare, and watch it spit some play at you. It's also easy for us to evaluate in terms of what it does and how successful it is. And the model makes sense, which helps when writing a piece on how ML works.

A usual way of doing that is using trigrams. We all know that predicting the next word after a single word is super hard. But our brains tend to be able to predict the last word of a sentence with ease. So, a common way of teaching the machine is to have it look at 3 words to predict a 4th.

"I am hungry" -> "because", "the birds flew" -> "away", etc

Of course, for more accurate results, you can extend the number of words in the input, but it means you must have a lot more varied sentence examples.

What we need to do here is assign numbers to these words (because everything is numbers in computers) so that we have a problem like "guess the function f if f(231,444,12)->123, f(111,2,671)->222", which neural networks are pretty good at.

So we need data (a corpus), and we need to split it into (trigram)->result

Now, because we are ultimately dealing with probabilities and rounding, we need the input to be in Float, so that the operations can wriggle the matrices by fractions, and we need the result to be an Int, because we don't want something like "the result is the word between 'hungry' and 'dumb'".

The features (also called inputs) and the labels (also called outputs) have to be stored in two tensors (also called matrices), matching the data we want to train our model on.

That's where RAM and processing time enter the arena: the size of the matrix is going to be huge:

  • Let's say the book I chose to teach it English has 11148 words in it (it's Tacitus' Germany), that's 11148*3-2 trigrams (33442 lines in my matrices, 4 columns total)
  • The way neural networks function, you basically have a function parameter per neuron that gets nudged at each iteration. In this example, I use two layers of 512 neurons each for somewhat decent results. That means 2 additional matrices of size 33442*512.
  • And operations regularly duplicate these matrices, if only for a short period of time, so yea, that's a lot of RAM and processing power.

Here is the function that downloads a piece of text, and separates it into words:

func loadData(_ url: URL) -> [String] {
    let sem = DispatchSemaphore(value: 0)
    var result = [String]()
    
    let session = URLSession(configuration: URLSessionConfiguration.default)
//     let set = CharacterSet.punctuationCharacters.union(CharacterSet.whitespacesAndNewlines)
    let set = CharacterSet.whitespacesAndNewlines
    session.dataTask(with: url, completionHandler: { data, response, error in
        if let data = data, let text = String(data: data, encoding: .utf8) {
            let comps = text.components(separatedBy: set).compactMap { (w) -> String? in
                // separate punctuation from the rest
                if w.count == 0 { return nil }
                else { return w }
            }
            result += comps
        }
        
        sem.signal()
    }).resume()
    
    sem.wait()
    return result
}

Please note two things: I make it synchronous (I want to wait for the result), and I chose to keep the punctuation attached to the words. You can keep only the words by switching the commented lines, but I find that the output is more interesting with punctuation than without.

Now, we need to set up the word->int and int->word transformations. Because we don't want to scan the whole array of words every time we want to search for one, there is a dictionary based on the hashes of the words that deals with the former, and because the most common words have better chances to pop up, the array for the vocabulary is sorted. It's probably not optimal, but it helps make things clear, and is fast enough.

func loadVocabulary(_ text: [String]) -> [String] {
    var counts = [String:Int]()
    
    for w in text {
        let c = counts[w] ?? 0
        counts[w] = c + 1
    }
    
    let count = counts.sorted(by: { (arg1, arg2) -> Bool in
        let (_, value1) = arg1
        let (_, value2) = arg2
        return value1 > value2
    })
    
    return count.map { (arg0) -> String in
        let (key, _) = arg0
        return key
    }
}

func makeHelper(_ vocabulary: [String]) -> [Int:Int] {
    var result : [Int:Int] = [:]
    
    vocabulary.enumerated().forEach { (arg0) in
        let (offset, element) = arg0
        result[element.hash] = offset
    }
    
    return result
}

Why not hashValue instead of hash? Turns out, on Linux, which this baby is going to run on, the values are more stable with the latter than with the former, according to my tests.

The data we will work on therefore is:

struct TextBatch {
    let original: [String]
    let vocabulary: [String]
    let indexHelper: [Int:Int]
    let features : Tensor<Float> // 3 words
    let labels : Tensor<Int32> // followed by 1 word
}

We need a way to initialize that struct, and a couple of helper functions to extract some random samples to train our model on, and we're good to go:

extension TextBatch {
    public init(from: [String]) {
        let v = loadVocabulary(from)
        let h = makeHelper(v)
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(from.count-3) {
            if let w1 = h[from[i].hash],
                let w2 = h[from[i+1].hash],
                let w3 = h[from[i+2].hash],
                let w4 = h[from[i+3].hash] {
                f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }
        
        let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
        let labelsT = Tensor<Int32>(l)
        
        self.init(
            original: from,
            vocabulary: v,
            indexHelper: h,
            features: featuresT,
            labels: labelsT
        )
    }
    
    func randomSample(of size: Int) -> (features: Tensor<Float>, labels: Tensor<Int32>) {
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(original.count-3) {
            if let w1 = indexHelper[original[i].hash],
                let w2 = indexHelper[original[i+1].hash],
                let w3 = indexHelper[original[i+2].hash],
                let w4 = indexHelper[original[i+3].hash] {
                f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }

        var rf : [[Float]] = []
        var rl : [Int32] = []
        if size >= l.count || size <= 0 { 
            let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
            let labelsT = Tensor<Int32>(l)
            return (featuresT, labelsT)
        }
        var alreadyPicked = Set<Int>()
        while alreadyPicked.count < size {
            let idx = Int.random(in: 0..<l.count)
            if !alreadyPicked.contains(idx) {
                rf.append(f[idx])
                rl.append(l[idx])
                alreadyPicked.update(with: idx)
            }
        }
        
        let featuresT = Tensor<Float>(shape: [rf.count, 3], scalars: rf.flatMap { $0 })
        let labelsT = Tensor<Int32>(rl)
        return (featuresT, labelsT)
    }
    
    func randomSample(splits: Int) -> [(features: Tensor<Float>, labels: Tensor<Int32>)] {
        var res = [(features: Tensor<Float>, labels: Tensor<Int32>)]()
        var alreadyPicked = Set<Int>()
        let size = Int(floor(Double(original.count)/Double(splits)))
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(original.count-3) {
            if let w1 = indexHelper[original[i].hash],
                let w2 = indexHelper[original[i+1].hash],
                let w3 = indexHelper[original[i+2].hash],
                let w4 = indexHelper[original[i+3].hash] {
                f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }

        for part in 1...splits {
            var rf : [[Float]] = []
            var rl : [Int32] = []
            if size >= l.count || size <= 0 { 
                let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
                let labelsT = Tensor<Int32>(l)
                return [(featuresT, labelsT)]
            }
            // each split picks `size` new indices that previous splits haven't used yet
            while alreadyPicked.count < min(size * part, l.count) {
                let idx = Int.random(in: 0..<l.count)
                if !alreadyPicked.contains(idx) {
                    rf.append(f[idx])
                    rl.append(l[idx])
                    alreadyPicked.update(with: idx)
                }
            }
        
            let featuresT = Tensor<Float>(shape: [rf.count, 3], scalars: rf.flatMap { $0 })
            let labelsT = Tensor<Int32>(rl)
            
            res.append((featuresT,labelsT))
        }
        return res
    }
}
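
To tie the data preparation together, here is what using it might look like. The URL is a made-up placeholder (point it at any plain-text file), not the one I actually used:

// Hypothetical usage of the data preparation code above.
let words = loadData(URL(string: "https://example.com/germania.txt")!) // placeholder URL
let batch = TextBatch(from: words)

print(batch.vocabulary.count)   // number of distinct "words"
print(batch.features.shape)     // [number of trigrams, 3]
print(batch.labels.shape)       // [number of trigrams]

// grab a random subset to train on
let (sampleFeatures, sampleLabels) = batch.randomSample(of: 1024)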

In the next part, we will see how to set the model up, and train it.


Algorithm Of The Day : Combinations

So, I'd like to take x elements out of an array of n elements. I know from combinatorics that there is a precise number of combinations:

$$ \binom{n}{x} = \frac{n!}{x!\,(n-x)!} $$

But what are they? There are a few classic algorithms to do that, but most of them are recursive and use an accumulator, not exactly Swift's forte, especially if you don't know the right keyword. Cue inout

So here's the rough idea: every time we select an item in the array, we have only n-1 items to pick from, and x-1 items to pick. So it's recursive in nature. But we have an array, which means that we can make every possible combination using the first item in the array first, then start again by removing the first item, and we should never repeat the combinations.

Here's an example: let's take 3 elements out of an array of 4

John Paul Ringo George

----------------------

John Paul Ringo
John Paul       George
John      Ringo George
     Paul Ringo George

It definitely looks like loops: We do all of John's first, then when we're done we do Paul's, and then we're done because we don't have enough people to start with Ringo.

Another way of looking at it is "I've selected John, now start again with the rest of the list and pick 2", then "I've selected Paul, now start again with the rest of the list and pick 1", then "I've selected Ringo, there's nothing left to pick, so that's one combination". When we're done with the lists starting with John, we remove him, start with Paul, and there's only one choice.

In Swift, because of the extension mechanism, it's easy to generalize to every array, but we still need that recursion, which needs to know both where we are and what's left in order to work. Then all we need to manage are the special edge cases:

  • there is no element in the array (because it's easy to deal with)
  • there are fewer elements in the array than we want (ditto)
  • there are exactly as many elements in the array as we want (well duh, there's only one possibility)

So here is the algorithm, with accumulators passed with inout (to modify them in the callee and the caller):

extension Array { // combinatory
    fileprivate func recArrangements(len: Int, start: Int, cur: inout [Element], result : inout [[Element]]) {
        if len == 0 {
            result.append([Element](cur))
        } else {
            var i = start
            while i <= (self.count-len) {
                cur[cur.count - len] = self[i]
                recArrangements(len: len-1, start: i+1, cur: &cur, result: &result)
                i += 1
            }
        }
    }
    
    func arrangements(of number: Int) -> [[Element]]? {
        if self.count == 0 { return nil }
        else if number > self.count { return nil }
        else if number == self.count { return [self] }
        var buffer = [Element](repeating: self[0], count: number)
        var result = [[Element]]()
        recArrangements(len: number, start: 0, cur: &buffer, result: &result)
        return result
    }
}

Proofs that it works:

> ["John", "Paul", "Ringo", "George"].arrangements(of: 3)
$R0: [[String]]? = 4 values {
  [0] = 3 values {
    [0] = "John"
    [1] = "Paul"
    [2] = "Ringo"
  }
  [1] = 3 values {
    [0] = "John"
    [1] = "Paul"
    [2] = "George"
  }
  [2] = 3 values {
    [0] = "John"
    [1] = "Ringo"
    [2] = "George"
  }
  [3] = 3 values {
    [0] = "Paul"
    [1] = "Ringo"
    [2] = "George"
  }
}
["Potassium",
"Calcium",
"Scandium",
"Titanium",
"Vanadium",
"Chromium",
"Manganese",
"Iron",
"Cobalt",
"Nickel",
"Copper",
"Zinc",
"Gallium",
"Germanium",
"Arsenic",
"Selenium",
"Bromine",
"Krypton"].arrangements(of: 3).count
$R1: Int? = 43758

Which fits $$\binom{18}{8} = \frac{18!}{8!\,10!} = 43758$$

Unfortunately, as with most recursive algorithms, its complexity is fairly horrendous... It is equivalent to 3 nested for loops (the way to write that code is left as an exercise), which means a whopping ... Then again, combinatorics has a way of being exponential anyway. I wonder if there is a way to be more efficient.
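
For the record, here is what the nested-loops version alluded to above might look like for the specific case of picking 3 (a sketch, with a hypothetical triples() name, not the clever answer to the exercise). The number of iterations is exactly n(n-1)(n-2)/6, one per combination, so the work is proportional to the size of the output either way.

extension Array {
    // Sketch: the "3 nested for loops" equivalent of arrangements(of: 3).
    func triples() -> [[Element]] {
        var result = [[Element]]()
        for i in 0..<count {
            for j in (i + 1)..<count {
                for k in (j + 1)..<count {
                    result.append([self[i], self[j], self[k]])
                }
            }
        }
        return result
    }
}

// ["John", "Paul", "Ringo", "George"].triples().count // 4, same as arrangements(of: 3)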


[Dev Diaries] URL Shortener Style Things

UUIDs are fine, but who doesn't like a decent String instead? It's shorter and it doesn't scare the non-programmers as much!

UUIDs are 128 bits, and I want to use a 64-character map: a-z, A-Z, and 0-9, plus - and + to round it up.

64 possibilities is equivalent to 6 bits of data, and UUIDs are made of 16 bytes (8 bits each). What that gives me is a way to split 16 bytes into 22 sixytes (yes I invented that word. So what?)

| 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8
| 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6

Why? Because 3x8 = 6x4, same number of bits in both.

Now, we redistribute the bits around (Xs are the bits from the bytes, Ys are the bits from the sixytes):

XXXXXX|XX XXXX|XXXX XX|XXXXXX
YYYYYY|YY YYYY|YYYY YY|YYYYYY

With some shifts and some binary ORs, we're down from a 36-character hexadecimal string with dashes to a 22-character one with a very low probability of punctuation. Of course, if you want to disambiguate symbols like O and 0, you can change the character map, as long as your charmap stays 64 items long.

extension UUID {
    static let charmap = 
["a","b","c","d","e","f","g","h","i","j","k","l","m","n",
"o","p","q","r","s","t","u","v","w","x","y","z",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N",
"O","P","Q","R","S","T","U","V","W","X","Y","Z",
"0","1","2","3","4","5","6","7","8","9","-","+"]
    static let charmapSet =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+"

    var tinyWord : String {
        let from = self.uuid
        let bytes = [from.0, from.1, from.2,from.3,from.4,from.5,from.6,from.7,from.8,from.9,
                     from.10, from.11, from.12,from.13,from.14,from.15]
        
        // split in 6-bits ints
        var sbytes : [UInt8] = []
        for i in 0..<5 {
            let b1 = bytes[i*3]
            let b2 = bytes[i*3+1]
            let b3 = bytes[i*3+2]
            
            let sb1 = b1 >> 2
            let sb2 = (b1 & 0x03) << 4 | (b2 >> 4)
            let sb3 = (b2 & 0x0f) << 2 | (b3 >> 6)
            let sb4 = (b3 & 0x3f)
            sbytes += [sb1,sb2,sb3,sb4]
        }
        // all done but the last byte
        sbytes.append(bytes[15]>>2)
        sbytes.append(bytes[15]&0x03)
         
        var result = ""
        for i in sbytes {
            result += UUID.charmap[Int(i)]
        }
        
        return result
    }
}

The reverse procedure is a bit longer, because we have to stage the values in groups of 4 sixytes for 3 bytes, and do a couple of sanity checks.

extension UUID {
    init?(tinyWord: String) {
        if tinyWord.count != 22 || !tinyWord.allSatisfy({ UUID.charmapSet.contains($0) }) { return nil }
        var current : UInt8 = 0
        var bytes : [UInt8] = []
        for (n,c) in tinyWord.enumerated() {
            guard let idx32 = UUID.charmap.firstIndex(of: String(c)) else { return nil }
            let idx = UInt8(idx32)
            if n >= 20 { // last byte
                if n == 20 {
                    current = idx << 2
                } else {
                    current |= idx
                    bytes.append(current)
                }
            } else if n % 4 == 0 { // first in cycle
                current = idx << 2
            } else if n % 4 == 1 { // combine
                current |= idx >> 4
                bytes.append(current)
                current = (idx & 0xf) << 4
            } else if n % 4 == 2 { // combine
                current |= (idx >> 2)
                bytes.append(current)
                current = (idx & 0x3) << 6
            } else {
                current |= idx
                bytes.append(current)
                current = 0
            }
        }
        
        // double check
        if bytes.count != 16 { return nil }
        
        self.init(uuid: (bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9],
                         bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15]))
    }
}

Let's test this!

let u = UUID()
let w = u.tinyWord
print(u.uuidString+" : \(u.uuidString.count)")
print(w+" : \(w.count)")
print(UUID(tinyWord: w)!)
30A5CB6E-778F-4218-A333-3BC8B5A40B65 : 36
mkxlBNEpqHIJmZViTAqlzb : 22
30A5CB6E-778F-4218-A333-3BC8B5A40B65

Now I have a "password friendly" way to pass UUIDs around. Is it a waste of time (because I could just pass the UUIDs around, they aren't that much longer)? Who knows? It makes my shortened URLs a bit less intimidating 😁


Coder & Codable

Working on CredentialsToken, it struck me as inconceivable that we couldn't serialize objects to dictionaries without going through JSON. After all, we had this kind of mapping in Objective-C (kind of), why not in Swift?

Thus started a drama in 3 acts, one wayyyyyyy more expository than the others.

TL;DR Gimme the code!

Obviously, someone has done it before. Swift is a few years old now and this is something a lot of people need to do (from time to time and only when absolutely needed, admittedly), right? JSON is what it is (🤢) but it's a standard, and we sometimes need to manipulate the data in memory without going through 2 conversions for everything (JSON <-> Data <-> String), right?

Open your favorite search engine and look for some Encoder class that's not JSON or Property List. I'll wait. Yea. As of this writing, there's only one, and I'm not sure what it does exactly: EmojiEncoder

So, next step is the Scouring of Stack Overflow. Plenty of questions pertaining to that problem, almost every single answer being along the lines of "look at the source code for JSONEncoder/JSONDecoder, it shouldn't be so hard to make one". But, I haven't seen anyone actually publishing one.

Looking at the source code for JSONDecoder is, however, a good idea; let's see if it's as simple as the "it's obvious" gang makes it out to be.

Act 2: The Source

The JSONEncoder/JSONDecoder source is located here.

It's well documented and well referenced, and has to handle a ton of edge cases thanks to the formless nature of JSON itself (🤢).

To all of you who can read this 2500+ lines swift file and go "oh yea, it's obvious", congratulations, you lying bastards.

A Bit of Theory

At its heart, any parser/generator pair is usually a recursive, stack-based algorithm: let's look at a couple step-by-step examples.

Let's imagine a simple arithmetic program that needs to read text input or spit text out. First, let's look at the data structure itself. Obviously, it's not optimal, and you would need to add other operations, clump them together under an Operation super-type for maximum flexibility, etc etc.

protocol Arith {
    func evaluate() -> Double
}

struct Value : Arith {
    var number : Double
    
    func evaluate() -> Double {
        return number
    }
}

struct OpPlus : Arith {
    var left : Arith
    var right : Arith
    
    func evaluate() -> Double {
        return left.evaluate() + right.evaluate()
    }
}

let op = OpPlus(left: OpPlus(left: Value(number: 1), right: Value(number: 1)), right: OpPlus(left: Value(number: 1), right: Value(number: 1)))

op.evaluate() // 4

How would we go about printing what that might look like as user input? Because those last couple of lines are going to get our putative customers in a tizzy...

"Easy", some of you will say! Just a recursive function, defined in the protocol that would look like this:

    func print() -> String

In Value, it would be implemented thus:

    func print() -> String {
        return String(number)
    }

And in OpPlus:

   func print() -> String {
        return "(" + left.print() + " + " + right.print() + ")"
    }

The end result for the example above would be "((1.0 + 1.0) + (1.0 + 1.0))"

The stack here is implicit, it's actually the call stack. left.print() is called before returning, the result is stored on the call stack, and when it's time to assemble the final product, it is popped and used.

That's the simple part, anyone with some experience in formats will have done this a few times, especially if they needed to output some debug string in a console. Two things to keep in mind:

  • we didn't have to manage the stack
  • there is no optimization of the output (we left all the parentheses, even though they weren't strictly needed)

How would we go about doing the reverse? Start with "((1.0 + 1.0) + (1.0 + 1.0))" and build the relevant Arith structure out of it? Suddenly, all these implicit things have to become fully explicit, and a lot fewer people have done it.

Most of the developers who've grappled with this problem ended up using yacc and lex variants, which allow automating big parts of the parsing and making a few things implicit again. But for funsies, we'll try and think about how those things would work in an abstract (and simplified) way.

I'm a program reading that string. Here's what happens:

  • An opening parenthesis! This is the beginning of an OpPlus, I'll create a temporary one, call it o1 and put it on the stack.
  • Another... Damn. OK, I'll create a second one, call it o2, put it on the stack.
  • Ah! a number! So, this is a Value. I'll create it as v1 and put it on the stack
  • A plus sign. Cool, that means that whatever I read before is the left side of an OpPlus. What's the currently investigated operation? o2. OK then, o2.left = v1
  • Another number. It's v2
  • Closing parenthesis! Finally. So the most recent OpPlus should have whatever is on top of the stack as the right side of the plus. o2.right = v2, and now the operation is complete, so we can pop it and carry on. We remove v1 and v2 from the stack.
  • A plus sign! Really? Do I have an open OpPlus? I do! it's o1, and it means that o2 is its left side. o1.left = o2
  • and we continue like this...
(I know actual LALR engineers are screaming at the computer right now, but it's my saga, ok?)

It's not quite as easy as a recursive printing function, now, is it? This example doesn't even begin to touch on most parsing issues, such as variants, arbitrary whitespace, and malformed expressions.
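
To make that narration a bit more concrete, here is a small sketch of a stack-based parser for the exact strings produced by print() above, reusing the Arith types from earlier. It cheats a little compared to the step-by-step story: instead of creating a half-filled OpPlus on every opening parenthesis, it only reduces on the closing one, which is enough for this fully parenthesized format and keeps the happy path short.

func parse(_ input: String) -> Arith? {
    var stack = [Arith]()      // completed sub-expressions
    var token = ""             // the number currently being read, character by character

    func flushNumber() {
        if let d = Double(token) { stack.append(Value(number: d)) }
        token = ""
    }

    for c in input {
        switch c {
        case "(", "+", " ":
            flushNumber()      // structure is carried by the parentheses, so we only
                               // need to make sure any pending number gets pushed
        case ")":
            flushNumber()      // a closing parenthesis completes one OpPlus:
            guard stack.count >= 2 else { return nil }
            let right = stack.removeLast()   // the two topmost items are its operands
            let left = stack.removeLast()
            stack.append(OpPlus(left: left, right: right))
        default:
            token.append(c)    // part of a number (digit, dot, minus sign)
        }
    }
    flushNumber()
    return stack.count == 1 ? stack.first : nil
}

// parse("((1.0 + 1.0) + (1.0 + 1.0))")?.evaluate() // 4.0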

Why Is it Relevant?

The Encoder/Decoder paradigm of Swift 4 borrows very heavily from this concept. You "consume" input, spitting the transformed output if and when there is no error in the structure, recursively and using a stack. In the JSON implementation, you can see clearly that the *Storage classes are essentially stacks. The encode functions take items of a given structure, disassemble them, and put them on the stack, which is collapsed at the end to produce whatever it is you wanted as output, while decode functions check that items on stack match what is expected and pop them as needed to assemble the structures.

The main issue that these classes have to deal with is delegation.

The core types (String, Int, Bool, etc.) are easy enough because there aren't many ways to serialize them. Some basic types, like Date, are tricky, because they can be mapped to numbers (epoch, time since a particular date, etc) or to strings (ISO 8601, for instance), and have to be dealt with explicitly.
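
Incidentally, that's why JSONEncoder exposes a strategy for it. This is the real Foundation API, shown only to illustrate the "has to be dealt with explicitly" point:

import Foundation

// Date has no single obvious representation, so JSONEncoder makes you choose one.
let encoder = JSONEncoder()
encoder.dateEncodingStrategy = .iso8601             // as a string...
// encoder.dateEncodingStrategy = .secondsSince1970 // ...or as a number

let json = try! encoder.encode(["now": Date()])
print(String(data: json, encoding: .utf8)!)         // {"now":"..."} or {"now":123456789.0}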

The problem lies with collections, i.e. arrays and dictionaries. You may look at JSON and think objects are dictionaries too, but it's not quite the case... 🤮

Swift solves this by differentiating 3 coding subtypes:

  • single value (probably a core type, or an object)
  • unkeyed (an array of objects) - which is a misnomer, since it has numbers as keys
  • keyed (a dictionary of objects, keyed by strings)

Any Encoder and Decoder has to manage all three kinds. The recursion part of this is that there is a high probability that a Codable object will be represented by a keyed container, with the property names as keys and the attached property values.

Our Value struct would probably be represented at some point by something that looks like ["number":1], and one of the simplest OpPlus by something like ["left":["number":1], "right":["number":1]]. See the recursion now? Not to mention, any property could be an array or a dictionary of Codable structures.

Essentially, you have 4 classes (more often than not, the single value is implemented in the coder itself, making it only 3 classes), that will be used to transcode our input, through the use of a stack, depending on what the input type is:

  • if it's an array, we go with the UnkeyedEncodingContainer protocol
  • if it's a dictionary, we go with the KeyedEncodingContainerProtocol
  • if it's an object, we go with the SingleValueEncodingContainer protocol
    * if it's a core type, we stop the recursion and push a representation on the stack, or pop it from the stack
    * if it's a Codable object, we start a keyed process on the sub-data

Said like that, it's easy enough. Is coding it easy?

Act 3: The Code

You have managed to wade through all this without having to pop pill after pill to either stay awake or diminish the planetary size of your headache? Congratulations!

So is it that easy? Yes and no. All of the above does allow you to follow along with the code and understand how it works, but there are a few caveats when writing the Codable <-> [String:Any?] classes. It's all about the delegation and the (not so) subtle difference between an object and a dictionary.

If we look at our Value structure, it is "obvious" that it is represented by something like ["number":1]. What if we have nullable properties? What do we do with [:] or ["number":1,"other":27]? The class with its properties and the dictionary are fundamentally different types, even though mapping classes to dictionaries is way easier than the reverse. On the other hand, type assumptions on dictionaries are way easier than on structures. All 3 examples above are indubitably dictionaries, whereas the constraint on any of them to be "like a Value" is a lot harder.

Enter the delegation mechanism. There is no way for a generic encoder/decoder to know how many properties a structure has and what their types may be. So, the Codable type requires your data to explain the way to map your object to a keyed system, through the init(from: Decoder) and encode(to: Encoder) functions.

If you've never seen them, it's because you can afford to use only structs, which generate them automagically (you bastard).

In essence, those functions ask you to take your properties (which have to be Codable) and provide a key to store or retrieve them. You will be the one ensuring that the dictionary mapping makes sense.
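
For illustration, here is what that looks like written by hand, reusing the Value example from earlier as a standalone Codable struct. This is, give or take, what the compiler synthesizes for you:

struct Value: Codable {
    var number: Double

    // one key per property we want in the keyed representation
    enum CodingKeys: String, CodingKey {
        case number
    }

    // "here is how to rebuild me from a keyed system"
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        number = try container.decode(Double.self, forKey: .number)
    }

    // "here is how to map me to a keyed system"
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(number, forKey: .number)
    }
}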

Conclusion, Epilogue, All That Jazz

OK, so, either I'm dumb and it really was obvious, and it just so happens that after 5 years no one has ever coded it because no one needed it (or everyone has their own "obvious" implementation and no one published it). Or I'm not that dumb and this project will serve a purpose for somebody.

There are, however, a few particularities to my implementation that stem from choices I made along the way.

Certain types are "protected", that is they aren't (de)coded using their own implementation of Codable. For instance, Date is transformed into the number of milliseconds since its reference date, but given that we serialize to and from dictionaries in memory, there's no need to do it. They are considered as "core" types, even though they aren't labelled as such in the language. Those exception include:

  • Date / NSDate
  • Data / NSData
  • URL / NSURL
  • Decimal / NSDecimalNumber

Unlike with JSON, they don't need to be transformed into an adjacent type; they are therefore allowed to retain their own.

The other elephant in the room is polymorphic in nature: if I allow decoding attempts of Any, or encoding attempts of Any, my functions can look wildly different:

  • decode can return a Codable, an array of Codable or a dictionary with Codable values
  • same goes for encode, which should consume all 3 variants, plus nil or nullable parameters.

There is therefore an intermediary type to manage those cases. It's invisible from the outside in the case of decode, the function itself deciding what it's dealing with, but for encode, the function needs to return a polymorphic type, rather than an Any?.

My choice has been to use the following enumeration:

public enum CoderResult {
    case dictionary([String:Any?])
    case array([Any?])
    case single(Any)
    case `nil`
}

With attached types, you know exactly what you're getting:

public func encode<T : Encodable>(_ value: T) throws -> CoderResult { ... }

let r = (try? DictionaryCoder.encode(v)) ?? .nil
switch r {
    case .dictionary(let d): // d contains the [String:Any?] representation
    case .array(let a): // a contains the [Any?] representation
    case .single(let v): // v is the single value, which is kind of useless but there nonetheless
    case .nil: // no output
}

The repository is available on GitHub.