Coder & Codable

Working on CredentialsToken, it struck me as inconceivable that we couldn't serialize objects to dictionaries without going through JSON. After all, we had this kind of mapping in Objective-C (kind of), why not in Swift?

Thus started a drama in 3 acts, one wayyyyyyy more expository than the others.

TL;DR Gimme the code!

Obviously, someone has done it before. Swift is a few years old now and this is something a lot of people need to do (from time to time and only when absolutely needed, admittedly), right? JSON is what it is (🤢) but it's a standard, and we sometimes need to manipulate the data in memory without going through 2 conversions for everything (JSON <-> Data <-> String), right?

Open your favorite search engine and look for some Encoder class that's not JSON or Property List. I'll wait. Yea. As of this writing, there's only one, and I'm not sure what it does exactly: EmojiEncoder

So, next step is the Scouring of Stack Overflow. Plenty of questions pertaining to that problem, almost every single answer being along the lines of "look at the source code for JSONEncoder/JSONDecoder, it shouldn't be so hard to make one". But, I haven't seen anyone actually publishing one.

Looking at the source code for JSONDecoder is, however, a good idea. Let's see if it's as simple as the "it's obvious" gang makes it out to be.

Act 2: The Source

The JSONEncoder/JSONDecoder source is located here.

It's well documented and well referenced, and has to handle a ton of edge cases thanks to the formless nature of JSON itself (🤢).

To all of you who can read this 2500+ line Swift file and go "oh yea, it's obvious", congratulations, you lying bastards.

A Bit of Theory

At its heart, any parser/generator pair is usually a recursive, stack-based algorithm: let's look at a couple step-by-step examples.

Let's imagine a simple arithmetic program that needs to read text input or spit text out. First, let's look at the data structure itself. Obviously, it's not optimal, and you'd need to add other operations, clump them together under an Operation super-type for maximum flexibility, etc etc.

protocol Arith {
    func evaluate() -> Double
}

struct Value : Arith {
    var number : Double
    
    func evaluate() -> Double {
        return number
    }
}

struct OpPlus : Arith {
    var left : Arith
    var right : Arith
    
    func evaluate() -> Double {
        return left.evaluate() + right.evaluate()
    }
}

let op = OpPlus(left: OpPlus(left: Value(number: 1), right: Value(number: 1)), right: OpPlus(left: Value(number: 1), right: Value(number: 1)))

op.evaluate() // 4

How would we go about printing what that might look like as user input? Because those last couple of lines are going to get our putative customers in a tizzy...

"Easy", some of you will say! Just a recursive function, defined in the protocol that would look like this:

    func print() -> String

In Value, it would be implemented thus:

    func print() -> String {
        return String(number)
    }

And in OpPlus:

    func print() -> String {
        return "(" + left.print() + " + " + right.print() + ")"
    }

The end result for the example above would be "((1.0 + 1.0) + (1.0 + 1.0))"

The stack here is implicit, it's actually the call stack. left.print() is called before returning, the result is stored on the call stack, and when it's time to assemble the final product, it is popped and used.

That's the simple part, anyone with some experience in formats will have done this a few times, especially if they needed to output some debug string in a console. Two things to keep in mind:

  • we didn't have to manage the stack
  • there is no optimization of the output (we left all the parentheses, even though they weren't strictly needed)

How would we go about doing the reverse? Start with "((1.0 + 1.0) + (1.0 + 1.0))" and build the relevant Arith structure out of it? Suddenly, all these implicit things have to become fully explicit, and a lot fewer people have done it.

Most of the developers who've grappled with this problem ended up using yacc and lex variants, which automate big parts of the parsing and make a few things implicit again. But for funsies, we'll try and think about how those things would work in an abstract (and simplified) way.

I'm a program reading that string. Here's what happens:

  • An opening parenthesis! This is the beginning of an OpPlus, I'll create a temporary one, call it o1 and put it on the stack.
  • Another... Damn. OK, I'll create a second one, call it o2, put it on the stack.
  • Ah! a number! So, this is a Value. I'll create it as v1 and put it on the stack
  • A plus sign. Cool, that means that whatever I read before is the left side of an OpPlus. What's the currently investigated operation? o2. OK then, o2.left = v1
  • Another number. It's v2
  • Closing parenthesis! Finally. So the most recent OpPlus should have whatever is on top of the stack as the right side of the plus. o2.right = v2, and now the operation is complete, so we can pop it and carry on. We remove v1 and v2 from the stack.
  • A plus sign! Really? Do I have an open OpPlus? I do! it's o1, and it means that o2 is its left side. o1.left = o2
  • and we continue like this...
(I know actual LALR engineers are screaming at the computer right now, but it's my saga, ok?)
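
For the curious, here is a minimal sketch of that walkthrough with the stack made explicit. It is an illustration under assumptions, not how a real parser generator does it: ParseError, PendingPlus and parse are names made up for this example, the only operation is +, every operation must be wrapped in parentheses, and Arith, Value and OpPlus are repeated from above only so the block compiles on its own.

```swift
protocol Arith {
    func evaluate() -> Double
}

struct Value: Arith {
    var number: Double
    func evaluate() -> Double { return number }
}

struct OpPlus: Arith {
    var left: Arith
    var right: Arith
    func evaluate() -> Double { return left.evaluate() + right.evaluate() }
}

// Hypothetical error type for this sketch.
enum ParseError: Error {
    case unexpectedCharacter(Character)
    case malformed
}

// An opened "(" that is still waiting for its left side, its "+", or its right side.
struct PendingPlus {
    var left: Arith?
    var sawPlus = false
}

func parse(_ text: String) throws -> Arith {
    var stack: [PendingPlus] = []   // the explicit stack of open operations
    var current: Arith?             // the most recently completed sub-expression
    var digits = ""                 // characters of the number being read

    // Turns the accumulated characters into a Value, if there are any.
    func flushNumber() throws {
        guard !digits.isEmpty else { return }
        guard current == nil, let n = Double(digits) else { throw ParseError.malformed }
        current = Value(number: n)
        digits = ""
    }

    for ch in text {
        switch ch {
        case "0"..."9", ".":
            digits.append(ch)
        case " ":
            try flushNumber()
        case "(":
            // A new operation starts: push an empty frame.
            guard current == nil, digits.isEmpty else { throw ParseError.malformed }
            stack.append(PendingPlus())
        case "+":
            // Whatever was just read is the left side of the innermost open operation.
            try flushNumber()
            guard let done = current, var top = stack.popLast(), !top.sawPlus else {
                throw ParseError.malformed
            }
            top.left = done
            top.sawPlus = true
            stack.append(top)
            current = nil
        case ")":
            // Whatever was just read is the right side: pop the frame and build the OpPlus.
            try flushNumber()
            guard let right = current, let top = stack.popLast(), let left = top.left else {
                throw ParseError.malformed
            }
            current = OpPlus(left: left, right: right)
        default:
            throw ParseError.unexpectedCharacter(ch)
        }
    }
    try flushNumber()
    guard stack.isEmpty, let result = current else { throw ParseError.malformed }
    return result
}
```

Feeding it "((1.0 + 1.0) + (1.0 + 1.0))" rebuilds the same tree as the op constant above, and an unbalanced string such as "(1.0 + " throws instead of returning half a structure.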

It's not quite as easy as a recursive printing function, now, is it? This example doesn't even begin to touch on most parsing issues, such as variants, extensible white space, and malformed expressions.

Why Is it Relevant?

The Encoder/Decoder paradigm of Swift 4 borrows very heavily from this concept. You "consume" input, spitting out the transformed output if and when there is no error in the structure, recursively and using a stack. In the JSON implementation, you can see clearly that the *Storage classes are essentially stacks. The encode functions take items of a given structure, disassemble them, and put them on the stack, which is collapsed at the end to produce whatever it is you wanted as output, while decode functions check that items on the stack match what is expected and pop them as needed to assemble the structures.

The main issue that these classes have to deal with is delegation.

The core types (String, Int, Bool, etc...) are easy enough because there aren't many ways to serialize them. Some basic types, like Date, are tricky, because they can be mapped to numbers (epoch, time since a particular date, etc) or to strings (ISO 8601, for instance), and have to be dealt with explicitly.
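
To make the Date problem concrete, here is a small sketch using Foundation's JSONEncoder and its dateEncodingStrategy property. Event is a hypothetical type invented for the example; the same value comes out as a number or as a string depending on the strategy you pick.

```swift
import Foundation

// Hypothetical type, only here to carry a Date.
struct Event: Codable {
    var when: Date
}

let event = Event(when: Date(timeIntervalSince1970: 0))

// Number representation: seconds since the Unix epoch.
let asNumber = JSONEncoder()
asNumber.dateEncodingStrategy = .secondsSince1970

// String representation: ISO 8601 text.
let asString = JSONEncoder()
asString.dateEncodingStrategy = .iso8601

let numberJSON = String(decoding: try asNumber.encode(event), as: UTF8.self)
let stringJSON = String(decoding: try asString.encode(event), as: UTF8.self)

print(numberJSON)
print(stringJSON)

// The decoder has to be told which representation to expect.
let decoder = JSONDecoder()
decoder.dateDecodingStrategy = .iso8601
let roundTripped = try decoder.decode(Event.self, from: Data(stringJSON.utf8))
```

Nothing in the payload itself says which convention was used, which is why both sides have to agree on it explicitly.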

The problem lies with collections, i.e. arrays and dictionaries. You may look at JSON and think objects are dictionaries too, but it's not quite the case... 🤮

Swift solves this by differentiating 3 coding subtypes:

  • single value (probably a core type, or an object)
  • unkeyed (an array of objects) - which is a misnomer, since it has numbers as keys
  • keyed (a dictionary of objects, keyed by strings)
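
Here is a rough sketch of what asking for each of the three containers looks like from inside encode(to:). Point, Path, Label and jsonString are hypothetical names made up for the illustration, and JSONEncoder is used only because it ships with Foundation.

```swift
import Foundation

struct Point: Encodable {
    var x: Int
    var y: Int

    enum CodingKeys: String, CodingKey { case x, y }

    // Keyed: each property is stored under a string key.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(x, forKey: .x)
        try container.encode(y, forKey: .y)
    }
}

struct Path: Encodable {
    var points: [Point]

    // Unkeyed: values are appended in order, without names.
    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for point in points {
            try container.encode(point)
        }
    }
}

struct Label: Encodable {
    var text: String

    // Single value: the whole object collapses into one core value.
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(text)
    }
}

// Helper for the example: encode to JSON text with stable key order.
func jsonString<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    return String(decoding: try encoder.encode(value), as: UTF8.self)
}

print(try jsonString(Point(x: 1, y: 2)))                              // {"x":1,"y":2}
print(try jsonString(Path(points: [Point(x: 1, y: 2), Point(x: 3, y: 4)])))  // [{"x":1,"y":2},{"x":3,"y":4}]
print(try jsonString([Label(text: "home")]))                          // ["home"]
```

Note that Path encodes each Point by handing it back to the container, which in turn calls Point's own encode(to:): that is where the recursion comes from.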

Any Encoder and Decoder has to manage all three kinds. The recursion part of this is that there is a high probability that a Codable object will be represented by a keyed decoder, with the property names as keys and the attached property values.

Our Value struct would probably be represented at some point by something that looks like ["number":1], and one of the simplest OpPlus by something like ["left":["number":1], "right":["number":1]]. See the recursion now? Not to mention, any property could be an array or a dictionary of Codable structures.
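
To see that nested shape with today's tools, you can take the very detour this post complains about: encode to JSON, then parse the JSON back into a dictionary. This is only a sketch, not the implementation discussed later. Sum is a hypothetical, simplified stand-in for OpPlus (its sides are plain Values, so the compiler can generate the Codable conformance), and Value is redeclared as Codable so the block stands alone.

```swift
import Foundation

struct Value: Codable {
    var number: Double
}

// Hypothetical simplified OpPlus: both sides are concrete Values.
struct Sum: Codable {
    var left: Value
    var right: Value
}

let sum = Sum(left: Value(number: 1), right: Value(number: 1))

// Conversion 1: Codable -> JSON bytes.
let data = try JSONEncoder().encode(sum)

// Conversion 2: JSON bytes -> untyped dictionary.
let object = try JSONSerialization.jsonObject(with: data)
let dictionary = object as? [String: Any] ?? [:]

// The nested keyed structure is there: a dictionary inside a dictionary.
let left = dictionary["left"] as? [String: Any]
let number = (left?["number"] as? NSNumber)?.doubleValue

print(dictionary.keys.sorted())
print(number as Any)
```

It works, but every value takes a trip through text on the way, and types such as Date come out as whatever JSON could hold.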

Essentially, you have 4 classes (more often than not, the single value is implemented in the coder itself, making it only 3 classes), that will be used to transcode our input, through the use of a stack, depending on what the input type is:

  • if it's an array, we go with the UnkeyedEncodingContainerProtocol
  • if it's a dictionary, we go with the KeyedEncodingContainerProtocol
  • if it's an object, we go with SingleValueEncodingContainerProtocol
    * if it's a core type, we stop the recursion and push a representation on the stack, or pop it from the stack
    * if it's a Codable object, we start a keyed process on the sub-data

Said like that, it's easy enough. Is coding it easy?

Act 3: The Code

You have managed to wade through all this without having to pop pill after pill to either stay awake or diminish the planetary size of your headache? Congratulations!

So is it that easy? Yes and no. All of the above lets you follow the code and see how it works, but there are a few caveats when writing the Codable <-> [String:Any?] classes. It's all about the delegation and the (not so) subtle difference between an object and a dictionary.

If we look at our Value structure, it is "obvious" that it is represented by something like ["number":1]. What if we have nullable properties? What do we do with [:] or ["number":1,"other":27]? The class with its properties and the dictionary are fundamentally different types, even though mapping classes to dictionaries is way easier than the reverse. On the other hand, type assumptions are way easier to make about dictionaries than about structures: all 3 examples above are indubitably dictionaries, whereas constraining any of them to be "like a Value" is a lot harder.

Enter the delegation mechanism. There is no way for a generic encoder/decoder to know how many properties a structure has and what their types may be. So, the Codable type requires your data to explain the way to map your object to a keyed system, through the init(from: Decoder) initializer and the encode(to: Encoder) function.

If you've never seen them, it's because you can afford to use only simple types whose properties are all Codable, for which the compiler generates them automagically (you bastard).

In essence, those functions ask you to take your properties (which have to be Codable) and provide a key to store or retrieve them. You are the one who has to ensure that the dictionary mapping makes sense.
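
As an illustration, here is roughly what writing those two by hand looks like, including what happens to the awkward inputs mentioned earlier. Measurement, its unit property and decodeMeasurement are hypothetical names made up for this sketch, and JSONDecoder stands in for any decoder.

```swift
import Foundation

struct Measurement: Codable, Equatable {
    var number: Double
    var unit: String?   // hypothetical nullable property

    enum CodingKeys: String, CodingKey {
        case number, unit
    }

    init(number: Double, unit: String? = nil) {
        self.number = number
        self.unit = unit
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Required: throws if the key is missing.
        number = try container.decode(Double.self, forKey: .number)
        // Optional: nil if the key is missing or null.
        unit = try container.decodeIfPresent(String.self, forKey: .unit)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(number, forKey: .number)
        // Only written when there is a value.
        try container.encodeIfPresent(unit, forKey: .unit)
    }
}

// Helper for the example: decode a Measurement from JSON text.
func decodeMeasurement(_ json: String) throws -> Measurement {
    return try JSONDecoder().decode(Measurement.self, from: Data(json.utf8))
}

let exact = try? decodeMeasurement(#"{"number": 1}"#)
let extra = try? decodeMeasurement(#"{"number": 1, "other": 27}"#)
let empty = try? decodeMeasurement("{}")
```

Here exact and extra end up equal, because init(from:) never asks for "other" and so never notices it, while empty is nil because the required key is missing. Whether that is the behaviour you want is precisely the kind of decision the type, and not the coder, gets to make.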

Conclusion, Epilogue, All That Jazz

OK, so, either I'm dumb and it really was obvious (and it just so happens that after 5 years, no one has ever coded it because no one needed it, or everyone has their own "obvious" implementation and no one published it), or I'm not that dumb and this project will serve a purpose for somebody.

There are, however, a few particularities to my implementation that stem from choices I made along the way.

Certain types are "protected", that is, they aren't (de)coded using their own implementation of Codable. For instance, Date would normally be transformed into the number of seconds since its reference date, but given that we serialize to and from dictionaries in memory, there's no need to do it. These types are considered "core" types, even though they aren't labelled as such in the language. Those exceptions include:

  • Date / NSDate
  • Data / NSData
  • URL / NSURL
  • Decimal / NSDecimalNumber

Unlike with JSON, they don't need to be transformed into an adjacent type, so they are allowed to retain their own.

The other elephant in the room is polymorphic in nature: if I allow decoding attempts of Any, or encoding attempts of Any, my functions can look wildly different:

  • decode can return a Codable, an array of Codable or a dictionary with Codable values
  • same goes for encode which should be consuming all 3 variants, plus nil or nullable parameters.

There is therefore an intermediary type to manage those cases. It's invisible from the outside in the case of decode, the function itself deciding what it's dealing with, but for encode, the function needs to return a polymorphic type, rather than an Any?.

My choice has been to use the following enumeration:

public enum CoderResult {
    case dictionary([String:Any?])
    case array([Any?])
    case single(Any)
    case `nil`
}

With attached types, you know exactly what you're getting:

public func encode<T : Encodable>(_ value: T) throws -> CoderResult { ... }

// The parentheses matter: without them, try? swallows the ?? as well.
let r = (try? DictionaryCoder.encode(v)) ?? .nil
switch r {
    case .dictionary(let d): break // d contains the [String:Any?] representation
    case .array(let a): break // a contains the [Any?] representation
    case .single(let v): break // v is the single value, which is kind of useless but there nonetheless
    case .nil: break // no output
}

The repository is available on GitHub