Introducing FuzzyTests

TL;DR: Grab it here: GitHub repo

Unit testing is painful amirite?

Writing good tests very often means spending twice as much time on them as on the code they are supposed to test.

It is good practice, though, to verify as much as possible that the code you write is valid, especially if that code is going to be public or included in someone else's work.

In my workflow, I insist on the notion of ownership:

The bottom line for me is this: if there are several people on a project, I want clearly defined ownership. It's not that I won't fix a bug in someone else's code, just that they own it and therefore have to have a reliable way of testing that my fix works.
Tests solve part of that problem. My code, my tests. If you fix my code and run my tests, I'm fairly confident that you didn't wreck the whole thing. And that I won't have to spend a couple of hours figuring out what it is that you did.

This is a very, very, very light constraint when you compare it to methodologies like TDD, but it's a required minimum for me.

Plus, it's not that painful, except...

Testing every case

In my personal opinion, the tests that are hardest to do right are the ones that have a very large input range, with a few failure/continuity points.

If, for instance, and completely randomly, of course, you had an application where the tilt of the phone changes the state of the app (locked/unlocked, depending on whether the phone is lying flat-ish on the table or not):

  • from -20º to 20º the app is locked
  • from 160º to 200º the app is locked
  • the rest of the time it's not locked
  • All of that modulo 360, of course

So you have a function that takes the current pitch angle, and returns whether we should lock or not:

func pitchLock(_ angle: Double) -> Bool {
  // ...
}
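For illustration, here is one way such a function could be written. It is only a sketch of the rule in the list above, not the code from the app: the normalization step and the inclusive bounds are assumptions.

```swift
/// Hypothetical implementation: true when the phone is lying flat-ish (face up or face down).
func pitchLock(_ angle: Double) -> Bool {
    // Bring the angle back into [0, 360), negative inputs included
    var normalized = angle.truncatingRemainder(dividingBy: 360.0)
    if normalized < 0 { normalized += 360.0 }
    // Face up: within 20º of 0º, which wraps around to 340º...360º
    if normalized <= 20.0 || normalized >= 340.0 { return true }
    // Face down: within 20º of 180º
    return normalized >= 160.0 && normalized <= 200.0
}
```

The wrap-around at 0º/360º and the negative inputs are exactly the kind of failure points that are easy to get wrong and tedious to test by hand.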

Does it work? Does it work modulo 360? What would a unit test for that function even look like? A for loop?

I have been looking for a way to do that kind of test for a while, which is why I published HoledRange (now Domains 😇) a while back, as part of my hacks.

What I wanted is to write my tests kind of like this (invalid code on so many levels):

for x in [-1000.0...1000.0].randomSelection {
  let unitCircleAngle = x%360.0
  if unitCircleAngle >= 340 || unitCircleAngle <= 20 {
    XCTAssert(pitchLock(x))
  } else if unitCircleAngle >= 160 && unitCircleAngle <= 200 {
    XCTAssert(pitchLock(x))
  } else {
    XCTAssertFalse(pitchLock(x))
  }
}

This way of testing, while vaguely valid, leaves so many things flaky:

  • how many elements in the random selection?
  • how can we make certain values untestable (because we address them somewhere else, for instance)?
  • what a lot of boilerplate if I have multiple functions to test on the same range of values
  • I can't reuse the same value for multiple tests to check function chains
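To make those complaints concrete, here is roughly what a runnable, framework-free version of the loop above looks like in plain Swift. Everything in it is an assumption made for the example: the `pitchLock` body, the `expectedLock` oracle, and the `failingSamples` helper with its `count` and `excluding` parameters.

```swift
// Hypothetical function under test (same rule as in the list above).
func pitchLock(_ angle: Double) -> Bool {
    let r = angle.truncatingRemainder(dividingBy: 360.0)
    let normalized = r < 0 ? r + 360.0 : r
    return normalized <= 20 || normalized >= 340 || (normalized >= 160 && normalized <= 200)
}

/// Independent oracle: locked when within 20º of the nearest multiple of 180º.
func expectedLock(_ angle: Double) -> Bool {
    let nearestFlat = 180.0 * (angle / 180.0).rounded()
    return abs(angle - nearestFlat) <= 20.0
}

/// Draws `count` random angles in `range`, skips the excluded ones,
/// and returns every input for which the function and the oracle disagree.
func failingSamples(count: Int,
                    in range: ClosedRange<Double>,
                    excluding: (Double) -> Bool = { _ in false }) -> [Double] {
    var failures: [Double] = []
    for _ in 0..<count {
        let x = Double.random(in: range)
        if excluding(x) { continue }
        if pitchLock(x) != expectedLock(x) { failures.append(x) }
    }
    return failures
}

let failures = failingSamples(count: 10_000, in: -1000.0...1000.0)
```

It works, but the sample count, the exclusions and the sampling loop all have to be rewritten for every function I want to test, which is the boilerplate I wanted to get rid of.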

Function builders

I have been fascinated with @_functionBuilder ever since it was announced. While I don't feel enthusiastic about SwiftUI (in French), that way of building elements out of blocks is something I have wanted for years.

Making them is a harrowing experience the first time, but in the end it works!

What I wanted to use as syntax is something like this:

func myPlus(_ a: Int, _ b: Int) -> Int

DomainTests<Int> {
    Domain(-10000...10000)
    1000000
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value \(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value \(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value \(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value \(a)")
    }
}.random()

This particular DomainTests runs 1000000 times over $$D=[-10000;10000]$$ in a random fashion.

Note the Test builder, which takes a function whose parameter will be drawn from the domain, and the definition, which lets you set both the test domain (mandatory) and the number of random iterations (optional).
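For readers who have never written a builder, here is a miniature of the idea. It is not the FuzzyTests implementation: `MiniDomain`, `MiniTest`, `Piece`, `PieceBuilder` and `MiniDomainTests` are hypothetical names, the default of 100 iterations is arbitrary, and the sketch uses `@resultBuilder`, the official spelling that replaced `@_functionBuilder` in Swift 5.4.

```swift
// Hypothetical miniature of a builder-based test definition.
struct MiniDomain {
    let range: ClosedRange<Int>
    init(_ range: ClosedRange<Int>) { self.range = range }
}

struct MiniTest {
    let body: (Int) -> Void
    init(_ body: @escaping (Int) -> Void) { self.body = body }
}

/// Everything that can appear inside the block ends up as one of these.
enum Piece {
    case domain(MiniDomain)
    case iterations(Int)
    case test(MiniTest)
}

@resultBuilder
enum PieceBuilder {
    static func buildExpression(_ domain: MiniDomain) -> Piece { .domain(domain) }
    static func buildExpression(_ count: Int) -> Piece { .iterations(count) }
    static func buildExpression(_ test: MiniTest) -> Piece { .test(test) }
    static func buildBlock(_ pieces: Piece...) -> [Piece] { pieces }
}

struct MiniDomainTests {
    private(set) var domain: ClosedRange<Int> = 0...0
    private(set) var iterations = 100 // arbitrary default when no count is given
    private(set) var tests: [MiniTest] = []

    init(@PieceBuilder _ content: () -> [Piece]) {
        for piece in content() {
            switch piece {
            case .domain(let d): domain = d.range
            case .iterations(let n): iterations = n
            case .test(let t): tests.append(t)
            }
        }
    }

    /// Runs every test `iterations` times on random values of the domain.
    func random() {
        for _ in 0..<iterations {
            let value = Int.random(in: domain)
            for test in tests { test.body(value) }
        }
    }
}

var calls = 0
var outOfDomain = 0
let suite = MiniDomainTests {
    MiniDomain(-10...10)
    50
    MiniTest { (a: Int) in
        calls += 1
        if !(-10...10).contains(a) { outOfDomain += 1 }
    }
}
suite.random()
```

The `buildExpression` overloads are what allow a bare integer, a domain and a test to sit side by side in the same block.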

If you want to test every single value in a domain, the bounds need to be Strideable, i.e. usable in a for loop.

DomainTests<Int> {
    Domain(-10000...10000)
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value \(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value \(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value \(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value \(a)")
    }
}.full()
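A side note on the Strideable requirement, in plain Swift and independent of the framework (how .full() actually walks the domain is not shown here): an integer range can be iterated directly, while a floating-point range needs an explicit step.

```swift
// Int is Strideable with an integer stride, so ClosedRange<Int> is a Sequence
var visited: [Int] = []
for a in -2...2 { visited.append(a) }

// Double is Strideable too, but its stride is a Double:
// `for x in 0.0...1.0` does not compile, the step has to be spelled out
let steps = Array(stride(from: 0.0, through: 1.0, by: 0.25))
```

This is why testing "every single value" only has an obvious meaning for integer-like bounds.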

Conclusion

A couple of hard-working days plus a healthy dose of using that framework personally means this should be ready-ish for production.

If you are a maths-oriented dev and shiver at the idea of untested domains, this is for you 😬

[Dev Diary] Vanilla Is The Best Flavor

I have a weird thing with the multiplication of command-line tools and gizmos: I forget them.

Do I want to run supercool gitlab commands? Hell yea! Do I need to install 12 utilities (or code a new one) to archive every project older than a year? I hope not...

The setup

I am a sucker for well-documented, fully linted code. But the thing is, all the gizmos that help me do that have to be installed in the system or in my ~/bin and I have to remember to update them, and I have to install them on my CD machine, and on every new environment I set up, and make sure they are still compatible with the toolchain, and it freaks me out, ok?

Plus, watching the students try to do it is painful.

So, given a 100% vanilla Swift-capable environment, can I manage to run documentation and linting?

The idea

We have Swift Package Manager, which is now a first-class citizen in Xcode, but it can't run shell script phases without some nasty hacks.

What if some targets were (wait for it) built to do the documentation and the linting?

Linting

One of the most popular linters out there is SwiftLint, and it supports SPM. It can also be built as a library instead of an executable, which means one of my targets could just run the linting and print the results in the terminal.

In the Package.swift file, all I needed to do was add the right dependency and the right product, and voilà!

let package = Package(
    name: "WonderfulPackage",
    products: [
        // ...
        .executable(name: "Lint", targets: ["Lint"])
    ],
    dependencies: [
        // Dependencies declare other packages that this package depends on.
        // .package(url: /* package url */, from: "1.0.0"),
        // ... normal dependencies
        .package(url: "https://github.com/realm/SwiftLint", from: "0.39.0")
    ],
    targets: [
        // ... normal targets
        .target(
            name: "Lint",
            dependencies: ["SwiftLintFramework"]),
    ]
)
Package.swift

Now, SPM is very strict with paths, so I had to put a file named main.swift in the Sources/<target>/ directory, in this case Sources/Lint.

Running the linter is fairly straightforward, and goes in the main.swift file:

// Lint command main
// runs SwiftLint
import Foundation
import SwiftLintFramework

let config = Configuration(path: FileManager.default.currentDirectoryPath+"/.swiftlint.yml",
                           rootPath: FileManager.default.currentDirectoryPath,
                           optional: true,
                           quiet: true,
                           enableAllRules: false,
                           cachePath: nil,
                           customRulesIdentifiers: [])

for lintable in config.lintableFiles(inPath: FileManager.default.currentDirectoryPath, forceExclude: false) {
    let linter = Linter(file: lintable, configuration: config)
    let storage = RuleStorage()
    let collected = linter.collect(into: storage)
    let violations = collected.styleViolations(using: storage)
    if !violations.isEmpty {
        print(EmojiReporter.generateReport(violations))
    }
}

print("🎉 All done!")
Sources/Lint/main.swift

Set up the .swiftlint.yml file as usual, and run the command via swift run Lint:

Sources/WonderfulPackage/main.swift
⛔️ Line 15: Variable name should be between 3 and 40 characters long: 'f'
⚠️ Line 13: Arguments can be omitted when matching enums with associated types if they are not used.
⚠️ Line 12: Line should be 120 characters or less: currently 143 characters

Documentation

Documentation is actually trickier, because most documentation tools out there aren't built in Swift or compatible with SPM. Doxygen and jazzy are great, but they don't fit my needs.

I found a project that was extremely promising called SourceDocs by Eneko Alonso, but it isn't a library, so I had to fork it and make it into one (while providing a second target to generate the executable if needed). One weird issue is that SPM doesn't like subtargets to bear the same name so I had to rename a couple of them to avoid conflict with Swift Argument Parser (long story).

I finally found myself in the same spot as with the linter. All I needed to do was create another target, and Bob's your uncle. Well, actually he was mine. I digress.

let package = Package(
    name: "WonderfulPackage",
    products: [
        // ...
        .executable(name: "Docs", targets: ["Docs"])
    ],
    dependencies: [
        // Dependencies declare other packages that this package depends on.
        // .package(url: /* package url */, from: "1.0.0"),
        // ... normal dependencies
        .package(url: "https://github.com/krugazor/SourceDocs", from: "0.7.0")
    ],
    targets: [
        // ... normal targets
        .target(
            name: "Docs",
            dependencies: ["sourcedocslib"])
    ]
)
Package.swift

Another well-placed main file:

// Docs command main
// runs SourceDocs
import Foundation
import SourceDocs

do {
    switch try SourceDocs().runOnSPM(moduleName: "WonderfulPackage",
                                     outputDirectory: FileManager.default.currentDirectoryPath+"/Documentation") {
    case .success:
        print("Successful run of the documentation phase")
    case .failure(let failure):
        print(failure.localizedDescription)
    }
} catch {
    print(error.localizedDescription)
}
Sources/Docs/main.swift

Now, the command swift run Docs generates the markdown documentation in the Documentation directory.

Parsing main.swift (1/1)
Removing reference documentation at 'WonderfulPackage/Documentation/KituraStarter'... ✔
Generating Markdown documentation...
  Writing documentation file: WonderfulPackage/Documentation/WonderfulPackage/structs/WonderfulPackage.md ✔
  Writing documentation file: WonderfulPackage/Documentation/WonderfulPackage/README.md ✔
Done 🎉
Successful run of the documentation phase

Conclusion

✅ Vanilla Swift environment
✅ No install needed
✅ Works on Linux and macOS
✅ Integrated into SPM
⚠️ When running in Xcode, the current directory is always wonky for packages
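One possible workaround for the current directory problem is to stop relying on it and to derive the package root from the location of the source file instead. This is an untested-in-Xcode sketch, not what the targets above do: `packageRoot(startingAt:)` is a hypothetical helper, and it assumes Swift 5.3 or later for `#filePath` and a source file that lives somewhere below the directory containing Package.swift.

```swift
import Foundation

/// Walks up from a source file until a directory containing Package.swift is found.
/// Returns nil if the file system root is reached first.
func packageRoot(startingAt filePath: String = #filePath) -> URL? {
    var url = URL(fileURLWithPath: filePath).deletingLastPathComponent()
    while url.pathComponents.count > 1 {
        let manifest = url.appendingPathComponent("Package.swift")
        if FileManager.default.fileExists(atPath: manifest.path) {
            return url
        }
        url.deleteLastPathComponent()
    }
    return nil
}
```

In a main.swift, the result could then be used in place of FileManager.default.currentDirectoryPath, or passed to FileManager.default.changeCurrentDirectoryPath(_:) before running the linter or the documentation.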

[Utilities] Time Tracking Structure

Every now and again (especially when training a model), I need to have a guesstimate as to how long a "step" takes, and how long the process will take, so I wrote myself a little piece of code that does that. Because I've had the question multiple times (and because I think everyone codes their own after a while), here's mine. Feel free to use it.

import Foundation

/// Structure that keeps track of the time it takes to complete steps, to average or estimate the remaining time
public struct TimeRecord {
    /// The number of steps to keep for averaging. 5 is a decent default, increase or decrease as needed
    /// Minimum for average is 2, obviously
    public var smoothing: Int = 5 {
        didSet {
            smoothing = max(smoothing, 2) // minimum 2 values
        }
    }
    /// dates for the steps
    private var dates: [Date] = []
    /// formatter for debug print and/or display
    private var formatter = DateComponentsFormatter()
    /// style used by the human readable outputs
    public var formatterStyle: DateComponentsFormatter.UnitsStyle {
        didSet {
            // same units as in init: .nanosecond is not available everywhere
            formatter.allowedUnits = [.hour, .minute, .second]
            formatter.unitsStyle = formatterStyle
        }
    }
    
    public init(smoothing s: Int = 5, style fs: DateComponentsFormatter.UnitsStyle = .positional) {
        smoothing = max(s, 2)
        formatterStyle = fs
        formatter = DateComponentsFormatter()
        // not available everywhere
        // formatter.allowedUnits = [.hour, .minute, .second, .nanosecond]
        formatter.allowedUnits = [.hour, .minute, .second]
        formatter.zeroFormattingBehavior = .pad
        formatter.unitsStyle = fs
    }
    
    /// adds the record for a step
    /// - param d: the date of the step. If unspecified, current date is taken
    public mutating func addRecord(_ d: Date? = nil) {
        dates.append(d ?? Date())
        // only keep the last `smoothing` dates
        while dates.count > smoothing { dates.removeFirst() }
    }
    
    /// gives the average delta between two steps (in seconds)
    public var averageDelta: Double {
        if dates.count <= 1 { return 0.0 }
        var totalTime = 0.0
        for i in 1..<dates.count {
            totalTime += dates[i].timeIntervalSince(dates[i-1])
        }

        // n dates only give n-1 intervals
        return totalTime/Double(dates.count - 1)
    }
    
    /// gives the average delta between two steps in human readable form
    /// - see formatterStyle for options, default is "02:46:40"
    public var averageDeltaHumanReadable: String {
        let delta = averageDelta
        return formatter.string(from: delta) ?? ""
    }

    /// given a number of remaining steps, gives an estimate of the time left on the process (in s)
    public func estimatedTimeRemaining(_ steps: Int) -> Double {
        return Double(steps) * averageDelta
    }

    /// given a number of remaining steps, gives an estimate of the time left on the process in human readable form
    /// - see formatterStyle for options, default is "02:46:40"
    public func estimatedTimeRemainingHumanReadable(_ steps: Int) -> String {
        let delta = estimatedTimeRemaining(steps)
        return formatter.string(from: delta) ?? ""
    }
}

When I train a model, I tend to use it that way:

// prepare model
var tt = TimeRecord()
tt.addRecord()

while currentEpoch < maxEpochs {
  // train the model (and increment currentEpoch)
  tt.addRecord()
  if currentEpoch > 0 && currentEpoch % 5 == 0 {
    print(tt.averageDeltaHumanReadable + " per epoch, "
      + tt.estimatedTimeRemainingHumanReadable(maxEpochs - currentEpoch) + " remaining")
  }
}

Random Wednesday

I had absolutely no idea that /dev/random was so controversial.

> That's all good and nice, but even the man page for /dev/(u)random contradicts you! Does anyone who knows about this stuff actually agree with you?

No, it really doesn't. It seems to imply that /dev/urandom is insecure for cryptographic use, unless you really understand all that cryptographic jargon.

Sick burn

From Myths about /dev/urandom

[ML] Swift TensorFlow (Part 3)

This is the last part of a 3-part series. In part 1, I tried to make sense of how it works and what we are trying to achieve, and in part 2, we set up the training loop.

Model Predictions

We have a trained model. Now what?

Remember, a model is a series of giant matrices that takes an input like the ones you trained it on, and spits out the list of probabilities associated with the outputs you trained it on. So all you have to do is feed it a new input and see what it tells you:

let input: [Float] = [1.0, 179.0, 115.0] // scalars must be Float to match Tensor<Float>
let unlabeled = Tensor<Float>(shape: [1, 3], scalars: input)
let predictions = model(unlabeled)
let logits = predictions[0]
let classIdx = logits.argmax().scalar! // we take only the best guess
print(classIdx)
17

Cool.

Cool, cool.

What?

Models deal with numbers. I am the one who assigned numbers to words to train the model on, so I need a translation layer. That's why I kept my contents structure around: I need it for its vocabulary map.

The real code:

let w1 = "on"
let w2 = "flocks"
let w3 = "settlement"

var indices = [w1, w2, w3].map {
    Float(contents.indexHelper[$0.hash] ?? 0)
}

var wordsToPredict = 50
var sentence = "\(w1) \(w2) \(w3)"

while wordsToPredict >= 0 {
    let unlabeled : Tensor<Float> = Tensor<Float>(shape: [1, 3], scalars: indices)
    let predictions = model(unlabeled)
    for i in 0..<predictions.shape[0] {
        let logits = predictions[i]
        let classIdx = logits.argmax().scalar!
        let word = contents.vocabulary[Int(classIdx)]
        sentence += " \(word)"
        
        indices.append(Float(classIdx))
        indices.remove(at: 0)
        wordsToPredict -= 1
    }
}

print(sentence)
on flocks settlement or their enter the earth; their only hope in their arrows, which for want of it, with a thorn. and distinction of their nature, that in the same yoke are also chosen their chiefs or rulers, such as administer justice in their villages and by superstitious awe in times of old.

Notice how I remove the first input and add the one the model predicted at the end to keep the loop running.

Seeing that, it kind of makes you think about the suggestions game when you send text messages eh? 😁
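The sliding window used in the prediction loop above can be isolated from TensorFlow entirely. In this sketch, `fakeModel` is a made-up stand-in for the trained model and `generate` is a hypothetical helper; only the append/remove dance is the point.

```swift
/// Made-up stand-in for the trained model: maps the last three indices to a "next" index.
func fakeModel(_ window: [Int]) -> Int {
    return (window.reduce(0, +) + 1) % 10
}

/// Generates `count` new indices, feeding each prediction back into the input window.
func generate(seed: [Int], count: Int, predict: ([Int]) -> Int) -> [Int] {
    var window = seed
    var output = seed
    for _ in 0..<count {
        let next = predict(window)
        output.append(next)
        window.append(next)   // the newest prediction goes in at the end
        window.removeFirst()  // the oldest input drops out, the window size stays constant
    }
    return output
}

let generated = generate(seed: [1, 2, 3], count: 4, predict: fakeModel)
// generated is [1, 2, 3, 7, 3, 4, 5]
```

With the real model, `predict` would wrap the tensor creation and the argmax, and the indices would go through the vocabulary to become words.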

Model Serialization

Training a model takes a long time. You don't want a multi-hour launch time on your program every time you want a prediction, and maybe you even want to keep updating the model every now and then. So we need a way to store it and load it.

Thankfully, tensors are just matrices, so they are easy to store as arrays of arrays of floats, and we've been doing that forever. They are even Codable out of the box.
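As a reminder of how little is needed for the "array of arrays of floats" case, here is a round trip through JSON with Foundation only. The `roundTripMatrix` helper is a hypothetical name for this example and has nothing to do with the TensorFlow types.

```swift
import Foundation

/// Encodes a matrix to JSON and decodes it back.
func roundTripMatrix(_ matrix: [[Float]]) throws -> [[Float]] {
    let data = try JSONEncoder().encode(matrix)
    return try JSONDecoder().decode([[Float]].self, from: data)
}

// Values chosen to be exactly representable in binary floating point
let matrix: [[Float]] = [[0.5, -1.25], [2.0, 0.0]]
let restored = try? roundTripMatrix(matrix)
```

Anything Codable goes through the same two calls, which is what makes the tensors easy to store.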

In my particular case, the model itself needs to remember a few things to be recreated:

  • the number of inputs and hidden nodes, in order to recreate the Reshape and LSTMCell layers
  • the internal probability matrices of both RNNs
  • the weights and biases correction matrices

Because they are Codable, any regular Swift encoder will work, but I know some of you will want to see the actual matrices, so I use JSON. It is not the most time or space efficient, it does not come with a way to validate it, and JSON is an all-around awful storage format, but it makes a few things easy.

extension TextModel { // serialization
    struct TextModelParams : Codable {
        var inputs : Int
        var hidden : Int
        var rnn1w : Tensor<Float>
        var rnn1b : Tensor<Float>
        var rnn2w : Tensor<Float>
        var rnn2b : Tensor<Float>
        var weights : Tensor<Float>
        var biases : Tensor<Float>
    }
    func serializedParameters() throws -> Data {
        return try JSONEncoder().encode(TextModelParams(
        inputs: self.inputs,
        hidden: self.hidden,
        rnn1w: self.rnn1.cell.fusedWeight,
        rnn1b: self.rnn1.cell.fusedBias,
        rnn2w: self.rnn2.cell.fusedWeight,
        rnn2b: self.rnn2.cell.fusedBias,
        weights: self.weightsOut,
        biases: self.biasesOut))
    }
    
    struct TextModelSerializationError : Error { }
    init(_ serialized: Data) throws {
        guard let params = try? JSONDecoder().decode(TextModelParams.self, from: serialized) else { throw TextModelSerializationError() }
        
        inputs = params.inputs
        hidden = params.hidden
        reshape = Reshape<Float>([-1, inputs])
        
        var lstm1 = LSTMCell<Float>(inputSize: 1, hiddenSize: hidden)
        lstm1.fusedWeight = params.rnn1w
        lstm1.fusedBias = params.rnn1b
        var lstm2 = LSTMCell<Float>(inputSize: hidden, hiddenSize: hidden)
        lstm2.fusedWeight = params.rnn2w
        lstm2.fusedBias = params.rnn2b
        
        rnn1 = RNN(lstm1)
        rnn2 = RNN(lstm2)
        
        weightsOut = params.weights
        biasesOut = params.biases
        correction = weightsOut+biasesOut
   }
}

My resulting JSON file is around 70MB (25 when bzipped), so not too bad.

When you serialize your model, remember to serialize the vocabulary mappings as well! Otherwise, you will lose the word <-> int translation layer.
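A minimal sketch of what storing that translation layer could look like. The `Vocabulary` struct and `roundTripVocabulary` are hypothetical and simpler than the contents structure used in this series; the only assumption is that the mapping is made of Codable types.

```swift
import Foundation

/// Hypothetical word <-> index mapping.
struct Vocabulary: Codable, Equatable {
    private(set) var words: [String] = []
    private(set) var indexByWord: [String: Int] = [:]

    /// Returns the index of a word, adding it to the vocabulary if needed.
    mutating func index(of word: String) -> Int {
        if let known = indexByWord[word] { return known }
        words.append(word)
        indexByWord[word] = words.count - 1
        return words.count - 1
    }

    /// Returns the word for an index predicted by the model, if there is one.
    func word(at index: Int) -> String? {
        return words.indices.contains(index) ? words[index] : nil
    }
}

/// Encodes the vocabulary to JSON and decodes it back.
func roundTripVocabulary(_ vocabulary: Vocabulary) throws -> Vocabulary {
    let data = try JSONEncoder().encode(vocabulary)
    return try JSONDecoder().decode(Vocabulary.self, from: data)
}

var vocabulary = Vocabulary()
let onIndex = vocabulary.index(of: "on")
let flocksIndex = vocabulary.index(of: "flocks")
let reloaded = try? roundTripVocabulary(vocabulary)
```

Stored next to the model parameters, this is enough to turn predicted indices back into words after a reload.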

That's all, folks!

This was a quick and dirty intro to TensorFlow for some, Swift for others, and SwiftTensorflow for most.

It definitely is a highly specialized and quite brittle piece of software, but it's a good conversation piece next time you hear that ML is going to take over the world.

Feel free to drop me comments or questions or corrections on Twitter!