[Dev Diaries] Todotxt

For reasons I took an interest in playing with Todo.txt. Naturally, I went and looked for a swift package capable of importing and exporting this format. It is surprisingly hard to find something that is somewhat recent and full-featured.

Anyways, I went with the intention of contributing to this package, both because I like its simplicity and the fact that it uses the new DSL semantics for the regexps. And boy, oh boy, it tested me.

The trouble with my brain

We are all products of our history. Regular expressions don't faze me one bit. I used to write parts of compilers, lexers and yaccers are old friends, and the whole lambda reduction way regexps work isn't new.

BUT I know that they are a major hindrance for a lot of newcomers. Their syntax is unlike anything they have ever encountered (well, except for when they use * to mean "zero or more characters"), and don't get me started on the whole capture groups ( especially in Javascript ).

So I came into this project thinking it would be half an hour of clever regexps, and that's it.

Oh noes, there's Builders

In theory, I love builders. SwiftUI may not be my favorite way to build UIs, but it's fun and it works. Ditto for Ktor's HTML/CSS DSLs.

For regular expressions though... to me they fail to capture (haha) the mechanics, and will work only for relatively simple expressions.

let regex = Regex {
    "$"
    Capture {
      OneOrMore(.digit)
      "."
      Repeat(.digit, count: 2)
    }
    Anchor.endOfLine
}

This is equivalent to \$([0-9]+\.[0-9][0-9])$ I guess, and it's actually an example of why the DSL system is easier: $ and . have meaning in regexps and have to be escaped. It makes that line totally undecipherable for most newcomers.

Let's set aside the fact that nesting too many builders leads to incredibly long type inference (when it doesn't fail) and compile time. It comes with the territory, SwiftUI afficionados will tell you.

Let's also set aside the compactness of the regexps versus the DSL type because it's irrelevant to the point. In any language, you can write your code as compact and illegible as you like, it doesn't mean it's better. The tradeoffs you choose for your project in terms of inclusiveness (towards more junior members, or at least with regard to the tech stack you are using), and maintainability (look at you being all clever and writing some l33tk0d3 that you won't be able to debug in 6 months time... I learned that lesson a while ago and have decided that in most cases, readability is better than cleverness).

The problem (for me) with DSLs as applied to regular expressions is that it fails to capture the mechanics of backtracking and cursor position.

How regular expressions kind of work

As anyone who's taken an algorithmics class will tell you, strings are a pain. They can mostly only been read linearly (going from one character to the next or the previous), they have encodings and special cases (pun intended), and what they contain mean something different to the programmer than the actual text it contains (looking at you JSON).

"17.51" kind of is a number, and most definitely a string. If an error creeps in, it becomes something really really tedious to deal with. Like "17.,51". What the hell does THAT mean? 17.0 followed by 51? Or is the comma used in the French way to separate the decimal part, and . for multiplication? 17 x 0.51 ? And don't get me started on dates...

And so we have to define rules and use pattern matching. You can say that the string contains elements separated by commas, and each element is a number. And if it isn't, raise an error.

  • Look for every combination of n digits, followed by an optional point, followed by m optional digits, followed by a comma OR the end of the line
  • You should end up with a list of n∂.m∂ strings, that you can now parse to floating point
  • That should give an expression that looks like this:
    ([0-9]+\.{0,1}[0-9]*)(,|$)

There's a lot of problems to unpack here, like the fact that we kind of have to capture the group matching the separator/end of line to keep the expression simple, the weird dance of square and curly brackets and parenthesiseseseses, the whys and the hows of + and *, and so much more. But this will suffice for a brief explanation of how it works under the hood.

See, the point of it all is state machines and backtracking. When you use the regexp on a string, it keeps track of what it's done, what it's trying to do and where to go next. And if the current attempt fails, it reverts back to a previous state. The experts in regexps will groan and moan because I explain things too casually, and they should, but you don't need the nitty gritty details of how it actually works to grasp what I'm getting at.

So let's look at "17.,.6,51", and try to apply the regular expression.

  • We are at character 0,  and we are looking for 1 or more digits.
  • We find 17, so we look for 0 or 1 dot.
  • We find one, so we have 17., and we look for 0 or more digits
  • We find none, so we're still good, and we look for either a comma or the end of the line
  • We find a comma, so we have a complete match: 17.,
  • We push the result in the list of "found things", and advance the character to the end of the match, 3
  • We look again for a digit or more
  • We find none, so it can't be a match, so we advance by one character, and try again
  • Ditto for the dot
  • We're now at character 5, and try again
  • We have a digit, so we start a new match: 6
  • 0 or more dots, check. 0 or more digits, check. Comma, check. So we have a new match: 6
  • Start again, and we find 51
  • In the end we have 3 matches: 17., 6, and 51 (plus the comma or end of line, that is captured in a separate group)

Hold on. It was .6 not 6! Well... yes, but not according to our rules. We have to have a digit before the dot.

Never you mind, you say, we'll say we can have 0 or more instead of one or more:
([0-9]*\.{0,1}[0-9]*)(,|$)

Yes, and it almost works the same:

  • Up to 51, we get almost identical results, except we do capture .6 instead of 6
  • BUT we're not done. We try the regexp again at the end of the string, because we have to look at it all, and we get:
  • There is 0 digit, followed by 0 dot, followed by 0 digit, followed by the end of the line. So there's an extra match: ""...

The two important parts to grasp from all this are that starting at the character we are at, we try to match the pattern to the rest of the string, and if it fails, we advance by one character, and try again. Current state, and backtracking to try again.

The weeds: lookaheads and negations

We need to talk about a couple of special cases:

  • What if we need to not match something (reject some characters, for example)?
  • What if we need to match things only if up ahead in the string there's something else (or not something else)?

Like, say, the dot should appear only if there are digits afterwards (we say that 17. isn't a valid number but 17.0, and .6, are). Or if we are dealing with dates (😖), the rules for matching depend on whether or not we have a time afterwards.

Using "normal" regexps, negation it is kind of easy: [^abc] will match anything but a, b or c. Note that the caret that is used to usually indicate "beginning of a line" is used for negation within the match... Yea. Oh and you have to list every character you do not wish to match.

So kind of a solution is to use "lookaheads". They are basically patterns that match (or not) things before or after the current character, but do not move the current character cursor. They kind of start their own internal state, disregarding the global state of the parent expression.

.*(?=,) means trying to find a comma, and matching anything ( .* ) up till then.

"17.,.6,51" and this pattern will match "17.,.6". Why is it stopping at the last comma instead of the first? Because .* is greedy, it tries to get as many characters as possible. Comma is a character, so it matches, until it can't anymore because it's the last comma.

We can amend the .* to a .*? which will act as a lazy matcher, stopping at the first occurence, and giving us 17., .6, and two empty strings. Because, yes, after we found 17. we try again starting there, and it does match "anything with a comma at the end".

We can also look backwards and see if anything before the current character matches something, with .*(?<=,), and both negations of these with... !. Because yea, caret was really a weird idea. It'd look like this: .*(?!,) and .*(?!<,).

That was a long weeds expedition.

And the DSLs?

Ok, so the regexp mechanics is messy and requires a lot of concentration, so why the hell not make it prettier and/or more legible? After all I did say that I value maintainability over compactness most of the time.

Here's a regexp to match the title of a todo.txt file:

        let reference = Reference(String.self)
        let titlematch = OneOrMore {
          ChoiceOf {
            One(.whitespace)
            OneOrMore(.word)
          }
          NegativeLookahead {
            OneOrMore {
              OneOrMore(.word)
              One(":")
            }
          }
        }
        let regex = Regex {
          Anchor.startOfLine
          ZeroOrMore {
            ChoiceOf {
              One("x ")
              OneOrMore {
                "("
                (try? Regex("[A-Z]"))!
                ") "
              }
              One(.iso8601Date(timeZone: .gmt))
              One(.whitespace)
            }
          }
          NegativeLookahead {
            ChoiceOf {
              One(.iso8601Date(timeZone: .gmt))
              OneOrMore {
                One(.whitespace)
                OneOrMore(.word)
                One(":")
              }
            }
          }
          Capture(titlematch, as: reference,
                  transform: { word -> String in String(word) })
        }
        
        let match = input.firstMatch(of: regex)
        if let match {
          return match[reference].trimmingCharacters(in: CharacterSet.whitespacesAndNewlines) // because we capture the last space
        } else {
          return nil
        }

reference is the variable that should hold the mattern once it's matched. It's weird, but not any weirder than the numbered capture groups, so there.

Because of the Builder's very complex type inference, I had to split the "two" regexps, otherwise, the compiler failed to build. That's a legibility problem because that thing I store in titlematch is really a part of the bigger regexp, and the call site is now murkier. Plus it's a variable, so does it mean I can use it multiple times? In capture groups? Unclear, so probably not.

Then there is .word... it doesn't mean match a word. It means match a character that could be part of a word. Yes. A single character. When you're not paying attention, it can bite you really bad.

I'll set aside the fact that I capture an extra space every now and again. I'm sure that there is a way to avoid doing it, but the regexp was complicated enough as it was and I spent way too much time on it that I care to admit. It's a peculiarity of todo.txt rather than of the regex system, but a space is used to separate the elemets of the format, except when it's in the title.

Also, why is there an actual regular expression in there? Because insofar as I can understand it, you can't express [A-Z] (any capital latin letter) with this system. It's either .word (any character that can be part of a word, uppercase or not), or something else that's not a character.

Also, also, One doesn't accept a closure like OneOrMore does, so I'm forced to accept multiple priorities ( (LETTER) ) instead of only one, for some reason.

Let's dig into that one, shall we? So we're looking for some text that is after the start of the line, after the optional x that marks completion of the task, the (LETTER) marking the priority, and after the optional two dates (creation and optionally completion), but before any special character ( + and @ denote project and context, respectively) or key:value pair.

If you got that by looking at the code instantly, congratulations. I didn't. And I wrote it.

For reference, the regexp would look something like that (I don't include the date formatting, because I'm lazy and would probably do it after I extracted the matches):
^(x |\([A-Z]\) |[0-9-]| )+([\w ]+)(?! \w+:\w+)

OK, sure, this expression is a bit cryptic, and captures an extra space as well for the title. Because I didn't go into details for the date, it also doesn't quite work, because it lets dates with less or more than 3 components pass, instead of only valid ones.

But these 40-ish lines really aren't that clear to me in the things they will and will not match. Especially with the callsite being split into variables, but that's on Swift, not on the DSL thing. To my mind, the regexp is something that you "slide along" your string, stopping when it matches. Yes it's made of blocks and captures and weird characters, but it's kind of the same "thing" as a string, whereas that visual block syntax seems kind of "orthogonal" to the string, not of the same type, and therefore makes it harder for me to think about them.

But it's probably me and my old habit-encrusted brain. Drop me a line if you find an error, or if you disagree with decent arguments!


[Dev Diaries] Code Coverage?

As you probably know by now, I'm fairly obsessed with tools that give me metrics about quality: linting, docs, tests...

Unfortunately, code coverage is fairly hard to get with SPM in a way that is usable.

What?

Code coverage is the amount of code in the package that is covered with your tests. If you run your tests, are these lines run? Are those? Your tests pass, and that's fine, but have you forgotten to test anything?

You can enable it in Xcode using the option "gather code coverage" in the scheme you are running tests for, which allows you to find a decent visualization if you know where to look for it (a new gutter appears in your code editors).

In Swift Package Manager though, it's fairly obscure:

  • first you have to use the --enable-code-coverage option of the testing phase
  • then you have to grab the output json path by using --show-codecov-path
  • then you get a collection of things that is unique to SPM, and therefore unusable elsewhere

Now, if you look closely at the output, you can see it's fairly close to the lcov format, which is more or less a standard.

Let's make a script!

Because I'm very attached to my packages running on both Linux and MacOS, I need to grab the correct values from the environment (makes them dockerizable too).

I need:

  • the output of the swift test phase
  • llvm-cov which is used by the swift toolchain and can extract usable information
  • A few frills here and there

Looking here and there if anything existed already, I stumbled upon a good writeup setting some of the bricks up. I would suggest reading this first if you want to get the nitty gritty details.

Reusing parts of this and making my own script that can spit either human readable or lcov-compatible, and can work either on Linux or MacOS, and dockerizable, here's what I end up with:

#!/bin/bash

swift test --enable-code-coverage > /dev/null 2>&1

if [ $? -ne 0 ]; then
echo "tests unsuccessful"
exit 1
fi

BIN_PATH="$(swift build --show-bin-path)"
XCTEST_PATH="$(find ${BIN_PATH} -name '*.xctest')"

if [[ "$OSTYPE" == "darwin"* ]]; then
	COV_BIN="/usr/bin/xcrun llvm-cov"
	MODULE="$(basename $XCTEST_PATH .xctest)"
	XCTEST_PATH="$XCTEST_PATH/Contents/MacOS/$MODULE"
else
	COV_BIN=`which llvm-cov || echo "false"`
fi

if [ $# -eq 0 ]; then
	$COV_BIN report -ignore-filename-regex=".build|Tests" \
	-instr-profile=.build/debug/codecov/default.profdata -use-color \
	"$XCTEST_PATH" 
elif [ $# -eq 1 ] && [ $1 = "lcov" ]; then
	$COV_BIN export -ignore-filename-regex=".build|Tests" \
	-instr-profile=.build/debug/codecov/default.profdata \
	--format=lcov "$XCTEST_PATH" 
else
	echo "Usage:"
	echo "codecov.sh [lcov]"
	echo "use without argument for human readable output"
	echo "use with lcov as argument for lcov format output"
	exit 1
fi
codecov.sh

codecov.sh spits something like:

Filename                      Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SEKRET.swift                       67                20    70.15%          23                 4    82.61%         157                27    82.80%
Extensions.swift                   15                 3    80.00%           2                 0   100.00%          20                 0   100.00%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                              82                23    71.95%          25                 4    84.00%         177                27    84.75%

codecov.sh lcov spits the corresponding lcov output.

Hurray for automation!


[Dev Diaries] NSLogger is merged

The changes I made to make NSLogger SPM compatible are now in the master branch of the official repo. Update your dependencies ☺️


[Dev Diaries] SPM'ing NSLogger

I know and have fun as often as I can with Florent Pillet, another member of the tribe of "dinosaurs" still kicking around.

I really like one of his projects that contributed to his notoriety : NSLogger. Logging has always been a pain in the neck, and this tool provided us all with a way to get it done efficiently and properly. The first commit on the github repo is from 2010, and I have a strong suspicion it's been in production since before that in one form or another.

Anyhoo, I like Florent, I  like NSLogger, but I hate what Cocoapods (and to a lesser extent Carthage) do to my projects. It's too brittle and I strongly dislike things that mess around with the extremely complicated XML that is a pbxproj. They do however serve an admirable purpose: managing dependencies in a way that doesn't require me to use git submodules in every one of my projects.

So, I rarely use NSLogger. SHAME! SHAME! <insert your own meme here>

With the advent of (and subsequent needed updates to) Swift Package Manager, we now have an official way of managing and supporting dependencies, but it has its own quirks that appently make it hard to "SPM" older projects.

Let's see what we can do about NSLogger.

Step 1 : The Project Structure

SPM can't mix Obj-C code and Swift code. It's always been pretty hacky anyways, with the bridging headers and the weird steps hidden by the toolchain, so we need to make it explicit:

  • One target for the Objective-C code (imaginatively named NSLoggerLibObjC)
  • One target for the Swift code (NSLogger) that depends on NSLoggerLibObjC
  • One product that builds the Swift target

One of the problems is that all that code is mixed in the folders, because Xcode doesn't care about file placement. SPM, on the other hand does.

So, let's use and abuse the path and sources parameters of the target. The first one is to provide the root where we look for files to compile, and the second one lists the files to be compiled.

  • LoggerClient.m for NSLoggerLibObjC
  • NSLogger.swift for NSLogger

Done. Right?

Not quite.

Step 2 : Compilation Quirks

The Obj-C lib requires ARC to be disabled. Easy to do in Xcode, a bit harder in SPM.

We need to pass the -fno-objc-arc flag to the compiler. SPM doesn't make it easy or obvious to do that, for a variety of reasons, but I guess mostly because you shouldn't pass compiler flags at all in an ideal world.

But (especially in 2020), looking at the world, ideal it ain't.

We have to use the (not so aptly named) cSetting option of the target, and use the very scary CSetting.unsafeFlags parameter for that option. Why is it unsafe, you might ask? Weeeeeeeeell. It's companies' usual way of telling you "you're on your own with this one". I'm fine with that.

Another compilation quirk is that Obj-C code relies (like its ancestor, C) on the use of header files to make your code usable as a dependency.

Again, because Xcode and SPM treat the file structure very differently, just saying that every header should be included in the resulting library is a bad idea: the search is recursive and in this particular case, would result in having specific iOS or MacOS (yes, capitalized, because sod that change) test headers exposed as well.

In the end, I had to make the difficult choice of doing something super ugly:

  • move the public headers in their own directory
  • use symlinks to their old place so's not to break the other parts of the project

If anyone has a better option that's not heavily more disruptive to the organization of the project, I'm all ears.

Step 3 : Final Assembly

So we have the Swift target that depends on the Obj-C one. Fine. But how do we use that dependency?

"Easy" some will exclaim (a bit too rapidly) "you just import the lib in the swift file!"

Yes, but then it breaks the other projects, which, again, we don't want to do. Minimal impact changes. Legacy. Friend.

So we need a preprocessing macro, like, say, SPMBuild, which would indicate we're building with SPM rather than Xcode. Sadly, this doesn't exist, and given the rate of change of the toolchain, I don't want to rely too heavily on the badly documented Xcode proprocessor macros that would allow me to detect a build through the IDE.

Thankfully, in the same vein as cSettings, we have a swiftSettings parameter to our target, wich supports SwiftSetting.define options. Great, so I'll define a macro, and test its existence in the swift file before importing the Obj-C part of the project.

One last thing I stumbled upon and used despite its shady nature: there is an undocumented decorator for import named @_exported which seems extraneous here, but has some interesting properties: it kinda sorta exposes what you import as part of the current module, flattening the dependency graph.

To be honest, I didn't know about it, it amused me, so I included it.

Wrap Up

In order to make it work directly from the repo, rather than locally, I also had to provide a version number. I chose to go with the next patch number instead of aggrandizing myself with a minor or even a major version.

Hopefully, these changes don't impact the current project at all, and allows me to use it in a way I like better (and is officially supported), and I hope Florent will not murder me for all of that. He might even decide to accept my pull request. We'll see.

In the meantime, you can find all the changes above and a usable SPM package in my fork.


Introducing FuzzyTests

TL;DR: Grab it here : Github repo

Unit testing is painful amirite?

Writing good tests for your code very often means spending twice as much time coding them than on the things you test themselves.

It is good practice though to verify as much as possible that the code you write is valid, especially if that code is going to be public or included in someone else's work.

In my workflow I insist on the notion of ownership :

The bottomline for me is this: if there are several people on a project, I want clearly defined ownership. It's not that I won't fix a bug in someone else's code, just that they own it and therefore have to have a reliable way of testing that my fix works.
Tests solve part of that problem. My code, my tests. If you fix my code, run my tests, I'm fairly confident that you didn't wreck the whole thing. And that I won't have to spend a couple of hours figuring out what it is that you did.

This a a very very very light constraint when you compare it to methodologies like TDD, but it's a required minimum for me.

Plus, it's not that painful, except...

Testing every case

In my personal opinion, the tests that are hardest to do right are the ones that have a very large input range, with a few failure/continuity points.

If, for instance, and completely randomly, of course, you had an application where the tilt of the phone changes the state of the app (locked/unlocked, depending on whether the phone is lying flat-ish on the table or not:

  • from -20º to 20º the app is locked
  • from 160º to 200º the app is locked
  • the rest of the time it's not locked
  • All of that modulo 360, of course

So you have a function that takes the current pitch angle, and returns if we should lock or not:

func pitchLock(_ angle: Double) -> Bool {
  // ...
}

Does it work? Does it work modulo 360? What would a unit test for that function even look like? A for loop?

I have been looking for a way to do that kind of test for a while, which is why I published HoledRange (now Domains 😇) a while back, as part of my hacks.

What I wanted is to write my tests kind of like this (invalid code on so many levels):

for x in [-1000.0...1000.0].randomSelection {
  let unitCircleAngle = x%360.0
  if unitCircleAngle >= 340 || unitCircle <= 20 {
    XCTAssert(pitchLock(x))
  } else if unitCircleAngle >= 160 && unitCircle <= 200 {
    XCTAssert(pitchLock(x))
  } else {
    XCTAssertFalse(pitchLock(x))
  }
}

This way of testing, while vaguely valid, leaves so many things flaky:

  • how many elements in the random selection?
  • how can we make certain values untestable (because we address them somewhere else, for instance)
  • what a lot of boilerplate if I have multiple functions to test on the same range of values
  • I can't reuse the same value for multiple tests to check function chains

Function builders

I have been fascinated with @_functionBuilder every since it was announced. While I don't feel enthusiastic about SwiftUI (in french), that way to build elements out of blocks is something I have wanted for years.

Making them is a harrowing experience the first time, but in the end it works!

What I wanted to use as syntax is something like this:

func myPlus(_ a: Int, _ b: Int) -> Int

DomainTests<Int> {
    Domain(-10000...10000)
    1000000
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value\(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value\(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value\(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value\(a)")
   }
}.random()

This particular DomainTests runs 1000000 times over $$D=[-10000;10000]$$ in a random fashion.

Note the Test builder that takes a function with a parameter that will be in the domain, and the definition that allows to define both the test domain (mandatory) and the number of random iterations (optional).

If you want to test every single value in a domain, the bounding needs to be Strideable, ie usable in a for-loop.

DomainTests<Int> {
    Domain(-10000...10000)
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value\(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value\(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value\(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value\(a)")
   }
}.full()

Conclusion

A couple of hard working days plus a healthy dose of using that framework personally means this should be ready-ish for production.

If you are a maths-oriented dev and shiver at the idea of untested domains, this is for you 😬