[Dev Diaries] ELIZA

Back in the olden days...

Before the (oh so annoying) chatbots, before conversational machine learning, before all of that, there was... ELIZA.

It is a weird little part of computer history that nerds like me enjoy immensely, but that is fairly unknown to the public.

If I ask random people when they think chatting with a bot became a Thing, they tend to respond "the 90s" or later (usually roughly ten years after they were born, for weird psychological reasons).

But back in the 60s, the Turing Test was a big thing indeed. Of course, nowadays, we know that this test, as it was envisioned, isn't that difficult, but back then it was total fiction.

Enter Joseph Weizenbaum, working at MIT in the mid-60s, who decided to simplify the problem of random conversation by using a Jedi mind trick: the program would be a stern doctor, not trying to ingratiate itself with the user. We talk to that kind of terse, no-nonsense person often enough that it could reasonably be assumed it wouldn't faze a normal user.

It's not exactly amicable, but it was convincing enough at the time for people to project some personality onto it. It became a real Frankenstein story: Weizenbaum was trying to show how stupid it was, along with the whole concept of man-machine conversation, but users kept talking to it, sometimes even confiding in it as they would in a doctor. And the more Weizenbaum tried to show that it was a useless piece of junk with the same amount of intelligence as your toaster, the more people became convinced it was going to revolutionize the world of psychiatry.

Weizenbaum even felt compelled to write a book about the limitations of computing, and the capacity of the human brain to anthropomorphise the things it interacts with, as if to say that to most people, everything is partly human-like or has human-analogue intentions.

He is considered to be one of the fathers of artificial intelligence, despite his attempts at explaining to everyone who would listen that the term was something of a contradiction.

Design

ELIZA was written in SLIP, a language that worked as a subset or an extension of Fortran, and later ALGOL, and was designed to facilitate the use of compound lists (for instance (x1,x2,(y1,y2,y3),x3,x4)), which was a hard-ish thing to do back in the day.

By modern standards, the program itself is fairly simplistic:

  • the user types an input
  • the input is parsed for "keywords" that ELIZA knows about (eg I am, computer, I believe I, etc), which are ranked more or less arbitrarily
  • depending on that "keyphrase", a variety of options are available like I don't understand that or Do computers frighten you?

Where ELIZA goes further than a standard decision tree is that it has access to references. It tries to take parts of the input and mix them into its answer, for example: I am X -> Why are you X?

It does that through something that would later become regular expression groups, then transforms certain words or expressions into their respective counterparts.

For instance, something like I am like my father would be matched to ("I am ", "like my father"), then the response would be ("Why are you X?", "like my father"), then transformed to ("Why are you X?", "like your father"), then finally assembled into Why are you like your father?

Individually, both these steps are simple decompositions and substitutions. With sed and regular expressions, it would look something like

$ sed -n "s/I am \(.*\)/Why are you \1?/p"
I am like my father
Why are you like my father?
$ echo "I am like my father" | sed -n "s/I am \(.*\)/Why are you \1?/p" | sed -n "s/my/your/p"
Why are you like your father?

Of course, ELIZA has a long list of my/your, me/you, ..., transformations, and multiple possibilities for each keyword, which, with a dash of randomness, allows the program to respond differently if you say the same thing twice.

But all in all, that's it. ELIZA is a very very simple program, from which emerges a complex behavior that a lot of people back then found spookily humanoid.
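To make that concrete, here is a minimal sketch of the whole loop. The rules and reflection table below are made up for illustration (the real script is much bigger, and adds keyword ranking and memory), and it happens to use the JavaScript regexps we're about to talk about:

// A toy ELIZA loop: match a keyword rule, pick a random response,
// and "reflect" the captured fragment (my -> your, etc.).
// These tables are illustrative, not Weizenbaum's actual data.
const reflections = { i: "you", am: "are", my: "your", me: "you" };

const rules = [
  { pattern: /I am (.*)/i, responses: ["Why are you $1?", "How long have you been $1?"] },
  { pattern: /computer/i, responses: ["Do computers frighten you?"] },
];

function reflect(fragment) {
  return fragment
    .split(" ")
    .map((word) => reflections[word.toLowerCase()] || word)
    .join(" ");
}

function respond(input) {
  for (const rule of rules) {
    const match = rule.pattern.exec(input);
    if (match) {
      // a dash of randomness so the same input doesn't always get the same reply
      const pick = rule.responses[Math.floor(Math.random() * rule.responses.length)];
      return pick.replace("$1", match[1] ? reflect(match[1]) : "");
    }
  }
  return "Please go on."; // the catch-all when no keyword matches
}

respond("I am like my father"); // e.g. "Why are you like your father?"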

Taking a detour through (gasp) JS

One of the available "modern" implementations of ELIZA is in JavaScript, as are most things. Now, those who know me figure out fairly quickly that I have very little love for that language. But having a distaste for it doesn't mean I don't need to write code in it every now and again, and I had heard so much about the bafflement people feel when using regular expressions in JS that I had to try it myself. After all, two birds, one stone, etc... Learn a feature of JS I do not know, and resurrect an old friend.

As I said before, regular expressions (or regexs, or regexps) are relatively easy to understand, but a lot of people find them difficult to write. I'll just give you a couple of simple examples to get in the mood:

[A-Za-z]+;[A-Za-z]+

This will match any text that has two words (whatever the case of the letters) separated by a semicolon. Note the differentiation between uppercase and lowercase ranges.
Basically, it says that I want to find a series of letters of length at least 1 (+), followed by ;, followed by another series of letters of length at least 1
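To see it do something (a quick check, using the JS slash syntax we'll get to below):

/[A-Za-z]+;[A-Za-z]+/.test("hello;world"); // true
/[A-Za-z]+;[A-Za-z]+/.test("hello world"); // false, no semicolon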

.*ish

The dot (.) is a special character that means "any character", and * means "0 or more", so here I want to find anything ending in "ish"

Now, when you do search and replace (as is the case with ELIZA), or at least search and extract, you might want to know what is in this .* or [A-Za-z]+. To do that, you use groups:

(.*)ish

This will match the same strings of letters, but by putting it in parenthesiseseseseseseseseses (parenthesiiiiiiiiiiiii? damn. anyway), you instruct the program to remember it. It is then stored in variables with the very imaginative names of \1, \2, etc...

So in the above case, if I apply that regexp to "easyish", \1 will contain "easy"
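In JS terms (jumping slightly ahead again), the remembered groups land in the result:

const m = /(.*)ish/.exec("easyish");
m[0]; // "easyish", the whole match
m[1]; // "easy", what \1 refers to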

Now, because you have all these special characters like the dot and parentheses and whatnot, you need to differentiate between the actual "." and "any character". We escape those special characters with \.

([A-Za-z]+)\.([A-Za-z]+)

This will match any two words with upper and lower case letters joined by a dot (and not any character, as would be the case if I didn't use \), and remember them in \1 and \2

Of course, we have a lot of crazy special cases and special characters, so, yes, regexps can be really hard to build. For reference, the Internet found me a regexp that looks for email addresses:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Yea... Moving on.

Now, let's talk about JavaScript's implementation of regular expressions. Spoiler alert: it's weird if you have used regexps in any language other than Perl. That's right, JS uses the Perl semantics.

In most languages, regular expressions are represented by strings. It is a tradeoff that means you can manipulate them like any other string (get their length, replace portions of them, build them out of string variables, etc.), but it makes escaping nightmarish:

"^\\s*\\*\\s*(\\S)"

Because \ escapes the character that follows, you need to escape the escaper to keep it around: if you want \. as part of your regexp, more often than not, you need to type "\\." in your code. It's quite a drag, but the upside is that they work like any other string.
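As it happens, JS keeps a string-based constructor around in addition to the literal syntax we're about to see, which makes the two forms easy to compare:

// the string form needs the double escaping described above...
const fromString = new RegExp("^\\s*\\*\\s*(\\S)");
// ...while the slash-delimited literal does not:
const literal = /^\s*\*\s*(\S)/;
fromString.source === literal.source; // true: same regexp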

Now, in JS (and Perl), regexps are a totally different type. They are not between quotes, but between slashes (eg /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/). On one hand, you don't have to double the backslashes anymore and they more closely resemble the actual regexp, but on the other hand, they are harder to compose or build programmatically.

As I said, it's a different tradeoff, and to each their own.

Where it gets bonkers is how you use them. Because the class system is... what it is, and because there is no operator overloading, you can't really get the syntactic elegance of Perl, so it's kind of a bastard system where you might type something like

var myRe = /d(b+)d/;
var isOK = "cdbbdbsbz".match(myRe); // not null, because "dbbd" is in the string

match and matchAll aren't too bad, in the sense that they return the list of matching substrings (here, only one), or null, so the result does at least carry some meaning.
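Well, kind of: what match returns already depends on the g flag, and the groups vanish as soon as you ask for all the matches:

"cdbbdbsbz".match(/d(b+)d/);  // ["dbbd", "bb"], full match plus groups, like exec
"cdbbdbsbz".match(/d(b+)d/g); // ["dbbd"], every match, but no groups
"xyz".match(/d(b+)d/);        // null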

The problem arises when you need to use the dreaded exec function in order to use the regexp groups, or when you use the g flag in your regexp.

The returned thing (I refuse to call it an object) is both an array and a hashmap/object at the same time.

In result[0] you have the matched substring (here it would be "dbbd"), and in result[X] you have the \X equivalents (here \1 would be "bb", so that's what you find in result[1]). So far so not too bad.

But this array also behaves like an object: result.index gives you the index of "the match", which is probably the first one.

Not to mention that you use string.match(regex) but regex.exec(string):

const text = 'cdbbdbsbz';
const regex = /d(b+)d/g;
const found = regex.exec(text);

console.log(found);
console.log(found.index);
console.log(found["index"]);

This prints:

Array ["dbbd", "bb"]
1
1

So, the result is a nullable array that sometimes works as an object. I'll let that sink in for a bit.
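One more gotcha while it sinks in: with the g flag, the regexp object itself is stateful. It remembers where the last match ended (in lastIndex) and resumes from there on the next exec:

const re = /d(b+)d/g;
re.exec("cdbbdbsbz"); // ["dbbd", "bb"], and re.lastIndex is now 5
re.exec("cdbbdbsbz"); // null: no second match, and lastIndex resets to 0
re.exec("cdbbdbsbz"); // ["dbbd", "bb"] again, we've come full circle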

This is the end

Once I got the equivalence down pat, it was just a matter of copying the data and rewriting a few functions, and ELIZA was back, as a library, so that I could use it in CLI tools, iOS apps, or macOS apps.

When I'm done fixing the edge cases and tinkering with the ranking system, I might even publish it.

In the meantime, ELIZA and I are rekindling an old friendship on my phone!


[Dev Diaries] CredentialsToken 0.2.0

I needed to write a login mechanism for a Kitura project, and decided to dust off the old project and bring it into 2020.

Changes include:

  • Update to Swift 5.2
  • Update dependencies
  • Added a cross-check possibility between app and client tokens

Grab it on GitHub


[Rant] Faith And Programming

Faith (the general brain thing that makes us think something is true even if there is no proof of it) is trickling into programming at an alarming rate. I see it in blog posts, I see it in YouTube videos and tutorials, I see it in my students' hand-ins, and it tickles my old-age-induced arthritis enough to make me want to write about it.

What I mean by faith

Let's start with definitions (from various dictionaries):

Faith: something that is believed especially with strong conviction

Not super helpful, as faith is more or less defined as "belief" + "strong conviction"

Conviction: a strong persuasion or belief

Again with the recursive definition...

Belief: something that is accepted, considered to be true, or held as an opinion

So, faith is a belief that is strongly believed. And a belief is something that's either thought to be true, or an opinion. Yea, ok. Thanks, dictionaries.

And that's why I need to define the terms for the benefit of this essay: because everything related to religious faith or political belief is so loaded these days, those vague definitions are weaponized in the name of usually bad rhetoric. I'll drop any hint of religious or political wording if I can, because I think words should have meaning when you're trying to communicate honestly.

So here are the definitions I will subscribe to for the rest of this post:

  • Belief: fact or opinion thought to be true, with supporting sources to quote (eg: I believe the Earth orbits around the Sun, and I have plenty of sources to back me up, but I haven't seen it myself). It is stronger than an opinion (eg: I am of the opinion that the cat I live with is stupid), but weaker than a fact I know (eg: it is currently sunny where I am). Essentially, I use the word belief to mean something I'm convinced of, and think can be proven to be true (or false, as the case may be).
  • Faith: fact or opinion thought to be true, that is either unprovable (eg: Schrodinger's cat is alive), not yet proven (eg: we can land people on Mars), or dependent on other people's opinion (eg: my current relationship will last forever)

The subtle difference between these two words, for me, hinges on the famous "leap of faith". Sometimes, it doesn't matter if something is provable or not for us to think of it as "true". That's when we leave belief and enter faith. Most aspirational endeavors come from faith rather than beliefs... Faith that the human species isn't suicidal, faith that researchers do their thing with the best intentions in mind, faith that my students will end up liking what I teach them...

So what does faith have to do with programming?

After all, when you do some programming, facts will come and hit you in the face pretty fast: if your program is wrong, then the thing doesn't compile, or crashes, or produces aberrant results.

Yeeeeeeeeees... and no.

Lower levels of programming are like that. Type the wrong keyword, forget a semi-colon, have a unit test fail, and yes, the computer will let you know that you were wrong.

It is a logical fallacy known as the fallacy of composition to take the properties of a subset and assume they are true for the whole.

Here, thinking that the absence of compiler or test errors means that the program is valid ignores so many things it's not funny: there could be bugs in the compiler or the tests (they are programs too, after all), the ins and outs of the program could be badly defined, the algorithm used could have race conditions or otherwise dangerous edge cases that are not tested, etc.

But when you talk about immensely more complex systems than a simple if x then y, undecidability knocks on the door and says hello.

And here comes the faith. Because we cannot test everything and prove everything, we must have faith that what we did is right (or right enough). Otherwise, we can't produce anything, we can't move forward with our projects, and we can't collaborate.

There are multiple acts of faith we take for granted when we write a program:

  • The most important one is that what we are trying to do is doable. Otherwise, what's the point?
  • What we are trying to do is not only doable, but doable in a finite (and often set) amount of time.
  • The approach that we chose to do it with will work
  • and its cousin: the approach that we chose is the best one
  • The tech/framework/library/language we use will allow us to succeed
  • and its cousin: the tech/framework/library/language we use is the best one
  • If push comes to shove, anything that stumps us along the way will have a solution somewhere (search engine, person, ...)

This is not a complete list by any means, but these are the ones I find the most difficult to talk about with people these days.

Because they are in the realm of faith, it is incredibly difficult to construct an argument that will change someone's opinion and doesn't boil down to "because I think so".

For instance, I like the Swift language. I think it provides safety without being too strict, and I think it is flexible enough to construct pretty much anything I need. But what can I say to someone who doesn't like it for (good) reasons of their own to convince them that they should like it, without forcing them (for instance, by having super useful libraries that only exist in Swift)?

And that's the second most dangerous fallacy of our domain: the fact that some things are easier to do with this choice doesn't mean that it is the best choice.

The inevitable digression

I have conversations about front-end development a lot. A loooooooooooooooooooooooooooooooooooooot.

Web front-end, mobile front-end, desktop front-end, commandline front-end, hybrid front-end, ye-olde-write-once-run-anywhere front-end, you name it.

Because browsers are everywhere, and because browsers can all do more or less the same thing, it should be easier to write a program that runs in the browser, right?

For certain things, that's a definite yes. For other things, for now at least, that's a definite no. And everything in between.

First of all, the browser is an application on the host. It has to be installed, maintained, and configured by the end-user. Will your front-end still work when they have certain add-ons? Does the web browser have access to some OS features you need for your project? Is it sufficiently up to date for your needs?

Second, because the web browser is pretty much on every device, it tends to support things that are common across all these devices. If your project targets only browsers that have that specific extension, which only works on Linux, then... are you really being cross-platform?

The same reasoning applies to most, if not all, WORA endeavors. As long as we don't all have a single type of device with the same set of software functionalities, there won't be a way to have a single codebase.

And you may think "oh, that will happen eventually". That's another item of faith I encounter fairly often. But... I'm not convinced. The hardware manufacturers want differences in their lineup. Otherwise, how can they convince you to buy the latest version? Isn't it because there are differences with the old one? And even if you assume that the OS on top of it has top-notch dedicated engineers that will do their damnedest to make everything backwards compatible, isn't that ultimately impossible?

Ah HA! Some of you are saying. It doesn't matter because we have WebAssembly! We can run every OS within the browser, and therefore eliminate the need to think about these things!

Do we, though? OK, it alleviates some headaches, like some libraries only being available on some platforms, but it doesn't change the fact that WebAssembly, or asm.js, or whatever else, cannot conjure up a new hardware component or change the way the OS works under the browser. It still is constrained to the maximum common feature set.

And I'm sure, at this point, the most sensitive among you think that I'm anti-web. Not at all! In the same way I think web front-end isn't a panacea, I think native mobile or desktop front-end isn't an all-encompassing solution either.

If your project doesn't make any kind of sense in an offline scenario, then you'd better have strong hardware incentives to write it using native code.

Native programming is more idiosyncratic, for starters. I know of at least twelve different ways on the Mac alone to put a simple button on screen. Which is the best? It depends. Which is the easiest? It depends. Which will be the most familiar to the user? It depends. Which is the most common? Depends on the parameters of your search.

To newcomers, it is frustrating, and it seems useless, and I understand why they think that. It is perceived as a simple task that requires a hellish amount of work to accomplish. And to an extent, this is the truth.

But there is a nugget of reason behind this madness. History plays a part, sensibilities play a part, different metrics for what "best" is play a part.

To me, arguing that this piece of tech is better than this one is like arguing that this musical instrument is better than this one.

Can you play notes on all of them? Sure. Can you tweak the music so that it's still recognizable, even when played on a completely different instrument? Yep, that's a sizeable portion of the music industry.

Can you take an orchestra and rank all the instruments from best to worst in a way that will convince everyone you talk to? I doubt it. You can of course rank them along your preferences, but you know it's not universal.

Would anyone argue that the music should make the instruments indistinguishable from one another? I doubt it even more.

For me, a complex software product is like a song. You can replace an electric guitar with an acoustic one fairly easily and keep the song more or less intact, but replacing the drum kit with a flute will change things drastically, to the point where I would argue it's not the same song anymore.

So why insist (on every side of the debate) that all songs, everywhere, everywhen, should be played using a single kind of instrument?

Faith is a spectrum, and we need it

Back to the list of items of faith I gave earlier, I do genuinely believe that some are essential.

We need to have faith in our ability to complete the task, and we need to have faith in the fact that what we do will ultimately matter. Otherwise, nothing would ever ship. These faiths should be fairly absolute and unshakeable, because otherwise, as I said, what's the point of doing anything?

The other points I want to push back on. A little bit. Or at least challenge the absolutism I see in too many places.

The tools we use will get us there, and/or they are the best for getting us there

If you haven't skipped over the earlier digression, you'll know I feel very strongly about the tribal wars going on around languages/stacks/frameworks. I am on the record saying things like "X has an important role to play, I just don't like using it".

I also am a dinosaur who has waded through most families of programming languages and paradigms, from assembly on microcontrollers (machine language) to AppleScript (almost human language), and have worked on projects with tight hardware constraints (embedded programming, or IoT like the kids call it now), to no constraint whatsoever (purely front-end web projects), and most things in between.

There is comfort in familiarity. It's easy to morph the belief that you can do everything that is asked of you with your current tools into the faith that you will always be able to do so.

I have two personal objections that hold me back in that regard. First, I have been working professionally in this field long enough to have personally witnessed at least 3 major shifts in ways projects are designed, led, and implemented. The toolkit and knowledge I had even 5 years ago would probably be insufficient to do something pushing the envelope today. If I want to be part of the pioneers on the edge of what is doable, I need to constantly question the usefulness of what I currently know.

Now the good news is, knowledge in science (and Computer Science has that in its name) is incremental, more often than not. What I know now should be a good enough basis to learn the new things. This is faith too, but more along the lines of "yea, I think I'm not too stupid to learn" than "it will surely get me glory, fame and money".

So my first objection revolves mostly around the "always" part, because I think very few things are eternal, and I know that programming techniques definitely aren't.

The second one is more subtle: the premise that you can do everything with one set of tools is, to my mind, ludicrous. Technically, yes, you can write any program iso-functional to any other program, using whatever stack. If you can write it in Java, you can write it in C. If you can write it in assembly, you can write it in JavaScript. If you can write it in Objective-C, you can write it in Swift. The how will be different, but it's all implementation details. If you pull back enough, you'll have the same outputs for the same inputs.

But it doesn't mean there aren't any good arguments for or against a particular stack in a particular context, and pretending that "it's all bits in the end" combined with "it's what I'm more comfortable with" is the best argument is nonsensical.

To come back to that well, in the music instrument analogy, it would be akin to saying that because you know how to play the recorder, and not the guitar, any version of "Stairway to Heaven" played on the recorder is intrinsically better. And that's quite a claim to make.

You can say it's going to be done faster because it takes a while to learn the guitar. You can say it's the only way it can be done with you because you can't play the guitar. You can even say that you prefer it that way because you hate the guitar as an instrument. But, seriously, the fact that you can't do chords on a recorder is enough to conclude that it's a different piece of music.

In that particular example, maybe you can be convinced to learn the piano, which makes chords at least doable with a normal set of mouths and fingers, since you hate the idea of learning the guitar. Maybe learn the bagpipes, which I believe are a series of recorders plugged into a balloon that does have multiple mouths.

I'll let that image sink in for a little while longer...

Next time you see a bagpipe player, don't mention my name, thanks.

Anyhoo

The faith in one's abilities should never be an argument against learning something new, in my opinion. If only to confirm that, yes, the other way is indeed inferior by some metric that makes sense in the current context.

Which allows me to try and address the elephant in the room:

Yes you should have some faith that your technical choices are good enough, and maybe even the best. But it should be the kind of faith that welcomes the challenge from other faiths.

The answer is waiting for me on the internet

That one irks me greatly.

When you learn to program, the tutorials, the videos, even the classes, have fixed starting and ending points. The person writing or directing them leads you somewhere; they know what shape the end result should be.

Their goal is not to give you the best way, or prove that it's the only way, to do a thing. They are just showing you a way, their way of doing it. Sometimes it is indeed the best or the only way, but it's very very very rare. Or it's highly contextual: this is the best way to do a button given those sets of constraints.

But, because these articles/videos/classes are short, in the grand scheme of things, they can't address everything comparable and everything in opposition to what's being shown. People who do those things (myself included when I give a class) can't spend 90% of the time showing every single alternative.

The other variable in this discussion is that, when you learn programming, the problems are set up in such a way that it is possible to glue pieces together. Again, it's the time factor, but put another way: would you rather have the people ingesting your content focus on the core of what you are saying, or have them slowed down or even stopped because the other piece they are supposed to rely on doesn't work?

Expediency. A scenario for a tutorial, video, class, and even your favorite Stack Overflow answer, will make a bunch of simplifications and assumptions in order to get at the core of the message. These simplifications and assumptions are usually implicit, and yet, they help shape the content.

So, when you're new, you are shown or told a lot of things at once (programming relies on a lot of decisions made by someone else), simple things that fit neatly into someone else's narrative for expediency's sake, guiding you toward a higher knowledge that has a lot of assumptions attached. And you will never be told about those assumptions.

It's not surprising, therefore, that a lot of newcomers to programming think that writing a program is just about finding the mostly-right-shaped bricks on the internet and assembling them.

Weeeeeeell... Programming definitely has that. We rely on a lot of code we haven't written ourselves.

But it's not just that. Very often, the context, the constraints, or the goal itself, of your program is unique to that particular case. And because it's unique, no one on the internet has the answer for you.

You can be told that a redis+mongo+postgres+angular+ionic technological stack is the best by people you trust. And that's fine, they do believe that (probably). But there are so many assumptions and so much history behind that conclusion that it should be considered suspect. Maybe, for your project, postgres+react+native works better, and takes less time to program. How would <insert name of random person on the web> actually know the details of your set of constraints? It's not that they don't want to, necessarily, but they didn't think about your problem before giving out their solution, right?

So, maybe you think their problem is close enough to yours, and that's fair enough. But how do you know? Did you critically look at all the premises and objectives they set for themselves? Or did you think that if 4 words in the title of their content match 4 words in your problem, that's good enough? If you're honest, it's going to be the second one.

The Internet is a wonderful resource. I use it all the time to look up how different people deal with problems similar to mine. But unless the objective is small enough and I'm fairly certain it's close enough, I will not copy their code into mine. They can definitely inspire my solution, though. But inspiring and being my solution are about as close as "Stairway to Heaven" played on the recorder and on the guitar are.

Faith must be tempered by science to become experience

You've waded through 3000+ words from a random guy on the Internet, and you're left with the question: "what do you want from me? I have deadlines, I have little time to research the perfect solution or learn everything from scratch every two years, why do you bother me with your idealistic (and inapplicable) philosophy of software development?"

Look. I know. I've been through the early stages of learning how to code, and I'm not old enough not to remember. I also have deadlines, very little free time to devote to learning a new language or framework just for the sake of comparison, etc etc.

My point is not that everyone should do 4 months of research every time they write 5 lines of code. The (sometimes really small) time you can spend on actually productive code writing is part of the constraints. Even if Scala would be a better fit for my project, I don't have time to learn it for Friday. I get that. But I also am very keenly aware that there could be a much better way to do it than the one I'm stuck with.

The thing is, if you double down on your technological choices or methodology despite the recurrent proof that it is the wrong one, your faith is misplaced. It's as simple as that.

I used to think I was great at managing my time. Then I had one major issue. Then another. Then another. Then a critical one. Only a fool would keep believing that I'm great at managing my time. So I changed my tools and my methodology. Same thing for languages and frameworks and starting points.

The problem doesn't lie with the preferences you have or the successes you had doing something that was a bad idea. It lies with not looking back, measuring what you could have done better and how, and changing your faith. Science. Numbers. Objective truths. You look back even at a successful (for a given metric, usually "we managed to ship") project, and you can always find pain points and mistakes.

The idea is to learn from these mistakes and not be stuck with an opinion that has been shown to be wrong. Even if you have a soft spot for a piece of tech, it doesn't mean you should stop looking for alternatives.

Faith is necessary, faith in your abilities, faith in the tech you're using. But faith needs to be reevaluated, and needs to be confronted to be either strengthened or discarded. That's what experience is.


[Dev Diaries] Code Coverage?

As you probably know by now, I'm fairly obsessed with tools that give me metrics about quality: linting, docs, tests...

Unfortunately, code coverage is fairly hard to get with SPM in a way that is usable.

What?

Code coverage is the amount of code in the package that is covered with your tests. If you run your tests, are these lines run? Are those? Your tests pass, and that's fine, but have you forgotten to test anything?

You can enable it in Xcode using the option "gather code coverage" in the scheme you are running tests for, which allows you to find a decent visualization if you know where to look for it (a new gutter appears in your code editors).

In Swift Package Manager though, it's fairly obscure:

  • first you have to use the --enable-code-coverage option of the testing phase
  • then you have to grab the output json path by using --show-codecov-path
  • then you get a collection of things that is unique to SPM, and therefore unusable elsewhere

Now, if you look closely at the output, you can see it's fairly close to the lcov format, which is more or less a standard.
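For reference, an lcov file is just a series of plain-text records, one per source file. A hand-written illustration (not actual output from the script below, and with a made-up path) looks like this:

SF:Sources/MyLib/Extensions.swift
DA:10,4
DA:11,0
LF:2
LH:1
end_of_record

SF names the source file, each DA line maps a line number to the number of times it ran, and LF/LH give the lines-found/lines-hit totals.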

Let's make a script!

Because I'm very attached to my packages running on both Linux and macOS, I need to grab the correct values from the environment (it makes them dockerizable too).

I need:

  • the output of the swift test phase
  • llvm-cov, which is used by the Swift toolchain and can extract usable information
  • A few frills here and there

Looking here and there if anything existed already, I stumbled upon a good writeup setting some of the bricks up. I would suggest reading this first if you want to get the nitty gritty details.

Reusing parts of this, and making my own script that can spit out either human-readable or lcov-compatible output, works on both Linux and macOS, and is dockerizable, here's what I end up with:

#!/bin/bash

# run the tests with coverage instrumentation, discarding the test log
swift test --enable-code-coverage > /dev/null 2>&1

if [ $? -ne 0 ]; then
	echo "tests unsuccessful"
	exit 1
fi

# locate the compiled test bundle and the llvm-cov binary
BIN_PATH="$(swift build --show-bin-path)"
XCTEST_PATH="$(find "${BIN_PATH}" -name '*.xctest')"

if [[ "$OSTYPE" == "darwin"* ]]; then
	COV_BIN="/usr/bin/xcrun llvm-cov"
	MODULE="$(basename "$XCTEST_PATH" .xctest)"
	XCTEST_PATH="$XCTEST_PATH/Contents/MacOS/$MODULE"
else
	COV_BIN="$(which llvm-cov || echo "false")"
fi

if [ $# -eq 0 ]; then
	# human-readable report
	$COV_BIN report -ignore-filename-regex=".build|Tests" \
		-instr-profile=.build/debug/codecov/default.profdata -use-color \
		"$XCTEST_PATH"
elif [ $# -eq 1 ] && [ "$1" = "lcov" ]; then
	# lcov-compatible output
	$COV_BIN export -ignore-filename-regex=".build|Tests" \
		-instr-profile=.build/debug/codecov/default.profdata \
		--format=lcov "$XCTEST_PATH"
else
	echo "Usage:"
	echo "codecov.sh [lcov]"
	echo "use without argument for human readable output"
	echo "use with lcov as argument for lcov format output"
	exit 1
fi

codecov.sh spits something like:

Filename                      Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SEKRET.swift                       67                20    70.15%          23                 4    82.61%         157                27    82.80%
Extensions.swift                   15                 3    80.00%           2                 0   100.00%          20                 0   100.00%
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                              82                23    71.95%          25                 4    84.00%         177                27    84.75%

codecov.sh lcov spits the corresponding lcov output.

Hurray for automation!


[Dev Diaries] NSLogger is merged

The changes I made to make NSLogger SPM compatible are now in the master branch of the official repo. Update your dependencies ☺️