[AoC 2022] Recap

TL;DR

I'm not as rusty as I thought I'd be. And YES that kind of challenge has a place in the coding world (see conclusion)

Just like every year, I had a blast banging my head on the Advent of Code calendar. It so happens that this year, I had a lot less brain power to focus on them due to the exam season at school and the ensuing panicky students, but it could also be because my brain isn't up to spec.

Some of my students are/were doing the challenges, so I didn't want to post anything that would help them, but now that the year is almost over, I wanted to go over the puzzles and give out impressions (and maybe hints).

Easing in

Days 1 to 7 were mostly about setting up the stage, getting into the habit of parsing the input and using the right kind of structure to store the data.

Nothing super hard, it was "just" lists, hashmaps, and trees, until day 4. Day 4 was especially funny to me, because I wrote HoledRange / Domain just for that purpose (disjointed ranges and operations on them). Except I decided to do this year's calendar in Julia, and the library I wrote is for Swift. Just for kicks, I rewrote parts of the library, and I might even publish it.

Days 5, 6 and 7 highlighted the use of stacks, strings, and trees again. Nothing too hard.

Getting harder

My next favorite is day 9. It's about a piece of rope you drag the head of, and have to figure out what the tail does. If you've ever played zig-zaging a shoelace you'll know what I mean. String physics are fun, especially in inelastic cases.

Many ways to do that, but once you realize how the tail catches up to the head when the latter is moved, multi-segmented chains are just a recursive application of the same.

I was waiting for a day 10-like puzzle, as there tends to be one every year, and I majored in compilers all those years ago. State machines, yuuuuuusssssssss.

A lot of puzzles involve path finding after that, which isn't my strong suit for some reason. But since the algorithms are already out there (it was really funny to see the spike in google searches for Dijkstra and A*), it's "just" a matter of encoding the nodes and the edges in a way that works.

Day 13 is fun, if only because it can be instantly solved in some language with eval, which will treat the input as a program. I still wrote my own comparison functions, because I like manipulating numbers, lists and inequalities.

Day 14 is "sand simulation", that is grains of sand that settle in a conical shape that keeps expanding laterally. Once you find the landing point on each ledge and the maximum width of the pile, there's a calculable result. Otherwise, running the simulation works too, there aren't that many grains. For part 2, I just counted the holes rather than the grains.

Day 15 is about union and intersections of disjointed ranges again, except in 2D. Which, with the Manhattan distance approximation, gets back to 1D fairly quickly.

Day 16 stumped quite a few people, because of the explosive nature of path searching. Combinatorics are pretty hard to wrap your head around. Personally, I went for "reachability" within the remaining time, constructed my graph, and explored. It was probably non optimal.

Day 17 make me inordinately proud. Nuff said.

Catching up

Because of the aforementioned  workload, I was late by that point, so I decided to take my time and not complete the challenge by Xmas. Puzzles were getting hard, work was time-consuming, so the pressure needed to go down.

Because of the 3D background that I had, I tackled day 18 with raytracing, which is way over-engineered, but reminded me of the good ole times. Part 2 was trickier with that method, because suddenly I had 2 kinds of "inside".

Day 19 was path finding again, the trick being how to prune paths that didn't lead in a good direction. Probably the one that used up the most memory, and therefore the one I failed the most.

Because of my relative newness to Julia, I had to go through many hoops to solve day 20. As it turns out, screw_dog over on mastodon gave me the bit I lacked to solve it simply, although way after I solved it using other means.

Day 21 goes back to my compiler roots and tree optimizations, and Julia makes the huge integer manipulation relatively easy, so, there. Pretty proud of my solution:

Part 1:   3.709 ms (31291 allocations: 1.94 MiB)
Part 2:   4.086 ms (31544 allocations: 1.95 MiB)

Which, on my relatively slow mac mini is not bad at all! Symbolic linear equation solving (degree one, okay) is a fun thing to think about. I even think that the algorithm I devised would work on trees where the unknown appears on both sides of the tree. Maybe I'll test it some day.

Day 22. Aaaaaaaah day 22. Linked lists get you all the way if and only if you know how to fold a cube from a 2D pattern. I don't so, as many of the other participants, I hardcoded the folding. A general solution just eluded me. It's on my todo list of reading for later.

Day 23 is an interesting variant of Conway's Game of Life, and I don't believe there is a way to simplify a straight up simulation, but I fully accept I could be wrong. So I used no trick, and let the thing run for 40s to get the result.

Day 24 was especially interesting for me, for all the wrong reasons. As I mentioned, graph traversal isn't my forte. But the problem was setup in a way that "worked" for me: pruning useless paths was relatively easy, so the problem space didn't explode too quickly. I guess I should use the same method on previous puzzles that I was super clumsy with.

Finally day 25 is a straight up algorithmic base conversion problem that's a lot of fun for my brain. If you remember how carry works when adding or subtracting numbers, it's not a big challenge, but thinking in base 5 can trip you up.

Conclusion

I honestly didn't believe I could hack it this year. I don't routinely do that kind of problem anymore, I have a lot of things going on at school, on top of dealing with the long tail of Covid and its effects on education. Family life was a bit busy with health issues (nothing life threatening, but still time consuming), and the precious little free time that I had was sure to be insufficient for AoC.

I'm glad I persevered, even if it took me longer than I wished it had. I'm glad I learned how to use Julia better. And I'm happy I can still hack it.

I see here and there grumblings about formal computer science. During and after AoC, I see posts, tweets, toots, etc, saying that the "l33t c0d3" is useless in practical, day-to-day, professional development. Big O notation, formal analysis, made up puzzles that take you into voluntarily difficult territories, all these things aren't a reflection of the skills that are needed nowadays to write good apps, to make good websites, and so on.

It's true. Ish.

You can write code that works without any kind of formal training. Today's computing power and memory availability makes optimization largely irrelevant unless you are working with games or embedded systems, or maybe data science. I mean, we can use 4GB of temporary memory for like 1/4 of a second to parse and apply that 100kB json file, and it has close to no impact on the perceived speed of our app, right? Right. And most of the clever algorithms are part of the standard library anyway, or easily findable.

The problem, as usual, is at scale. The proof-of-concept, prototype, or even 1.0 version, of the program may very well work just fine with the first 100 users, or 1000 or whatever the metric is for success. Once everything takes longer than it should, there are only 3 solutions:

  • rely on bigger machines, which may work for a time, but ultimately does not address the problem
  • scale things horizontally, which poses huge synchronization issues between the shards
  • reduce the technical debt, which is really hard

The first two rely on compute power being relatively cheap. And most of us know about the perils of infrastructure costs. That meme regularly makes the rounds.

It's not about whether you personally can solve some artificially hard problem using smart techniques, so that's ok if you can't do every puzzle in AoC or other coding challenges. It's not about flexing with your big brain capable of intuiting the bigO complexity of a piece of code. It's about being able to think about these problems in a way that challenges how you would normally do it. It's about expanding your intuition and your knowledge about the field you decided to work in.

It's perfectly OK for an architect to build only 1 or 2 level houses, there's no shame in it. But if that architect ever wants to build a 20+ stories building, the way to approach the problem is different.

Same deal with computer stuff. Learning is part of the experience.


[Dev Diaries] Advent of Code

I've been really interested in Julia for a while now, tinkering here and there with its quirks and capabilities.

This year, I've decided to try and do the whole of Advent of Code using that language.

First impressions are pretty good:
- map, reduce, and list/array management in general are really nice, being first-class citizens. I might even get over the fact that indices start at 1
- automatic multithreading when iterating over collections means that some of these operations are pretty speedy
- it's included in standard jupyterhub images, meaning that my server install gives me access to a Julia environment if I am not at my computer for some reason

Now it's kind of hard to teach old dogs new tricks, so I'm sure I misuse some of the features by thinking in "other languages". We'll see, but 4 days in, I'm still fairly confident.


Procedure All The Things! 🥳

Blender 3.1 is out!, and I made a thing, and that makes me so happy. (👍≖‿‿≖)👍

The laundry list of enhancements and features is very long, but 2 are especially important to me right now:

  • support for Metal acceleration on M1 (I can finally make the fans on my new laptop go)
  • procedural geometry nodes

Now, if you read what I write every once in a while, you'll know that I'm a big fan of anything that automates and facilitates manual tasks. I'm a developer after all.

Back on the old blog, I used to rave about the superior procedural texture system in blender that mostly allowed me to get free of images and UV maps. Granted, I'm no artist, and I tend towards mechanical/naturalistic scenes where procedural textures can be used. It's just so great to set a few parameters and see a decent result without having to spend hours in a pixel editor. Plus, it works at any resolution. So, there's that.

Back when Blender got its revamp and started getting some attention again, the white whale on the forums and various conversations was "when will we be able to node all the things?". It's is a very hard problem to solve, and the blender community has been at it like maniacs.

And... geometry nodes are finally here. I can finally replace most of my particle systems with a nice, clean, geometry node.

On a lark, I decided to retake my huge island project, a re-imagining of the Island of Myst that used to take 5h/frame to render, because of the millions of leaves, blades of grass, and rocks that the particle system generated. It was very hard to work with because the RAM usage would explode and renders would fail sometimes. Without culling and boolean operations dealing with the field of view, the scene used in the vicinity of 24G, making it all but impossible to render on a GPU that I can afford.

After a little experimenting to get my feet wet, I managed to visually get roughly the same result on less than 2.2G of RAM. That freed me to add... more geometry and more complexity 🤓

And because of the M1 architecture, the ceiling of VRAM usage is very high anyways, so I let it rip and got even more geometry in.

There is somewhere between 2 and two and a half billion polygons that I know of. Probably more.

Here is the original I based my scene on:

From the Wikipedia page

The end result is this (warning -- 4K image):

Myst Library

It uses 3.7G of VRAM (😳), about an hour and a half to render on a laptop (🤪), it's not even as complex as I can make it go (🤓), it's all geometry, no bump map tricks (🤩), and I could finally check that there were fans in my laptop (🙃)

I could do better on the render side, the complexity of the lighting means a lot of artifacts, and, of course, as I said before, I'm no artist, so some of the geometry is a bit iffy. Plus it's only 1024 samples, because I wanted to be fast-ish.

But Blender continues to impress, and who knows? Maybe they'll add a Do-What-I-Want-No-What-I-Type button in a future release that will magically enhance my skills.


[Dev Diaries] ELIZA

Back in the olden days...

Before the (oh so annoying) chatbots, before conversational machine-learning, before all of that, there was... ELIZA.

It is a weird little part of computer history that nerds like me enjoy immensely, but that is fairly unknown from the public.

If I ask random people when they think chatting with a bot became a Thing, they tend to respond "the 90s" or later (usually roughly ten years after they were born, for weird psychological reasons).

But back in the 60s, the Turing Test was a big thing indeed. Of course, nowadays, we know that this test, as it was envisionned, isn't that difficult, but back then it was total fiction.

Enters Joseph Weizenbaum, working at the MIT in the mid 60s, who decided to simplify the problem of random conversation by using a jedi mind trick: the program would be a stern doctor, not trying to ingratiate itself to the user. We talk to that kind of terse and no nonsense people often enough that it could be reasonably assumed that it wouldn't faze a normal person.

It's not exactly amicable, but it was convincing enough at the time for people to project some personnality onto it. It became a real Frankenstein story: Weizenbaum was trying to show how stupid it was, and the concept behind man-machine conversations, but users kept talking to it, sometimes even confiding as they would to a doctor. And the more Weizenbaum tried to show that it was a useless piece of junk with the same amount of intelligence as your toaster, the more people became convinced this was going to revolutionize the psychiatry world.

Weizenbaum even felt compelled to write a book about the limitations of computing, and the capacity of the human brain to anthropomorphise the things it interacts with, as if to say that to most people, everything is partly human-like or has human-analogue intentions.

He is considered to be one of the fathers of artificial intelligence, despite his attempts at explaining to everyone that would listen that it was somewhat a contradiction in terms.

Design

ELIZA was written in SLIP, a language that worked as a subset or an extension or Fortran and later ALGOL, and was designed to facilitate the use of compounded lists (for instance (x1,x2,(y1,y2,y3),x3,x4)), which was something of a hard-ish thing to do back in the day.

By modern standards, the program itself is fairly simplistic:

  • the user types an input
  • the input is parsed for "keywords" that ELIZA knows about (eg I am, computer, I believe I, etc), which are ranked more or less arbitrarily
  • depending on that "keyphrase", a variety of options are available like I don't understand that or Do computers frighten you?

Where ELIZA goes further than a standard decision tree, is that it has access to references. It tries to take parts of the input and mix them with its answer, for example: I am X -> Why are you X?

It does that through something that would become regular expression groups, and then transforming certain words or expressions into their respective counterparts.

For instance, something like I am like my father would be matched to ("I am ", "like my father"), then the response would be ("Why are you X?", "like my father"), then transformed to ("Why are you X?", "like your father"), then finally assembled into Why are you like your father?

Individually, both these steps are simple decompositions and substitutions. Using sed and regular expressions, we would use something like

$ sed -n "s/I am \(.*\)/Why are you \1?/p"
I am like my father
Why are you like my father?
$ echo "I am like my father" | sed -n "s/I am \(.*\)/Why are you \1?/p" | sed -n "s/my/your/p"
Why are you like your father?

Of course, ELIZA has a long list of my/your, me/you, ..., transformations, and multiple possibilities for each keyword, which, with a dash of randomness, allows the program to respond differently if you say the same thing twice.

But all in all, that's it. ELIZA is a very very simple program, from which emerges a complex behavior that a lot of people back then found spookily humanoid.

Taking a detour through (gasp) JS

One of the available "modern" implementations of ELIZA is in Javascript, as are most things. Now, those who know me figure out fairly quickly that I have very little love for that language. But having a distaste for it doesn't mean I don't need to write code in it every now and again, and I had heard so much about the bafflement people feel when using regular expressions in JS that I had to try myself. After all, two birds, one stone, etc... Learn a feature of JS I do not know, and resurrect an old friend.

As I said before, regular expressions (or regexs, or regexps) are relatively easy to understand, but a lot of people find them difficult to write. I'll just give you a couple of simple examples to get in the mood:

[A-Za-z]+;[A-Za-z]+

This will match any text that has 2 words (whatever the case of the letters) separated by a semicolon. Note the differenciating between uppercase and lowercase.
Basically, it says that I want to find a series of letters on length at least 1 (+) followed by ; followed by another series of letters of length at least 1

.*ish

Point (.) is a special character that means "any character", and * means "0 or more", so here I want to find anything ending in "ish"

Now, when you do search and replace (is is the case with ELIZA) or at least search and extract, you might want to know what is in this .* or [A-Za-z]+. To do that you use groups:

(.*)ish

This will match the same strings of letters, but by putting it in parenthesiseseseseseseseseses (parenthesiiiiiiiiiiiii? damn. anyway), you instruct the program to remember it. It is then stored in variables with the very imaginative names of \1, \2, etc...

So in the above case, if I apply that regexp to "easyish", \1 will contain "easy"

Now, because you have all these special characters like point and parenthesis and  whatnot, you need to differenciate when you need the actual "." and "any character". We escape those special characters with \.

([A-Za-z]+)\.([A-Za-z]+)

This will match any two words with upper and lower case letters joined by a dot (and not any character, as would be the case if I didn't use \), and remember them in \1 and \2

Of course, we have a lot of crazy special cases and special characters, so, yes, regexps can be really hard to build. For reference, the Internet found me a regexp that looks for email adresses:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Yea... Moving on.

Now, let's talk about Javascript's implementation of regular expressions. Spoiler alert, it's weird if you have used regexps in any other language than perl. That's right, JS uses the perl semantics.

In most languages, regular expressions are represented by strings. It is a tradeoff that means you can manipulate it like a string (get its length, replace portions of it, have it built out of string variables etc), but it makes escaping nighmareish:

"^\\s*\\*\\s*(\\S)"

Because \ escapes the character that follows, you need to escape the escaper to keep it around: if you want \. as part of your regexp, more often than not, you need to type "\\." in your code. It's quite a drag, but the upside is that they work like any other string.

Now, in JS (and perl), regexps are a totally different type. They are not between quotes, but between slashes (eg /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/). On one hand, you don't have to escape the slashes anymore and they more closely resemble the actual regexp, but on the other hand, they are harder to compose or build programmatically.

As I said, it's a different tradeoff, and to each their own.

Where it gets bonkers is how you use them. Because the class system is... what it is, and because there is no operator overload, you can't really get the syntactic elegance of perl, so it's kind of a bastard system where you might type something like

var myRe = /d(b+)d/;
var isOK = "cdbbdbsbz".match(); // not null because "dbbd" is in the string

match and matchAll aren't too bad, in the sense that they return the list of matching substrings (here, only one), or null, so it does have kind of a meaning.

The problem arises when you need to use the dreaded exec function in order to use the regexp groups, or when you use the g flag in your regexp.

The returned thing (I refuse to call it an object) is both an array and a hashmap/object at the same time.

In result[0] you have the matched substring (here it would be "dbbd"), and in result[X] you have the \X equivalents (here \1 would be "bb", so that's what you find in result[1]). So far so not too bad.

But this array also behaves like an object: result.index gives you the index of "the match" which is probably the first one.

Not to mention you use string.match(regex) and regex.exec(string)

const text = 'cdbbdbsbz';
const regex = /d(b+)d/g;
const found = regex.exec(text);

console.log(found);
console.log(found.index);
console.log(found["index"]);
Array ["dbbd", "bb"]
1
1

So, the result is a nullable array that sometimes work as an object. I'll let that sink in for a bit.

This is the end

Once I got the equivalence down pat, it was just a matter of copying the data and rewriting a few functions, and ELIZA was back, as a libray, so that I could use it in CLI tools, iOS apps, or MacOS apps.

When I'm done fixing the edge cases and tinkering with the ranking system, I might even publish it.

In the meantime, ELIZA and I are rekindling an old friendship on my phone!


Introducing FuzzyTests

TL;DR: Grab it here : Github repo

Unit testing is painful amirite?

Writing good tests for your code very often means spending twice as much time coding them than on the things you test themselves.

It is good practice though to verify as much as possible that the code you write is valid, especially if that code is going to be public or included in someone else's work.

In my workflow I insist on the notion of ownership :

The bottomline for me is this: if there are several people on a project, I want clearly defined ownership. It's not that I won't fix a bug in someone else's code, just that they own it and therefore have to have a reliable way of testing that my fix works.
Tests solve part of that problem. My code, my tests. If you fix my code, run my tests, I'm fairly confident that you didn't wreck the whole thing. And that I won't have to spend a couple of hours figuring out what it is that you did.

This a a very very very light constraint when you compare it to methodologies like TDD, but it's a required minimum for me.

Plus, it's not that painful, except...

Testing every case

In my personal opinion, the tests that are hardest to do right are the ones that have a very large input range, with a few failure/continuity points.

If, for instance, and completely randomly, of course, you had an application where the tilt of the phone changes the state of the app (locked/unlocked, depending on whether the phone is lying flat-ish on the table or not:

  • from -20º to 20º the app is locked
  • from 160º to 200º the app is locked
  • the rest of the time it's not locked
  • All of that modulo 360, of course

So you have a function that takes the current pitch angle, and returns if we should lock or not:

func pitchLock(_ angle: Double) -> Bool {
  // ...
}

Does it work? Does it work modulo 360? What would a unit test for that function even look like? A for loop?

I have been looking for a way to do that kind of test for a while, which is why I published HoledRange (now Domains 😇) a while back, as part of my hacks.

What I wanted is to write my tests kind of like this (invalid code on so many levels):

for x in [-1000.0...1000.0].randomSelection {
  let unitCircleAngle = x%360.0
  if unitCircleAngle >= 340 || unitCircle <= 20 {
    XCTAssert(pitchLock(x))
  } else if unitCircleAngle >= 160 && unitCircle <= 200 {
    XCTAssert(pitchLock(x))
  } else {
    XCTAssertFalse(pitchLock(x))
  }
}

This way of testing, while vaguely valid, leaves so many things flaky:

  • how many elements in the random selection?
  • how can we make certain values untestable (because we address them somewhere else, for instance)
  • what a lot of boilerplate if I have multiple functions to test on the same range of values
  • I can't reuse the same value for multiple tests to check function chains

Function builders

I have been fascinated with @_functionBuilder every since it was announced. While I don't feel enthusiastic about SwiftUI (in french), that way to build elements out of blocks is something I have wanted for years.

Making them is a harrowing experience the first time, but in the end it works!

What I wanted to use as syntax is something like this:

func myPlus(_ a: Int, _ b: Int) -> Int

DomainTests<Int> {
    Domain(-10000...10000)
    1000000
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value\(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value\(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value\(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value\(a)")
   }
}.random()

This particular DomainTests runs 1000000 times over $$D=[-10000;10000]$$ in a random fashion.

Note the Test builder that takes a function with a parameter that will be in the domain, and the definition that allows to define both the test domain (mandatory) and the number of random iterations (optional).

If you want to test every single value in a domain, the bounding needs to be Strideable, ie usable in a for-loop.

DomainTests<Int> {
    Domain(-10000...10000)
    Test { (a: Int) in
        XCTAssert(myPlus(a, 1) == a+1, "Problem with value\(a)")
        XCTAssert(myPlus(1, a) == a+1, "Problem with value\(a)")
    }
    Test { (a: Int) in
        let random = Int.random(in: -10000...10000)
        XCTAssert(myPlus(a, random) == a+random, "Problem with value\(a)")
        XCTAssert(myPlus(random, a) == a+random, "Problem with value\(a)")
   }
}.full()

Conclusion

A couple of hard working days plus a healthy dose of using that framework personally means this should be ready-ish for production.

If you are a maths-oriented dev and shiver at the idea of untested domains, this is for you 😬