maths - Jack Of Spades In Wonderland

04 Feb 2025 on meta, maths, opinion, rant, programming

Why Is Science "Boring"?

OK, so, now that I don't spend my days teaching anymore, my friends and family are amazed at the fact that I can't stop learning new things, starting new projects, getting up at 6am and carrying on carrying on.

The worst part? I've kept the habit of a teacher and try to enthuse people about the stuff I discover / learn about... To mixed success.

This has led me to thinking about a worrying trend I find in online and offline conversations that may or may not be linked to conspiracy theories and "fringe" opinions.

The urge to publish was definitely tickled by this toot

Folks who don't understand AI like it more but once they understand it they find it less appealing

Where are you coming from?

First off, it's quite a big topic, one that can probably fill a few PhD thesises (thesii?) for a sociology/psychology department. This are just my thoughts about it, based on anecdotal evidence and cursory research. Feel free to hit me up about it, I'm totally open (and maybe even hoping) to be proven wrong.

As someone trained in linguistics, I feel the need to make sure the terms I'm using are somewhat well defined:

"Science" as in the process of learning or discovery (in a greek sense closer to Μάθημα - Máthēma, something like the act of learning) is kind of alive and kicking, albeit in a weird way
"Science" as in the established knowledge (in a greek sense closer to Γνόσι - Gnósī, whether collectively or privately) is very much in a state of active battle, and probably the one I have the most thoughts about

I'm sure some people will be thinking I'm preaching to the choir here, but I've seen with my own eyes the decline of visibility (in a positive sense) for the experts. Let me paint the scene for you:

We are in a meeting with half a dozen experts of their field and a couple of manager/investor types. The experts are disagreeing on many things but not on the most fundamental one: the proposal will not work. After the meeting, they are branded as killjoys, a vague social post arguing that the proposal is a good idea is brandished, and the decision to move forward is taken, to predictable results.

I'm not saying it's always the case, mind you. But more often than not, I have sat in auditoriums, meetings, what have you, and looked at a group of non-experts slowly but surely convincing themselves that the people they invited to talk (and sometimes even paid to do so) were wrong.

For someone who grew up in a time when science communicators were mostly researchers talking about their active research on live television, this is quite a culture shock. That's why I thoroughly enjoy youtube channels like Computerphile, who show actual research scientist being passionate, whether or not they are what is considered to be "good communicators" (and they are quite decent communicators, too!)

But the fact is, we have clearly entered an era where science communicators are permanent fixture. And I'm not throwing shade at them at all! They do a great and needed job.

But, when you ask them, one of the very first thing most of them will say is that they aren't "real" scientists (I beg to differ), and that they are like a gateway drug: get people interested in the topic so that they will dig deeper in the "actual" science.

What it feels like is that this worthy goal is never achieved.

The 10-minutes version of what could fill hours of lectures is enough: Γνόσι has been achieved, and Μάθημα is satisfied.

The problem is that it cuts both ways: a skilled communicator promoting pseudoscience will get the apparent same result as the one talking about what actual science says.

Milo Rossi (aka Miniminuteman on YouTube) has a fascinating lecture on his own field, right there on his channel

His central thesis seems to be that because communicating instantly to millions of people has become cheap, easy, and to some extent make enough money and/or fame back to the creator to encourage them to "challenge" the status quo, and find their audiences.

And, just like him, I find our position delicate: how do we promote the experts' words while not gatekeeping - preventing genuinely curious people to go and do their own research and experiments in good faith (as opposed to the derided "do your own research" mantra used by conspiracy theorists everywhere)?

Fine, but are you an expert in expertise?

Oof, good point, well made.

Here are my bona fides, and my experience on the matter. I will let you decide if it invalidates everything I have to say about it not.

I have been teaching classes in maths, computer science, data, and (machine) learning, as well as training other people in giving classes in front of audiences, for a couple of decades. There are definitely areas where I do decent, and others where I totally suck. You'll have to ask my students if I was overall a good teacher or not.

But objectively, a large majority of the people I trained went on to have a career in the field that I was in charge of teaching them, so I think it went decent. And while there were many, many, many, complaints, these are students we're talking about, and the power dynamic at play - me grading them based on "arbitrary" goals - means a somewhat antagonistic relationship, even when friendly. The grading game is a very heavy bias in your appreciation of teachers. I know. I even tested it: classes without an evaluation at the end tended to be lauded wayyyyyyyyy more than "strict" and friendly teachers.

Before we carry on with the central thesis of this post that will probably end up being way too long, here's a couple of observations I made as the head teacher / managing teacher of the curriculum:

I had the chance of heading a department where the jobs the students would get at the end paid very well. (Too well? Yes probably, especially given the fact we still refuse to be an industry)
Students attending class had a variety of incentives to specialize in CompSci, but let's not kid ourselves: making money was part of the equation.
The highest paying jobs are usually the ones requiring the most expertise, or at least this used to be the case.
So therefore, students should be trying to acquire as much expertise as possible, as fast as possible, to land those high-paying (and/or fun, and/or motivating, ...) jobs

Right?

Except: Students had a tendency to "check out" fairly rapidly

"Copypasta" has been a thing forever, but when your exam takes place on a machine literally designed to make the process of finding something on the web and copying it in your production easier (and now AI that automates - badly - most of it), the incentive to "waste" an hour on something that would "grade" the same in a minute is hard to find.
And during class, because the slides are available, or it's recorded, or there's a good chance to find a 10 minute video on "the same topic" online, why not do something more fun, and catch up later?
If the passing grade is 10/20, and having more isn't rewarded, why go the extra mile?

Subsequent conversations with them yielded interesting observations, on top of those fairly obvious ones. What they told me they yearned for was "engagement" or "entertainment". and it's a whole Thing. Make learning fun, they say!

While I do agree with the sentiment (learning is fun, to me), I find myself obligated to point out that in this particular instance, it doesn't have to be fun: they were ostensibly there to get a good paying job (among other things). I mean, I'm training people to compete with me for jobs and contracts, and I have to make them feel entertained?

Regardless of the validity of that demand, I always did strive to make the learning fun, because it's fun - to me. But I always found the reversal of expectations weird: here you are, asking me for a favor, and I have to thank you by entertaining you?

I totally understand how that may sound gatekeeper-y and demeaning. And, again, I never actually subscribed to that attitude, but when you think about it for a minute, there definitely is a difference between learning something that will be used to earn a living, and learning things without ulterior motives - at the beginning at least. Because of my well-ingrained biases regarding the former, I will keep the rest of this post to the latter: learning stuff that may not net an immediate advantage.

You're gonna talk about history now, aren't you?

Yes. Giving context and a bit of epistemology is part of the scientific method. Plus it's a fascinating topic in and of itself. It will happen.

I'm not going back too far though, only to the printing press, and the Renaissance (or Early Modern Era as I learned it should probably be called, because it was only a rebirth for Western Europe, not really the whole world where they didn't have our wonderfully repressive period known colloquially as the Dark Ages).

Scientists then, like now, had feuds about their hypothesises (hypothesi?), and wrote scathing articles, pamphlets, and letters, about their colleagues who were obviously wrong. Because of the wide dissemination of ideas and a general curiosity about how the world actually worked in a more practical way than "oh well, it's the will of a deity" or "well, my grandparents did it that way so it must carry on the same", there was a crowd - yes an actual crowd - coming to hear the latest discoveries at the various academies.

It's kind of like a Youtube channel, but you had to go to an auditorium once a month or whatever, to hear, usually from the mouth of the actual scientist or one of his proxies (student, secretary of the academy, colleague) what the exciting new scientific discovery was. And if you couldn't make it, there were local clubs and academies that would debate the written reports of the lecture.

Scientists were household names, and pride of their countries. You have a Gauss? Well we have a Newton! A lot of streets, buildings, and areas, are still named after them, and you were expected to be able to quote them before entering a debating arena.

And yet, there were close to no intermediary to the content of their theories: the people read the articles or letters verbatim. Science communication was without added value. If you wanted to participate in those discussions, you had to learn the whole thing, not just the cliff notes version.

The weird part is that while this feels super elitist - you have to be a world class mathematician to contribute to a discussion of the latest math discovery - it was fairly open. As long as you had enough free time (that is elitist, for different reasons) to learn, and could back your claims up in front of renowned scientists with the proper methodology, you were part of the club. A lot of scientists up until the Industrial Revolution were not just scientists. Enormous scientific discoveries and contributions were made by people with practical knowledge that wanted to find out the why, or people with a lot of free time just engaging in science out of curiosity.

That popular interest in discoveries and science continued unabated up until very recently. Scientific magazines, and, as I said before, scientists on TV, were very popular, up until the complete takeover of the Internet as a source of knowledge.

For reference, here are the circulation numbers for the scientific magazines here in France (may I suggest diving into those stats? Apart from being paper scans and sometimes hard to decipher, there aren't any regular format or nomenclature. Makes for a fun afternoon of research!):

"Science & Technique" category, in circulation numbers / year

What's surprising to me is the resurgence super recently. But you can clearly see the peak in 1981 among a global rise in interest about science & tech magazines, then the peak / dip around the Y2K bug, and since 2014-ish a sharp decline.

Obviously, this is only the number for France, and if you take a peek at the data itself, you'll see that there is a margin for error (what exactly is included in that category is arbitrary to a point). But it matches my gut feeling, so there is that.

Of course, nowadays, the undisputed royalty of scientific communication is the Internet. Science communicators have setup many very interesting channels, as I mentioned before, and most of the good ones are quite humble despite the fact that they have a viewership that the equivalent press would have considered a dream only a couple of decades ago.

Scientific honesty starts with, and carries on with, "I may be wrong" - always. Even when your theory (in the scientific sense, not in the vernacular "hypothesis" sense) seems to tick all the boxes and match all the data, there is a small chance that it could be at worse wrong except in this particular case, and at best right but only as as part of a more general theory. Again, watch people like Milo Rossi to see how they deconstruct their own certainties and doubts publicly.

So, with that historical context in place, let's talk about the iceberg.

The iceberg as in the thing that sunk the "insubmersible" Titanic?

More as in the thing that is mostly underwater. Because we have an Iceberg Problem.

The openness of science is still a thing most scientists strive for, but in order to understand - or even contribute - to modern research, you can't really do it in your free time and as a dilettante anymore, however brilliant.

Let me give you a personal example: I tried to teach a class on native mobile development. Despite what most platform vendors (and now IA vendors) would have you believe, software development isn't easy. Sure, if you have an image of something you want to put on the web (a landing page, a presentation page, anything that you could compose in a word processor or a page layout piece of software), making "a web site" that matches the visuals is fairly simple.

And, if you're only interested in what everybody else is doing, there's a good chance that someone out there does it already and could either sell it to you or give it for free.

This blog is easy to "make": I open a web browser, and type something in the window and blammo, you can read it. Except... this is only the tip of the iceberg.

Your web browser has more than 20 millions of lines of code, it runs on an operating system that is a couple of orders of magnitude more complex than that. And on my end, the server also has millions of lines of code. All of that requires maintenance by someone you don't know, and, more likely than not, that you haven't paid.

That's something of a marvel to me that despite the poor design choices we inherited, and the average number of bugs per 10k lines of code, when you write <center>something</center>, it is somewhat in the middle of the page. If it's not, then you're in for a whoooooooooooole lot of debugging time.

Back to my native development class: it's "lower level" than the web, which, in practice, means that there is one or two less layers of millions of lines of code for my "hello world" to be displayed somewhat properly. Oh and you add some real life issues such as battery and network performance into the mix, which you don't really have to care about "on the web".

What it meant is that my first class struggled with concepts and underpinnings that they never had to think about before. So I had to add more hours to the class. Then to add some pre-requisites in the years before. Then a whole new kind of course around designing and debugging code.

Because, you see, there's a whole lot of things under the hood that work just fine for 80% of what you want to do. The rest is trying to figure out why it doesn't and decide whether it requires a workaround of some weirdness in the underlying structure, or a completely new thing that has to be built from scratch.

In practice, what it means is that you can cobble something that looks like it's working fine enough without much training, but as your work is exposed to more users and situations, the probability of it breaking goes up exponentially. And you may not have any idea why.

Anyways, enough of my own field, it holds mostly true for every single one nowadays: in order to understand the current physics, the level of maths and prior work needed to "get it" is astounding. And it's an immediate turn off for a lot of folks.

Plus, because a lot of the most graspable concepts have been known and integrated into our lives for so long, the incentive to understand the current forefront of a scientific field boils sometimes down to "do I have time?" which correlates neatly to "what purpose does it serve?" and "how does it benefit me?"

If you knew the most current theories of physics, say 300 years ago, it might give you an edge in your daily life, like using levers instead of wrecking your back. Then it was about how does it help me "professionally"? Like using pulleys in your factory instead of 20 sweaty dudes. Nowadays? Practical applications of physics research may help someday a mega corporation that will sell me something that makes my life a lot easier.

So, it's all about curiosity, except when it's your job to be at the edge of what can be done.

But people still consume a lot of "educational" content, are they not?

Time for a tangent. Curiosity has always felt like a two-headed thing to me. There's "genuine" curiosity - as in "how is it possible? I'd like to know more about how or why this thing is" - which is almost always the starting point of research both personal and institutional, and there's ἐκρυπτός (ekrīptós), hermeticism, esotericism, arcanism, which are more about what I know that you don't, and is about power.

Because, yes, knowledge is power, even when it's not science. Power over people, power over situations, power over the machine,... If you know stuff, you're less passive, less at the mercy of whatever or whoever is guiding the situation you are in. And it's quite intoxicating to feel in control, or at least "understand the plan" for most people.

Just look at the success of gossip, magic tricks and the like! And in my field it's especially prevalent. If a company/freelancer doesn't tell the potential customer that it would probably cost a lot less to do their thing using a very well maintained and fairly easy to learn piece of free software, they may be able to extract quite a lot of monies from them.

That's why I'm particularly careful with science vulgarization in particular, and "knowledge" communication in general. Again, many people have written books and made videos and all that about it, but the incentives of the person giving you information factor in how I parse said information.

Now, the issue is, it's exactly what pseudoscience aficionados say as well. "The establishment (whatever that is) is lying to you, seek your own truth."

Let's talk incentives then shall we?

On the one hand, you have governmental and/or scientific cabals, that invest billions of monies into maintaining a fiction about some lie being the truth that involves thousands or millions of people that have to be bribed, coerced, intimidated, and what have you, in order to keep the "population" (ie you, and me) in a state of... compliance? Tranquility?

On the other hand, you have that person on the internet that knows the truth that is hidden and benefits financially and in terms of fame from your viewership.

Oh and those all powerful governments who managed to keep a lid on things for so long and at great expense somehow fail to take down the videos of Joe Schmock from Nowherville.

While doubting governments (because, yes, they do lie, hide and/or mismanage the truth sometimes) is totally fair in many many areas, science isn't "governmental". A random research team in Brazil do their thing, formulate hypotheseseseesses (hypothesi?), include them in a theory, and publish it for another set of random teams in Japan and Kenya to conform or deny.

You're telling me that the French government, which is publicly fighting with many governments over a lot of issues (and has warred because of it, too) is somehow coordinating with the UK to hide the truth about the moon landings while accusing them of lying about almost everything else?

Sorry about the choice of examples, but it's currently the 6 Nations tournament, and we French and English love to hate each other during tournaments. Especially Rugby. So, there. Apologies all around.

How does that relate to science being "too boring"?

Glad you asked. So, in my mind, the appetite for "trivia" or for useful knowledge (the kind that gives you a job, or power, or fame, or monies) hasn't abated at all. But actual modern science requires effort and dedication to understand.

Going back to my own field, if I can earn $10k with a customer who wants a website by doing $30 worth of work, why wouldn't I? (I mean except for honesty, ethics, love of humanity, long term reputation effects, risk of being found out as a fraud and a swindler, and many many more reasons)

And if I can tell that customer that they will never have the time to learn all the esoteric knowledge that justifies that price tag and they can't have a decent picture of the truth in my words, I win, right? (Power, again)

The reality is, most of my knowledge is boring tedium (for most people) about bits and bytes, algorithms that were thought of and perfected over a long period of time, a lot of "yea we do this like that because it's what we've always done, so all the tools are geared that way, like it or not", a dash of general understanding of maths and physics, and lots and lots and lots and lots of lessons learned from past mistakes.

In order to know as much as me, you don't have to be particularly smart. You have to put in the hours and do the mistakes. Experience is a catalog of dead ends that I know to avoid and you don't. It's as simple as that, mostly.

And most of science is that way: it's a pyramid that is now so high that no one has the time to learn from top to bottom, except for people whose job it is. But there is power to be gained from the tip. Funding for researchers, fame for communicants (and monies too), glory for nations, and, yes, all of the above for fakers and frauds too.

Once you start establishing your bona fides by explaining the layers your current expertise relies on, you are "boring" compared to your competition - the people who may not know as much as you, but are far more entertaining in their presentation - who just doesn't care about being right, just about being right enough.

And so we have pseudoscience that promises "secret knowledge" and power over your "sheeple" condition.

And we have noise about competing theories or discoveries for research grants.

And we have hiring issues.

In all my years training people, it's a sad observation to realize that it's not necessarily the most knowledgeable who get the high-paying job. I've met and trained genuinely curious, smart and motivated people who struggled to get hired. I've met single minded researchers that were working on something that could benefit humanity as a whole get their research grant denied, or so heavily hogtied with strings attached that it made the act of searching meaningless. It's not necessarily true for all the smart and/or knowledgeable people I've met, but the percentage is high enough to make me sad.

The scientific method works so well because it's predictable, open, and slow. As opposed to surprising, arcane, and fast. And therefore, yes, it's boring.

But it's boring in the same way that doing the scales and noodling on your instrument for hours before you take the stage is boring. It's boring in the same way the characters in your novel / movie / tv show slowly work towards winning at the end.

It's boring if you consider only the process and the ongoing effects, rather than the potential for apotheosis.

You said "AI" earlier. That was just clickbait, right?

Point of origin, rather than clickbait. And one I will address now.

Back in the Before Times, I was asked to moderate a panel on the use of AI in finance. You can still see the video here.

These people have invested a lot of time, money and expertise, into trying to make products using AI, with customers who are very picky and take big risks when using their expertise in the field.

AI as it understood these days (LLMs, agents, domain-specific models,...) works of a pyramid of technologies and assumptions, some more hidden than others. What do I mean by that?

Without delving too deep in the technical side, the pipeline is as follows:

Gather a lot of data (assumption: this data is relevant to the rules, that data isn't)
From this data try to infer correlations and "rules" (assumption: there is a rule albeit an arcane one)
Compile these rules into a model (assumption: the modelling will capture the rules)
"Apply" the model to new data, and witness that if you follow those rules, the output should be X (assumption: new data will fit in the ruleset of the model)

For Large Language Models, the data is text. The system tries to infer words (or sequences of words) based on whatever words were given as input. If you build your model on questions and answers, what you get is something that will try to guess the probability on a sequence of words being the correct answer to another sequence of words that are considered to be a question.

That model will not be able to handle anything that is not a question, or that is a question but on a topic is has never seen before. Technically it's impossible.

And herein lies my problem with the current marketing of AI as a panacea: because most people do not understand the pipeline underneath, they cannot interpret the output properly: the model will always give you an answer, but with probabilities attached.

In my example, if I give my question/answer model the input "my dog barks at birds and the sky is blue", it might output something along the lines of "the sky is blue because of refraction in the atmosphere that scatters the red light" but with a probability (aka confidence) of 0.4.

If you've ever seen a matching algorithm (recommendation for buying stuff, dating sites, whatever), the underlying principle is the same... it tries to match an output with your input.

Now if you use the user-friendly version of whatever LLM you want, it won't give you that. It will output with absolute authority the answer that best matches your input, based on whatever data it was trained on. You usually do not get the nitty gritty probability numbers, and therefore how "confident" the model is it performed correctly.

On top of that, the companies that sell you access to their LLMs also have algorithms and rules that will parse the model's output and modify it, while hiding the inner workings, unless you write code to tap into the lower level - their APIs.

Their incentive is to look authoritative so that the product itself doesn't appear wonky (can you imagine a car salesman that would tell you the car may work fine a lot of times but they don't know if it will work on your driveway?) by hiding all the genuinely cool stuff underneath, and make it appear as magical and useful as possible.

BUT in order to have a somewhat reliable model, it has to be Large, because the moment the model encounters something completely outside of what it knows the probabilities of, it will output nonsense. And that precludes most of the personal experimentations that curious people may have.

The systems, principles and techniques are simple. Data in, a bit of probabilistic maths, and data (with attached probability) out. Having access to enough training data, and the computing power to find out the relationship between every possible input and every possible output is staggeringly difficult.

So you can't really check if the model was properly trained (or train it yourself), or even if the output has a high confidence, because the one thing that models can't do is say "I don't know". They will tell you instead gibberish, with a low confidence rating, which is their way of saying that they can't fulfill your desires. And that part is hidden for a large majority of the users.

And so we're back to asserting stuff with confidence, and the difficulty of distinguishing an actual expert from someone who can say stuff nice.

Not clickbait, alright?

I don't get it. Is science boring?

I don't think so. It requires a bit more effort nowadays than witnessing parlor tricks or being told in 10 minutes about a thing. But it also ranges quite far. We do things daily that were unthinkable a couple of generations ago.It's just that, because the effect is mundane, it's lost a bit of appeal. But it's still so fascinating and so cool, if you look at how such a mundane thing is done. Just last year in 2024, we accomplished things as a species that are completely and totally amazing, if you stop and think about it:

We're hopefully, finally, on the cusp of preventing HIV. A disease that infects more than a million people every year and kills two thirds of that. Every. Single. Year. And we are making progress in understanding the virus and fighting it. The cure isn't here yet, but we're getting better at preventing and fighting it.

Japan managed to soft land (ie not crash) a human sized object on the moon with a precision of a 100 meters, after a trip of 400 000 kms! That's like putting the billiard ball in the pocket from 440 km away. How's that not amazing?

Speaking of space, our species launched a spacecraft the size of car on a journey that will take 6 years, just to take a better look at a bunch of moons in orbit around Jupiter. We can do that.

We also found the oldest settlement to date in the Amazon rainforest, and we didn't even have to remove the trees to do so. Like, these cities were built and abandoned millenia ago, they are immensely difficult to access, and there was a decent chance it was just a fluke of nature to see patterns in the jungle, but with LIDAR and lots of careful analysis, those ruins are within the realm of human knowledge once again.

And I'm sure that in a field that you take an interest in, there were many advances and discoveries.

I may submit to the unknown, but never to the unknowable
- Yama, Lord of Light, Roger Zelazny - 1967

30 Dec 2022 on algorithms, coding, CS, maths, meta, opinion, programming

[AoC 2022] Recap

TL;DR

I'm not as rusty as I thought I'd be. And YES that kind of challenge has a place in the coding world (see conclusion)

Just like every year, I had a blast banging my head on the Advent of Code calendar. It so happens that this year, I had a lot less brain power to focus on them due to the exam season at school and the ensuing panicky students, but it could also be because my brain isn't up to spec.

Some of my students are/were doing the challenges, so I didn't want to post anything that would help them, but now that the year is almost over, I wanted to go over the puzzles and give out impressions (and maybe hints).

Easing in

Days 1 to 7 were mostly about setting up the stage, getting into the habit of parsing the input and using the right kind of structure to store the data.

Nothing super hard, it was "just" lists, hashmaps, and trees, until day 4. Day 4 was especially funny to me, because I wrote HoledRange / Domain just for that purpose (disjointed ranges and operations on them). Except I decided to do this year's calendar in Julia, and the library I wrote is for Swift. Just for kicks, I rewrote parts of the library, and I might even publish it.

Days 5, 6 and 7 highlighted the use of stacks, strings, and trees again. Nothing too hard.

Getting harder

My next favorite is day 9. It's about a piece of rope you drag the head of, and have to figure out what the tail does. If you've ever played zig-zaging a shoelace you'll know what I mean. String physics are fun, especially in inelastic cases.

Many ways to do that, but once you realize how the tail catches up to the head when the latter is moved, multi-segmented chains are just a recursive application of the same.

I was waiting for a day 10-like puzzle, as there tends to be one every year, and I majored in compilers all those years ago. State machines, yuuuuuusssssssss.

A lot of puzzles involve path finding after that, which isn't my strong suit for some reason. But since the algorithms are already out there (it was really funny to see the spike in google searches for Dijkstra and A*), it's "just" a matter of encoding the nodes and the edges in a way that works.

Day 13 is fun, if only because it can be instantly solved in some language with eval, which will treat the input as a program. I still wrote my own comparison functions, because I like manipulating numbers, lists and inequalities.

Day 14 is "sand simulation", that is grains of sand that settle in a conical shape that keeps expanding laterally. Once you find the landing point on each ledge and the maximum width of the pile, there's a calculable result. Otherwise, running the simulation works too, there aren't that many grains. For part 2, I just counted the holes rather than the grains.

Day 15 is about union and intersections of disjointed ranges again, except in 2D. Which, with the Manhattan distance approximation, gets back to 1D fairly quickly.

Day 16 stumped quite a few people, because of the explosive nature of path searching. Combinatorics are pretty hard to wrap your head around. Personally, I went for "reachability" within the remaining time, constructed my graph, and explored. It was probably non optimal.

Day 17 make me inordinately proud. Nuff said.

Catching up

Because of the aforementioned workload, I was late by that point, so I decided to take my time and not complete the challenge by Xmas. Puzzles were getting hard, work was time-consuming, so the pressure needed to go down.

Because of the 3D background that I had, I tackled day 18 with raytracing, which is way over-engineered, but reminded me of the good ole times. Part 2 was trickier with that method, because suddenly I had 2 kinds of "inside".

Day 19 was path finding again, the trick being how to prune paths that didn't lead in a good direction. Probably the one that used up the most memory, and therefore the one I failed the most.

Because of my relative newness to Julia, I had to go through many hoops to solve day 20. As it turns out, screw_dog over on mastodon gave me the bit I lacked to solve it simply, although way after I solved it using other means.

Day 21 goes back to my compiler roots and tree optimizations, and Julia makes the huge integer manipulation relatively easy, so, there. Pretty proud of my solution:

Part 1:   3.709 ms (31291 allocations: 1.94 MiB)
Part 2:   4.086 ms (31544 allocations: 1.95 MiB)

Which, on my relatively slow mac mini is not bad at all! Symbolic linear equation solving (degree one, okay) is a fun thing to think about. I even think that the algorithm I devised would work on trees where the unknown appears on both sides of the tree. Maybe I'll test it some day.

Day 22. Aaaaaaaah day 22. Linked lists get you all the way if and only if you know how to fold a cube from a 2D pattern. I don't so, as many of the other participants, I hardcoded the folding. A general solution just eluded me. It's on my todo list of reading for later.

Day 23 is an interesting variant of Conway's Game of Life, and I don't believe there is a way to simplify a straight up simulation, but I fully accept I could be wrong. So I used no trick, and let the thing run for 40s to get the result.

Day 24 was especially interesting for me, for all the wrong reasons. As I mentioned, graph traversal isn't my forte. But the problem was setup in a way that "worked" for me: pruning useless paths was relatively easy, so the problem space didn't explode too quickly. I guess I should use the same method on previous puzzles that I was super clumsy with.

Finally day 25 is a straight up algorithmic base conversion problem that's a lot of fun for my brain. If you remember how carry works when adding or subtracting numbers, it's not a big challenge, but thinking in base 5 can trip you up.

Conclusion

I honestly didn't believe I could hack it this year. I don't routinely do that kind of problem anymore, I have a lot of things going on at school, on top of dealing with the long tail of Covid and its effects on education. Family life was a bit busy with health issues (nothing life threatening, but still time consuming), and the precious little free time that I had was sure to be insufficient for AoC.

I'm glad I persevered, even if it took me longer than I wished it had. I'm glad I learned how to use Julia better. And I'm happy I can still hack it.

I see here and there grumblings about formal computer science. During and after AoC, I see posts, tweets, toots, etc, saying that the "l33t c0d3" is useless in practical, day-to-day, professional development. Big O notation, formal analysis, made up puzzles that take you into voluntarily difficult territories, all these things aren't a reflection of the skills that are needed nowadays to write good apps, to make good websites, and so on.

It's true. Ish.

You can write code that works without any kind of formal training. Today's computing power and memory availability makes optimization largely irrelevant unless you are working with games or embedded systems, or maybe data science. I mean, we can use 4GB of temporary memory for like 1/4 of a second to parse and apply that 100kB json file, and it has close to no impact on the perceived speed of our app, right? Right. And most of the clever algorithms are part of the standard library anyway, or easily findable.

The problem, as usual, is at scale. The proof-of-concept, prototype, or even 1.0 version, of the program may very well work just fine with the first 100 users, or 1000 or whatever the metric is for success. Once everything takes longer than it should, there are only 3 solutions:

rely on bigger machines, which may work for a time, but ultimately does not address the problem
scale things horizontally, which poses huge synchronization issues between the shards
reduce the technical debt, which is really hard

The first two rely on compute power being relatively cheap. And most of us know about the perils of infrastructure costs. That meme regularly makes the rounds.

It's not about whether you personally can solve some artificially hard problem using smart techniques, so that's ok if you can't do every puzzle in AoC or other coding challenges. It's not about flexing with your big brain capable of intuiting the bigO complexity of a piece of code. It's about being able to think about these problems in a way that challenges how you would normally do it. It's about expanding your intuition and your knowledge about the field you decided to work in.

It's perfectly OK for an architect to build only 1 or 2 level houses, there's no shame in it. But if that architect ever wants to build a 20+ stories building, the way to approach the problem is different.

Same deal with computer stuff. Learning is part of the experience.

04 Dec 2022 on algorithms, CS, coding, maths, programming

[Dev Diaries] Advent of Code

I've been really interested in Julia for a while now, tinkering here and there with its quirks and capabilities.

This year, I've decided to try and do the whole of Advent of Code using that language.

First impressions are pretty good:
- map, reduce, and list/array management in general are really nice, being first-class citizens. I might even get over the fact that indices start at 1
- automatic multithreading when iterating over collections means that some of these operations are pretty speedy
- it's included in standard jupyterhub images, meaning that my server install gives me access to a Julia environment if I am not at my computer for some reason

Now it's kind of hard to teach old dogs new tricks, so I'm sure I misuse some of the features by thinking in "other languages". We'll see, but 4 days in, I'm still fairly confident.

21 Jan 2020 on maths, machine learning, swift, CS

[ML] Swift TensorFlow (Part 1)

First part of doing RNN text prediction with TensorfFlow, in Swift

Broad Strokes

For all intents and purposes, it's about statistics. The question we are trying to solve is either something along the lines of "given an input X, what is the most probable Y?", or along the lines of "given an input X, what is the probability of having Y?"

Of course, simple probability problems have somewhat simple solutions: if you take a game of chess and ask for a next move based on the current board, you can do all the possible moves and sort them based on the probability of having a piece taken off the board, for instance. If you are designing an autopilot of some kind, you have an "ideal" attitude (collection of yaw, pitch and roll angles), and you calculate the movements of the stick and pedals that will most likely get you closer to that objective. If your last shot went left of the target, chances are, you should go right. Etc etc etc.

But the most interesting problems don't have obvious causality. If you have pasta, tomatoes and ground meat in your shopping basket, maybe your next item will be onions, because you're making some kind of bolognese, maybe it will be soap, because that's what you need, maybe it will be milk, because that's the order of the shelves you are passing by.

Machine learning is about taking a whole bunch of hopefully consistent data (even if you don't know for sure that it's consistent), and use it to say "based on this past data, the probabilities for onions, soap and milk are X, Y, and Z, and therefore the most probable is onions.

The data your are basing your predictive model on is really really important. Maybe your next item is consistent with the layout of the shop. Maybe it is consistent with what other customers got. Maybe it's consistent to your particular habits. Maybe it's consistent with people who are in the same "category" as you (your friends, your socio-economic peers, your cultural peers, ... pick one or more).

So you want a lot of data to have a good prediction, but not all the data, because noise (random data) does not reveal a bias (people in the shop tend to visit the shelves in that order) or doesn't exploit a bias (people following receipes want those ingredients together).

Oh yes, there are biases. Lots of them. Because ML uses past data to predict the future, if the data we use was based on bad practices, recommendations won't be a lot better.

There is a branch of machine learning that starts ex nihilo but it is beyond the scope of this introduction, and generates data based on a tournament rather than on actual facts. Its principle is roughly the same, though.

So, to recap:

We start with a model with random probabilities, and a series of "truths" ( X leads to Y )
We try with a Z, see what the model predicts
We compare it to a truth, and fix the probabilities a little bit so that it matches
Repeat with as many Zs as possible to be fairly confident the prediction isn't too far off

If you didn't before, now you know why ML takes a lot of resources. Depending on the number of possible Xs, Ys and the number of truths, the probability matrix is potentially humongous. Operations (like fixing the probabilities to match reality) on such enormous structures aren't cheap.

If you want a more detailed oriented explanation, with maths and diagrams, you can read my other attempt at explaining how it works.

Swift TensorFlow

There are a few contenders in the field of "ML SDK", but one of the most known is TensorFlow, backed by Google. It also happens to have a Swift variant (almost every other ML environment out there is either Python or R).

And of course, the documentation is really... lacking, making this article more useful than average along the way.

In their "Why Swift?" motivation piece, the architects make a good case, if a little bit technical, as to why swift makes a good candidate for ML.

The two major takeaways you have to know going in are:

It's a different build of Swift. You cannot use the one that shipped with Xcode (yet)
It uses a lot of Python interoperability to work, so some ways of doing things will be a bit alien

The performance is rather good, comparable or better than the regular Python TensorFlow for the tasks I threw at it, so there's that.

But the documentation... My oh my.

Let's take an example: Tensor is, as the name of the framework implies, the central feature of the system. Its documentation is here: https://www.tensorflow.org/swift/api_docs/Structs/Tensor

Sometimes, that page greets me in Greek... But hey, why not. There is little to no way to navigate the hierarchy, other than going on the left side, opening the section (good luck if you don't already know if it's a class, a protocol or a struct you're looking for), and if you use the search field, it will return pages about... the Python equivalents of the functions you're looking for.

Clearly, this is early in the game, and you are assumed to know how regular TensorFlow works before attempting to do anything with STF.

But fret not! I will hold your hand so that you don't need to look at the doc too much.

The tutorials are well written, but don't go very far at all. Oh and if you use that triple Dense layers on more than a toy problem (flower classification that is based on numbers), your RAM will fill so fast that your computer will have to be rebooted. More on that later.

And, because the center of ML is that "nudge" towards a better probability matrix (also called a Tensor), there is the whole @differentiable thing. We will talk about it later.

A good thing is that Python examples (there are thousands of ML tutorials in Python) work almost out of the box, thanks to the interop.

Data Preparation

Which Learning will my Machine do?

I have always thought that text generation was such a funny toy example (if a little bit scary when you think about some of the applications): teach the machine to speak like Shakespeare, and watch it spit some play at you. It's also easy for us to evaluate in terms of what it does and how successful it is. And the model makes sense, which helps when writing a piece on how ML works.

A usual way of doing that is using trigrams. We all know than predicting the next word after a single word is super hard. And our brains tend to be able to predict the last word of a sentence with ease. So, a common way of teaching the machine is to have it look at 3 words to predict a 4th.

I am hungry -> because, the birds flew -> away, etc

Of course, for more accurate results, you can extend the number of words in the input, but it means you must have a lot more varied sentence examples.

What we need to do here is assign numbers to these words (because everything is numbers in computers) so that we have a problem like "guess the function f if f(231,444,12)->123, f(111,2,671)->222", which neural networks are pretty good at.

So we need data (a corpus), and we need to split it into (trigram)->result

Now, because we are ultimately dealing with probabilities and rounding, we need the input to be in Float, so that the operations can wriggle the matrices by fractions, and we need the result to be an Int, because we don't want something like "the result is the word between 'hungry' and 'dumb'".

The features (also called input) and the labels (also called outputs) have to be stored in two tensors (also called matrices), matching the data we want to train our model on.

That's where RAM and processing time enter the arena: the size of the matrix is going to be huge:

Let's say the book I chose to teach it English has 11148 words in it (it's Tacitus' Germany), that's 11148*3-2 trigrams (33442 lines in my matrices, 4 columns total)
The way neural networks function, you basically have a function parameter per neuron that gets nudged at each iteration. In this example, I use two 512 parameters for somewhat decent results. That means 2 additional matrices of size 33442*512.
And operations regularly duplicate these matrices, if only for a short period of time, so yea, that's a lot of RAM and processing power.

Here is the function that downloads a piece of text, and separates it into words:

func loadData(_ url: URL) -> [String] {
    let sem = DispatchSemaphore(value: 0)
    var result = [String]()
    
    let session = URLSession(configuration: URLSessionConfiguration.default)
//     let set = CharacterSet.punctuationCharacters.union(CharacterSet.whitespacesAndNewlines)
    let set = CharacterSet.whitespacesAndNewlines
    session.dataTask(with: url, completionHandler: { data, response, error in
        if let data = data, let text = String(data: data, encoding: .utf8) {
            let comps = text.components(separatedBy: set).compactMap { (w) -> String? in
                // separate punctuation from the rest
                if w.count == 0 { return nil }
                else { return w }
            }
             result += comps
       }
        
        sem.signal()
    }).resume()
    
    sem.wait()
    return result
}

Please note two things: I make it synchronous (I want to wait for the result), and I chose to include word and word, separately. You can keep only the words by switching the commented lines, but I find that the output is more interesting with punctuation than without.

Now, we need to setup the word->int and int->word transformations. Because we don't want to look at all the array of words every time we want to search for one, there is a dictionary based on the hashing of the words that will deal with the first, and because the most common words have better chances to pop up, the array for the vocabulary is sorted. It's not optimal, probably, but it helps makes things clear, and is fast enough.

func loadVocabulary(_ text: [String]) -> [String] {
    var counts = [String:Int]()
    
    for w in text {
        let c = counts[w] ?? 0
        counts[w] = c + 1
    }
    
    let count = counts.sorted(by: { (arg1, arg2) -> Bool in
        let (_, value1) = arg1
        let (_, value2) = arg2
        return value1 > value2
    })
    
    return count.map { (arg0) -> String in
        let (key, _) = arg0
        return key
    }
}

func makeHelper(_ vocabulary: [String]) -> [Int:Int] {
    var result : [Int:Int] = [:]
    
    vocabulary.enumerated().forEach { (arg0) in
        let (offset, element) = arg0
        result[element.hash] = offset
    }
    
    return result
}

Why not hashValue instead of hash? turns out, on Linux, which this baby is going to run on, the values are more stable with the latter rather than the former, according to my tests.

The data we will work on therefore is:

struct TextBatch {
    let original: [String]
    let vocabulary: [String]
    let indexHelper: [Int:Int]
    let features : Tensor<Float> // 3 words
    let labels : Tensor<Int32> // followed by 1 word
}

We need a way to initialize that struct, and a couple of helper functions to extract some random samples to train our model on, and we're good to go:

extension TextBatch {
    public init(from: [String]) {
        let v = loadVocabulary(from)
        let h = makeHelper(v)
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(from.count-3) {
            if let w1 = h[from[i].hash],
                let w2 = h[from[i+1].hash],
                let w3 = h[from[i+2].hash],
                let w4 = h[from[i+3].hash] {
                    f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }
        
        let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
        let labelsT = Tensor<Int32>(l)
        
        self.init(
            original: from,
            vocabulary: v,
            indexHelper: h,
            features: featuresT,
            labels: labelsT
        )
    }
    
    func randomSample(of size: Int) -> (features: Tensor<Float>, labels: Tensor<Int32>) {
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(original.count-3) {
            if let w1 = indexHelper[original[i].hash],
                let w2 = indexHelper[original[i+1].hash],
                let w3 = indexHelper[original[i+2].hash],
                let w4 = indexHelper[original[i+3].hash] {
                    f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }

        var rf : [[Float]] = []
        var rl : [Int32] = []
        if size >= l.count || size <= 0 { 
            let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
            let labelsT = Tensor<Int32>(l)
            return (featuresT, labelsT)
        }
        var alreadyPicked = Set<Int>()
        while alreadyPicked.count < size {
            let idx = Int.random(in: 0..<l.count)
            if !alreadyPicked.contains(idx) {
                rf.append(f[idx])
                rl.append(l[idx])
                alreadyPicked.update(with: idx)
            }
        }
        
        let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
        let labelsT = Tensor<Int32>(l)
        return (featuresT, labelsT)
    }
    
    func randomSample(splits: Int) -> [(features: Tensor<Float>, labels: Tensor<Int32>)] {
        var res = [(features: Tensor<Float>, labels: Tensor<Int32>)]()
        var alreadyPicked = Set<Int>()
        let size = Int(floor(Double(original.count)/Double(splits)))
        var f : [[Float]] = []
        var l : [Int32] = []
        for i in 0..<(original.count-3) {
            if let w1 = indexHelper[original[i].hash],
                let w2 = indexHelper[original[i+1].hash],
                let w3 = indexHelper[original[i+2].hash],
                let w4 = indexHelper[original[i+3].hash] {
                    f.append([Float(w1), Float(w2), Float(w3)])
                l.append(Int32(w4))
            }
        }

        for part in 1...splits {
            var rf : [[Float]] = []
            var rl : [Int32] = []
            if size >= l.count || size <= 0 { 
                let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
                let labelsT = Tensor<Int32>(l)
                return [(featuresT, labelsT)]
            }
            while alreadyPicked.count < size {
                let idx = Int.random(in: 0..<l.count)
                if !alreadyPicked.contains(idx) {
                    rf.append(f[idx])
                    rl.append(l[idx])
                    alreadyPicked.update(with: idx)
                }
            }
        
            let featuresT = Tensor<Float>(shape: [f.count, 3], scalars: f.flatMap { $0 })
            let labelsT = Tensor<Int32>(l)
            
            res.append((featuresT,labelsT))
        }
        return res
    }
}

In the next part, we will see how to set the model up, and train it.

06 Jan 2019 on double, float, maths, coding, CS

Double Precision (Not)

From this list, the gist is that most languages can't process 9999999999999999.0 - 9999999999999998.0

Why do they output 2 when it should be 1? I bet most people who've never done any formal CS (a.k.a maths and information theory) are super surprised.

Before you read the rest, ask yourself this: if all you have are zeroes and ones, how do you handle infinity?

If we fire up an interpreter that outputs the value when it's typed (like the Swift REPL), we have the beginning of an explanation:

Welcome to Apple Swift version 4.2.1 (swiftlang-1000.11.42 clang-1000.11.45.1). Type :help for assistance.
  1> 9999999999999999.0 - 9999999999999998.0
$R0: Double = 2
  2> let a = 9999999999999999.0
a: Double = 10000000000000000
  3> let b = 9999999999999998.0
b: Double = 9999999999999998
  4> a-b
$R1: Double = 2

Whew, it's not that the languages can't handle a simple substraction, it's just that a is typed as 9999999999999999 but stored as 10000000000000000.

If we used integers, we'd have:

  5> 9999999999999999 - 9999999999999998
$R2: Int = 1

Are the decimal numbers broken? 😱

A detour through number representations

Let's look at a byte. This is the fundamental unit of data in a computer and is made of 8 bits, all of which can be 0 or 1. It ranges from 00000000 to 11111111 ( 0x00 to 0xff in hexadecimal, 0 to 255 in decimal, homework as to why and how it works like that due by monday).

Put like that, I hope it's obvious that the question "yes, but how do I represent the integer 999 on a byte?" is meaningless. You can decide that 00000000 means 990 and count up from there, or you can associate arbitrary values to the 256 possible combinations and make 999 be one of them, but you can't have both the 0 - 255 range and 999. You have a finite number of possible values and that's it.

Of course, that's on 8 bits (hence the 256 color palette on old games). On 16, 32, 64 or bigger width memory blocks, you can store up to 2ⁿ different values, and that's it.

The problem with decimals

While it's relatively easy to grasp the concept of infinity by looking at "how high can I count?", it's less intuitive to notice that there is the same amount of numbers between 0 and 1 as there are integers.

So, if we have a finite number of possible values, how do we decide which ones make the cut when talking decimal parts? The smallest? The most common? Again, as a stupid example, on 8 bits:

maybe we need 0.01 ... 0.99 because we're doing accounting stuff
maybe we need 0.015, 0.025,..., 0.995 for rounding reasons
We'll just encode the numeric part on 8 bits ( 0 - 255 ), and the decimal part as above

But that's already 99+99 values taken up. That leaves us 57 possible values for the rest of infinity. And that's not even mentionning the totally arbitrary nature of the selection. This way of representing numbers is historically the first one and is called "fixed" representation. There are many ways of choosing how the decimal part behaves and a lot of headache when coding how the simple operations work, not to mention the complex ones like square roots and powers and logs.

Floats (IEEE 754)

To make it simple for chips that perform the actual calculations, floating point numbers (that's their name) have been defined using two parameters:

an integer n
a power (of base b) p

Such that we can have n x bᵖ, for instance 15.3865 is 153863 x 10^(-4). The question is, how many bits can we use for the n and how many for the p.

The standard is to use 1 bit for the sign (+ or -), 23 bits for n, 8 for p, which use 32 bits total (we like powers of two), and using base 2, and n is actually 1.n. That gives us a range of ~8 million values, and powers of 2 from -126 to +127 due to some special cases like infinity and NotANumber (NaN).

$$(-1~or~1)(2^{[-126...127]})(1.[one~of~the~8~million~values])$$

In theory, we have numbers from -10⁴⁵ to 10³⁸ roughly, but some numbers can't be represented in that form. For instance, if we look at the largest number smaller than 1, it's 0.9999999404. Anything between that and 1 has to be rounded. Again, infinity can't be represented by a finite number of bits.

Doubles

The floats allow for "easy" calculus (by the computer at least) and are "good enough" with a precision of 7.2 decimal places on average. So when we needed more precision, someone said "hey, let's use 64 bits instead of 32!". The only thing that changes is that n now uses 52 bits and p 11 bits.

Coincidentally, double has more a meaning of double size than double precision, even though the number of decimal places does jump to 15.9 on average.

We still have 2³² more values to play with, and that does fill some annoying gaps in the infinity, but not all. Famously (and annoyingly), 0.1 doesn't work in any precision size because of the base 2. In 32 bits float, it's stored as 0.100000001490116119384765625, like this:

(1)(2⁻⁴)(1.600000023841858)

Conversely, after double size (aka doubles), we have quadruple size (aka quads), with 15 and 112 bits, for a total of 128 bits.

Back to our problem

Our value is 9999999999999999.0. The closest possible value encodable in double size floating point is actually 10000000000000000, which should now make some kind of sense. It is confirmed by Swift when separating the two sides of the calculus, too:

2> let a = 9999999999999999.0
a: Double = 10000000000000000

Our big brain so good at maths knows that there is a difference between these two values, and so does the computer. It's just that using doubles, it can't store it. Using floats, a will be rounded to 10000000272564224 which isn't exactly better. Quads aren't used regularly yet, so no luck there.

It's funny because this is an operation that we puny humans can do very easily, even those humans who say they suck at maths, and yet those touted computers with their billions of math operations per second can't work it out. Fair enough.

The kicker is, there is a litteral infinity of examples such as this one, because trying to represent infinity in a finite number of digits is impossible.