3D Ray-Tracing

I used to do 3D. A lot. I have a few leftovers from that era of my life, and I am still knowledgeable enough to follow along with the cool stuff that's coming out of the race between GPU manufacturers (when they aren't competing over who mines cryptocurrencies the best 🙄).

It's  always been super hard for me to explain how ray-tracing works, because  it's a very visual thing and it seemed like it required a good deal of  spatial awareness from the people I was trying to explain it to. But it  was probably because I suck at explaining how ray-tracing works without  showing it.

So, I was super happy to find a video that explains it all better than I ever have. Enjoy, then download Blender and have fun rendering stuff.


That Should Work

To get your degree in <insert commerce / political school name here>, there is a final exam in which you need to talk with a jury of teachers. The rule is simple: if the student is stumped or hesitates, the student has failed. If the student manages to last the whole time, or manages to stump the jury or make it hesitate, the student passes.
This particular student was having a conversation about geography, and a juror thought to stump the candidate by asking "what is the depth of <insert major river here>?", to which the student, not missing a beat, answered "under which bridge?", stumping the juror.

Old student joke/legend

Programming is part of the larger tree of knowledge we call computer science. Everything we do has its roots in maths and electronics. Can you get by with shoddy reasoning and approximate "that should work" logic? Sure. But only in the same way you can "get by" playing the piano using just your index fingers. Being able to play chopsticks makes you as much of a pianist as being able to copy/paste stackoverflow answers makes you a programmer/developer.

The problem is that in my field, the end-user (or client, or "juror", or "decision maker") is incapable of distinguishing between chopsticks and Brahms, not because of a lack of interest, but because we, as a field, have become experts at stumping them. As a result, we have various policies along the lines of "everyone should learn to code" being implemented worldwide, and I cynically think it's mostly because the goal is to stop getting milked by so-called experts who can charge you thousands of monies for the chopsticks equivalent of a website.

To me, the problem doesn't really lie with the coding part. Any science, any technical field, requires long training to become good at. Language proficiency, musical instruments, sports, dancing, driving, sailing, carpentry, mechanical engineering, etc. It's all rather well accepted that these fields require dedication and training. But somehow, programming should be "easy", or "intuitive".

That's not to say I think it should be reserved for an elite. These other fields aren't. I have friends who got extremely good at the guitar by themselves, and sports are a well-known way out of the social bog. But developers seem to be making something out of nothing. They "just" sit down and press keys on a board and presto, something appears and they get paid. It somehow seems unfair, right?

There are two aspects to this situation: the lack of nuanced understanding on the part of the person who buys the program, and the overly complicated/flaky way we programmers handle all this. I've already painted with a very broad brush what we developers feel about this whole "being an industry" thing.

So what's the issue on the other side? If you ask most customers (and students), they respond "obfuscation" or a variant of it. In short, we use jargon, technobabble, which they understand nothing of, and they feel taken advantage of when we ask for money. This covers the whole gamut from "oh cool, they seem to know what they are talking about, so I will give them all my money" to "I've been burned by smart-sounding people before, I don't trust them anymore", to "I bet I can do it myself in under two weeks", to "the niece of the mother of my friend is learning to code and she's like 12, so I'll ask her instead".

So, besides reading all of Plato's work on dialectic and how to get at the truth through questions, how does  one differentiate between a $500 website and a $20000 one? Especially if  they look the same?

Well, in my opinion as a teacher, someone who is paid to sprinkle knowledge about computer programming onto people, there are two important things to understand about making software when evaluating the quality of a product:

  • Programming is exclusively about logic. The difficulty (and the price) scales with the logic needed to solve whatever problem we are hired to solve
  • We very often reuse logic from other places and combine those lines of code with ours to refine the solution

Warning triggers that make me think the person is trying to sell me magic pixie dust include:

  • The usual bullshit-bingo: if they try to include as many buzzwords (AI, machine learning, cloud, big data, blockchain, ...) as possible in their presentation, you have to ask very pointed questions about your problem and how these things will help you solve it
  • If they tell you they have the perfect solution for you even though they asked no questions, they are probably trying to recycle something they already have, which may or may not work for your issues

A word of warning though: absolute prices aren't a factor at all. In the same way that you'd quite naturally pay a whole lot more money for a bespoke dinner table that is exactly what you envision in your dreams than for one you can get in any furniture store, your solution cannot be cheaper than off-the-shelf. Expertise and tailoring cannot be free. Balking at the price when you have someone who genuinely is an expert in front of you, after they have announced their price, is somewhat insulting. How often do you go to the bakery and ask "OK, so your cake is really good, and all my friends recommend it, and I know it's made with care, but, like, $30 is way too expensive... how about $15?"

I have also left aside the question of visual design. It's not my field, I suck at it, and I think that it is an expert field too, albeit more on the "do I like it?" side of the equation than the "does it work?" one, when it comes to estimating its value. It's like when you buy a house: there are the foundations, and the walls, and the roof, and their job is to answer the question "will I still be protected from the outside weather in 10 years?", whereas the layout, the colors of the walls, and the furniture are the answer to the question "will I still feel good in this place in 10 years?". Thing is, with software development as well, you can change the visuals to a certain extent (up to the point where you need to change the position of the walls, to continue with the metaphor), but it's hard to change the foundations.


DocumentDB vs MongoDB

From AWS gives open source the middle finger:

Bypassing MongoDB’s licensing by going for API compatibility, given that AWS knows exactly why MongoDB did that, was always going to be a controversial move and won’t endear the company to the open-source community.

MongoDB is hugely popular, although entirely  for the wrong reasons in my mind, and it's kind of hard to scale it up  without infrastructure expertise, which is why it makes sense for a  company to offer some kind of a turnkey solution. Going for  compatibility rather than using the original code also makes a lot of  sense when you're an infrastructure-oriented business, because your own  code tends to be more tailored to your specific resources.

But in  terms of how-it-looks, after having repeatedly been accused of leeching  off open-source, this isn't great. One of the richest services divisions  out there, offloading R&D to the OSS community, then, once the  concept proves to be a potential goldmine, undercutting the original?

The  global trend of big companies is to acknowledge the influence of  open-source in our field and give back. Some do it because they believe  in it, some because they benefit from fresh (or unpaid) eyes, some  because of "optics" (newest trendy term for "public relations"). I'm not  sure that being branded as the only OSS-hostile name in the biz' is a  wise move.


Double Precision (Not)

From this list, the gist is that most languages can't process 9999999999999999.0 - 9999999999999998.0

Why  do they output 2 when it should be 1? I bet most people who've never  done any formal CS (a.k.a maths and information theory) are super  surprised.

Before you read the rest, ask yourself this: if all you have are zeroes and ones, how do you handle infinity?

If  we fire up an interpreter that outputs the value when it's typed (like  the Swift REPL), we have the beginning of an explanation:

Welcome to Apple Swift version 4.2.1 (swiftlang-1000.11.42 clang-1000.11.45.1). Type :help for assistance.
  1> 9999999999999999.0 - 9999999999999998.0
$R0: Double = 2
  2> let a = 9999999999999999.0
a: Double = 10000000000000000
  3> let b = 9999999999999998.0
b: Double = 9999999999999998
  4> a-b
$R1: Double = 2

Whew, it's not that the languages can't handle a simple subtraction, it's just that a is typed in as 9999999999999999 but stored as 10000000000000000.

If we used integers, we'd have:

  5> 9999999999999999 - 9999999999999998
$R2: Int = 1

Are the decimal numbers broken? 😱

A detour through number representations

Let's look at a byte. This is the fundamental unit of data in a computer and is made of 8 bits, each of which can be 0 or 1. It ranges from 00000000 to 11111111 (0x00 to 0xff in hexadecimal, 0 to 255 in decimal, homework as to why and how it works like that due by Monday).

Put like that, I hope it's obvious that the question "yes, but how do I represent the integer 999 on a byte?" is meaningless. You can decide that 00000000 means 990 and count up from there, or you can associate arbitrary values to the 256 possible combinations and make 999 be one of them, but you can't have both the 0 - 255 range and 999. You have a finite number of possible values and that's it.

Of course, that's on 8 bits (hence the 256-color palette in old games). On 16, 32, 64 or wider memory blocks, you can store up to 2ⁿ different values, and that's it.
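
If you want to poke at that yourself, the Swift REPL will happily confirm that a byte (a UInt8 here) stops at 255:

let byte: UInt8 = 0b11111111     // all eight bits set
print(byte)                      // 255
print(String(byte, radix: 16))   // ff
// let nope: UInt8 = 999         // doesn't even compile: 999 overflows an 8-bit integer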

The problem with decimals

While it's relatively easy to grasp the concept of infinity by looking at "how high can I count?", it's less intuitive to realize that there are infinitely many numbers between 0 and 1 as well (even more of them, in fact, than there are integers).

So,  if we have a finite number of possible values, how do we decide which  ones make the cut when talking decimal parts? The smallest? The most  common? Again, as a stupid example, on 8 bits:

  • maybe we need 0.01 ... 0.99 because we're doing accounting stuff
  • maybe we need 0.015, 0.025,..., 0.995 for rounding reasons
  • We'll just encode the integer part on 8 bits (0 - 255), and the decimal part as above

But that's already 99+99 = 198 values taken up. That leaves us 58 possible values for the rest of infinity. And that's not even mentioning the totally arbitrary nature of the selection. This way of representing numbers is historically the first one and is called "fixed-point" representation. There are many ways of choosing how the decimal part behaves and a lot of headaches when coding how the simple operations work, not to mention complex ones like square roots and powers and logs.
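
To make the "fixed" idea a bit more concrete, here's a toy sketch (a made-up TinyFixed type, simpler than the scheme in the bullets above: the whole byte just counts hundredths):

// Toy fixed-point: the byte stores hundredths, so only 0.00 ... 2.55 exist.
struct TinyFixed {
    var raw: UInt8                                  // 0...255, counted in hundredths
    var value: Double { return Double(raw) / 100 }
}

let price = TinyFixed(raw: 199)                     // 1.99
let tax   = TinyFixed(raw: 40)                      // 0.40
let total = TinyFixed(raw: price.raw + tax.raw)
print(total.value)                                  // 2.39, and anything past 2.55 overflows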

Floats (IEEE 754)

To  make it simple for chips that perform the actual calculations, floating  point numbers (that's their name) have been defined using two  parameters:

  • an integer n
  • a power (of base b) p

Such that we can have n × bᵖ; for instance 15.3865 is 153865 × 10⁻⁴. The question is: how many bits can we use for the n and how many for the p?

The standard is to use 1 bit for the sign (+ or -), 23 bits for n and 8 for p, which makes 32 bits total (we like powers of two), using base 2, with n actually stored as 1.n. That gives us a range of ~8 million values for the significand, and powers of 2 from -126 to +127, the remaining exponents being reserved for special cases like infinity and NotANumber (NaN).

$$(-1~or~1)(2^{[-126...127]})(1.[one~of~the~8~million~values])$$

In theory, we have numbers from roughly 10⁻⁴⁵ to 10³⁸, but some numbers can't be represented in that form. For instance, if we look at the largest number smaller than 1, it's 0.9999999404. Anything between that and 1 has to be rounded. Again, infinity can't be represented by a finite number of bits.
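
Swift will show you that neighbour directly, through the standard nextDown property:

let belowOne = (1.0 as Float).nextDown   // largest Float strictly below 1.0
print(belowOne)                          // 0.99999994
print(Float(0.999999975))                // falls in the gap, gets rounded: prints 1.0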

Doubles

Floats allow for "easy" calculations (for the computer at least) and are "good enough", with a precision of about 7.2 decimal digits on average. So when we needed more precision, someone said "hey, let's use 64 bits instead of 32!". The only thing that changes is that n now uses 52 bits and p 11 bits.

Incidentally, double has more the meaning of double size than double precision, even though the number of decimal digits does jump to about 15.9 on average.
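
If you'd rather not take those bit widths on faith, the standard library exposes them:

print(Float.significandBitCount, Float.exponentBitCount)    // 23 8
print(Double.significandBitCount, Double.exponentBitCount)  // 52 11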

We get 2³² times as many values to play with, and that does fill some annoying gaps in the infinity, but not all. Famously (and annoyingly), 0.1 doesn't work in any precision size because of the base 2. As a 32-bit float, it's stored as 0.100000001490116119384765625, like this:

(1)(2⁻⁴)(1.600000023841858)
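
That decomposition isn't hand-waving either; the floating-point types will give you each piece:

let tenth: Float = 0.1
print(tenth.sign)                  // plus
print(tenth.exponent)              // -4
print(Double(tenth.significand))   // ≈ 1.600000023841858
print(tenth.significand / 16)      // 0.1, i.e. 1.6000000238… × 2⁻⁴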

Going further, after double size (aka doubles), we have quadruple size (aka quads), with 15 exponent bits and 112 significand bits, for a total of 128 bits.

Back to our problem

Our value is 9999999999999999.0. The closest possible value encodable in double-size floating point is actually 10000000000000000, which should now make some kind of sense. It is confirmed by Swift when separating the two sides of the calculation, too:

2> let a = 9999999999999999.0
a: Double = 10000000000000000

Our big brain, so good at maths, knows that there is a difference between these two values, and so does the computer. It's just that, using doubles, it can't store it. Using floats, a would be rounded to 10000000272564224, which isn't exactly better. Quads aren't in regular use yet, so no luck there.
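
If you want to see both roundings side by side, formatting through Foundation prints every digit instead of scientific notation:

import Foundation

let asFloat  = Float(9999999999999999.0)   // nearest 32-bit value
let asDouble = 9999999999999999.0          // nearest 64-bit value (inferred as Double)

print(String(format: "%.0f", asFloat))     // 10000000272564224
print(String(format: "%.0f", asDouble))    // 10000000000000000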

It's  funny because this is an operation that we puny humans can do very  easily, even those humans who say they suck at maths, and yet those  touted computers with their billions of math operations per second can't  work it out. Fair enough.

The kicker is, there is a literal infinity of examples such as this one, because trying to represent infinity in a finite number of digits is impossible.


A Practical Use For DictionaryCoding

Those of you who read the previous blog know that I'm a huge fan of OpenWhisk. It's one of those neat new-ish ways to do server side stuff without having to deal with the server too much.

Quick Primer on OpenWhisk

The idea behind serverless technologies is to encapsulate code logic in actions: each action can be written in any language, takes JSON as a parameter, and spits out JSON as a result. You can chain them, and you have mechanisms to trigger them via cron jobs, URLs, or state modifications.
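
For reference, the classic dictionary-in/dictionary-out Swift action is just a function shaped like this (a minimal sketch; the exact entry-point conventions depend on the runtime version you deploy on):

// main.swift — a minimal OpenWhisk-style action: JSON-ish dictionary in, dictionary out.
func main(args: [String: Any]) -> [String: Any] {
    let name = args["name"] as? String ?? "stranger"
    return ["greeting": "Hello, \(name)!"]
}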

The example I used back in the day was a scraper that sent a mail if and when a certain page had changed:

cron triggered a lookup; the lookup, written in Swift, checked the HTTP code and content and passed its findings along to an action written in Python that would compare the contents, if needed, to its cached data stored in an ElasticSearch stack (yes, I like to complicate things), before passing the data needed for a mail to a PHP action that would send it if needed.

It's obviously stupidly convoluted but it highlights the main advantage of using actions: you write them in the language and with the frameworks that suit your needs the best.

One other cool feature of OW is that when you don't use the actions for a while they are automatically spooled down, saving on resources.

OpenWhisk for iOS

There is a library that allows iOS applications to call the actions directly from your code, passing dictionaries and retrieving dictionaries. It's dead simple to use, and provides instant access to the serverside logic without having to go through the messy business of exposing routes, transcoding data in and out of JSON and managing weird edge cases.

For the purpose of a personal project, I am encapsulating access to  some Machine Learning stuff on a server through actions that will query databases and models to spit out useful data for an app to use.

I'll simplify for the purpose of this post, but I need to call an action that takes a score (or a level) and responds with a user that is "close enough" to be a good partner:

struct MatchData : Codable {
  let name : String
  let level : Int
  let bridgeID : UUID?
}

My action is called match/player, and my call in the code looks like this:

do {
  try WLAppDelegate.shared.whisk.invokeAction(
    qualifiedName: "match/player", 
    parameters: p as AnyObject,  // p is a [String: Any] payload with the score/level to match, built earlier
    hasResult: true, 
    callback: { (result, error) in
      if let error = error {
        // deal with it [1]
      } else if let r = result?["result"] as? [String:Any] {
        // deal with it [2]
      } else if let aid = result?["activationId"] as? String {
        // deal with it [3]
      }
    })
} catch {
  // deal with it [4]
}

A little bit of explanation might be required at this point:

  • [1] is where you would deal with an error that's either internal to OW or your action
  • [2] is the probably standard case. Your action has run, and the result is available
  • [3] is a mechanism that's specific to long-running actions: if it takes too long to spool your action up, run it, and get the result back, OW will not block you for too long, and will give you an ID to query later for the result when it is available. Timeouts, both for responding and for the maximum action running time, can be tweaked in OW's configuration. Your app logic should accommodate that, which you are doing already anyway because you don't always have connectivity, right?
  • [4] is about iOS side errors, like networking issues, certificate issues and the like

That Whole JSON/Dictionary/Codable Thing

So, OW deals with JSONs, which are fine-ish when you have a ton of processing power and don't mind wasting cycles looking up keys and values. The iOS client translates them to [String:Any] dictionaries which are a little bit better but not that much.

In my code I don't really want to deal with missing keys or type mismatches, so I map the result as fast as I can to the actual struct/class:

if let r = result?["result"] as? [String:Any], let d = try? DictionaryCoding().decode(MatchData.self, from: r) {
  // yay I have d.name, d.level and d.bridgeID 
  // instead of those pesky dictionaries
}

Of course, the downside is that decoders are called twice but:

  • DictionaryCoding is blisteringly fast
  • As long as I pass this object around, I will never have to check keys and values again, which is a huge boost in performance
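
For the outbound direction, by the way, you don't strictly need DictionaryCoding: one way to turn any Codable value into the [String: Any] parameters the Whisk client wants is a quick round-trip through Foundation (asDictionary and MatchQuery below are just illustrations, not part of any library):

import Foundation

// Illustrative helper: Codable value -> [String: Any] via a JSON round-trip.
func asDictionary<T: Encodable>(_ value: T) throws -> [String: Any] {
    let data = try JSONEncoder().encode(value)
    let object = try JSONSerialization.jsonObject(with: data)
    return object as? [String: Any] ?? [:]
}

// e.g., assuming a Codable request type of your own:
// struct MatchQuery: Codable { let level: Int }
// let p = try asDictionary(MatchQuery(level: 12))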

Why?

My front-facing actions are all written in Swift, using the same Codable structures as the ones I expect in the app. That way, I can focus on the logic rather than routing and encoding shenanigans. OpenWhisk scales with activity from the app, and the app doesn't need a complicated networking layer.

My backend actions responsible for all the heavy lifting are written in whatever I need for that task. Python/TensorFlow, for me, but your mileage may vary.