That's a Nice Range! Can I Poke Holes In It?

Because of various Machine Learning and brain teaser problems I focused on, I have been developing for my own use an "extension" to the standard swift type Range that allowed for a very particular behavior: having holes.

TL;DR: Grab the code/package on the github repo

Why?!? 😳

I need this class for two scenarii (that's the plural of scenario, deal with it):

  • Being able to sample continuous data sets (eg "I want a value between 0 and 1, but not between 0.45 and 0.55 because the middle sucks")
  • Being able to reconcile continuous data sets (eg "This tells me it can be between 0 and 1, and this tells me it can be between 10 and 11")

In both instances, I ended up writing a lot of uninteresting and similar code, so I decided to put it all under a single umbrella.

A little bit of personal history and maths 🤕

Practictionners of the dart art known as maths entertain a love/hate relationship with "domains". It's one of these caveats in otherwise universal rules and theorems that restrict the use of all these wonderful tools, and both helps with proofs and hinders usage.

Some functions have a domain definition (eg 1/x exists for every value but x=0. No, infinity isn't the answer), some have a resulting domain (eg is always positive), and some functions have a really weird set of both.

The main problem with maths is that, for a lot of people, it isn't very... permissive, shall we say. There seems to be more "don't"s than "do"s, and the practical application of all these wonderful theorems seems out of reach.

But it's not true that maths is counter-intuitive. I am a firm believer that the "intelligence" we humans exhibit, as opposed to our genetic relatives on the tree of life, is strongly related to our brain's ability to do maths. If I have 4 apples today and eat one per day, I'd better find a supply for apples very shortly, because I won't have any apple left soon. That ability to extrapolate numbers and trends is what differentiates us from, say, our forager ancestors who went looking for food when they were hungry, and didn't plan much ahead.

It is my humble opinion that language exists because of our ability to do maths, and not the other way around. Of course, you might argue that advanced maths is in itself a new language, and quite disconnected from my need to find some apples for next week, but to me, it looks like parallel evolution rather than anything else.

Our brains are wired to count, extrapolate, share, split, combine, and calculate. The rest is a strategy to ensure that I have an infinite supply of apples. Maybe my offspring too, if they behave nicely.

Because of my special condition, maths isn't any less intuitive than driving or drawing. It's just a matter of practice.

The more immediate problem 🙄

Let's say I have a data set that tells me that a variable can be either negative or positive, but that its absolute value can't be beyond 10, and it can't be 0.

[-10;0[ ∪ ]0;10]

That's easy right? Every time I update that variable, I can always use something like:

v = min(10,max(-10,v)) // for constraints
if v == 0 { // ?

What if it's something more complicated? Something that exists between 0 and 1 or between 10 and 11, or between 100 and 101?

[0;1] ∪ [10;11] ∪ [100;101]

My list of min and max calls and if/else or switch statements will become very complicated very soon...

What if it's the intersection of two sets of ranges like the one above? Think marketing segmentation, where you are interested in age groups intersected over two different statistical analyses...

What if it's a large range excluding certain values or ranges?

Of course, because we are dealing with computers, we have the other problem to contend with: infinity does not exist

Implementation choices 🤓

In maths, compound ranges are a collection of open and closed ranges linked together by unions. In swift, Range and ClosedRange are distinct unrelated types. Having a collection that contains both is painful for standard operations like unions and merging and symmetric difference, because it implies changing the type of object depending on the operations:

]0;1[ ∩ [-1;0.5] = ]0;0.5]

(please note the changes in inclusion and exclusion of the bounds)

I decided to go with ClosedRange and a touch of excluded values to deal with open ranges. The reasoning behind that is that it also allows me to exclude single values easily, which is one of the main uses of this library.

Most of the standard operations are fairly easy to implement (adding or unioning two ranges is just adding a new item to the collection of ranges, removing ranges means changing the bounds, etc), but they often require an optimizitation pass, which is algorithmically complex: I need to merge overlapping ranges to keep the list as small as possible.

Intersection is pretty straightforward as well: you "just" remove the ranges that aren't common. It is implemented in two steps:

  • find the ranges that share values (overlaps)
  • shrink the result to the common values with min and max

The most interesting one (for me) is the symmetric difference of two ranges: elements that are in either range, but not in both (good old XOR).
The idea here is pretty similar: we have to find the intersection (the common elements) and remove them from the union (all the elements of both ranges).

The problem lies, of course, in the details. Because my Bound type conforms to Comparable and my range needs to satisfy lowerBound <= upperBound, I need to check how the ranges overlap. Is one containing the other? Is one straddling the bound of the other one? Draw it on a piece of paper and you'll see we have a lot of different scenarii (AGAIN!)

What's next? 😱

I don't know. This library works for my current purposes. I will probably continue to add stuff to it as time passes. I have in the back of my mind an idea of how to use this for a constraint solver engine. Maybe.

If you find it useful or want to add stuff to it, feel free!  I'm sure there are some bugs and edge cases I havent' taken into account. I tried to document it fully and provide the unit tests that I thought were appropriate, so if you decide to contribute, please carry on with those two good habits.