Some More Adventures in Swiftland

Last summer, I tried to learn new tricks and had some small success. Now that the pace at which Swift 3 was changing has finally settled a bit, I decided to finish what I set out to do and make SwiftyDB both up to date with the latest version and working on Linux.

The beginning

In August, SwiftyDB was working fine on macOS, but while it compiled fine on Linux, it didn't actually work there, for a variety of reasons.

Swift was flimsy on that platform. The thing “worked”, but it caused weird errors, had strange dependencies, and was severely lacking pieces of the Foundation library. The version I had back then crashed all the time, for random and different reasons each time, so I decided to wait until it stabilized.

With Swift 3.0.2 out and the announcement that Kitura was to become one of the pillars of the official built-in server APIs (called it, by the way), I figured it was time to finish the migration.

The problem

The main issue is that Swift for Linux lacks basic things that Foundation on the Mac has. I mean, it doesn’t even have NSDate’s timeIntervalSinceReferenceDate… But beyond that, the port lacks something that is truly important for the kind of framework SwiftyDB is : introspection.

The typechecker is the absolute core of Swift. Don’t get me wrong, it’s great. It forces people to mind the type of the data they are manipulating, and throws errors early rather than late. But it comes at a cost : the compiler does all kinds of crazy operations to try to guess the type and, too often for my taste, fails miserably. If you’ve ever seen “IDE internal error” dialogs in Xcode, that’s probably the reason.

But even if it worked perfectly, the data manipulation needed to get rows in and out of the db requires either working with formless data (JSON) or having a way to coerce and map types at run time. And boy, Swift doesn’t like that at all.

So, SwiftyDB handles it in a hybrid way, passing along dictionaries of type [String : Any] and the like. It’s kind of gross, to be honest, but that’s the only way this is going to work.
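For illustration only, here’s a minimal sketch of what that reflection-to-dictionary step can look like, using the standard library’s Mirror type (which is available on Linux too); the names and shape are mine, not SwiftyDB’s actual implementation :

import Foundation

// A minimal sketch, not SwiftyDB's actual code: turn an object's stored
// properties into a [String : Any] dictionary using the standard library's
// Mirror type, which works on Linux as well.
func dictionary(from object: Any) -> [String: Any] {
    var result: [String: Any] = [:]
    for child in Mirror(reflecting: object).children {
        if let label = child.label {
            result[label] = child.value
        }
    }
    return result
}

struct Dog {
    let name: String
    let age: Int
}

let row = dictionary(from: Dog(name: "Rex", age: 3))
// ["name": "Rex", "age": 3] — formless enough to hand over to the SQLite layer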

But that was kind of solved in August, albeit in a crash-prone way. What was the issue this time?

The Swift team made huge strides to unify macOS and Linux from an API point of view. If you read the docs, it “just works”, more or less. And that’s true, except for one tiny little thing : toll-free bridging.

Data type conversion

Swift, like Objective-C before it, deals with legacy stuff through a toll-free bridging mechanism. Basically, to the compiler, NSString and String are interchangeable, and it will use whichever definition (and whichever methods) it needs based on the line it’s looking at, rather than treating the value as a single static type.

As you surely know if you’ve done any kind of object-oriented programming, typecasting is hard. If String inherits from NSString, I can use an object of the former type anywhere I would have to use the latter. Think of the relationship between a square and a rectangle. The square is a rectangle, but a rectangle isn’t necessarily a square. It’s an asymmetrical relationship. And you can’t make it work both ways by also having NSString inherit from String, because that’s not allowed, for a lot of complicated reasons, with effects you can probably guess.

So, how does this toll-free bridging work? By cheating. But that’s neither here nor there. The fact is that it works just fine on MacOS, and not on Linux.

A solution

The easiest way to solve that is to have constructors in both classes that take the other as a parameter. And that’s the way it’s solved on Linux. True, it’s a bit inelegant, and it negates most of the “pure sexiness” that Swift is supposed to have, but what are you gonna do? This, after all, is still a science.
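Roughly, the shims look something like this (a minimal sketch assuming a Swift 3-era toolchain; the bridging: helper names are made up for illustration, not what ships in SwiftyDB) :

import Foundation

// A minimal sketch of the kind of conversion helpers needed on Linux, where
// "string as NSString" style bridging casts were not available at the time.
extension String {
    init(bridging source: NSString) {
        self.init(describing: source)   // gives back the string contents as a native String
    }
}

extension NSString {
    convenience init(bridging source: String) {
        self.init(string: source)       // NSString(string:) exists on both platforms
    }
}

let native = "hello"
let legacy = NSString(bridging: native)
let roundTrip = String(bridging: legacy)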

Once those extensions are in place, as well as a few replacement additions to the stock Foundation (such as the infamous timeIntervalSinceReferenceDate), and a few tweaks to the way the system finds the SQLite headers, everything finally works.

Use it

As I said before, it’s mostly an intellectual exercise, and a way to see if I could someday write some server-side stuff, but in the meantime it works just fine and you can find it here. Feel free to submit pull requests and stuff, but as it is, it works as intended : Swift objects to SQLite storage and vice versa.

As usual, if you want to use it as a Swift package, just use:

.Package(url: "https://github.com/krugazor/swiftydb", majorVersion: 1, minor: 2)
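For context, that line goes into the dependencies array of a Swift 3-style Package.swift, something like the following (MyProject being whatever your own package is called) :

import PackageDescription

// Minimal Swift 3-style manifest; only the dependency line above is specific to SwiftyDB.
let package = Package(
    name: "MyProject",
    dependencies: [
        .Package(url: "https://github.com/krugazor/swiftydb", majorVersion: 1, minor: 2)
    ]
)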

  

Web Services And Data Exchange

I may not write web code for a living (not much backend certainly, and definitely no front-end stuff, as you can see around here), but do I interact with webservices, to the point of sometimes having to “fix” or “enhance” them? Often enough to have an Opinion.

There is a very strong divide between web development and more “traditional” heavy client/app development : most of the time, I tell the people I write code for that these are two very distinct ways of looking at code in general, and user interaction in particular. I have strong reservations about the current way webapps are rendered and interacted with on my screen, but I cannot deny the visual and overall usage quality of some of them. When I look at what is involved in displaying this blog in my browser window, from the server resources it takes to the hundreds of megabytes of RAM needed to run a couple of paltry JS scripts in the window, the dinosaur that I am reels back in disgust, I won’t deny it.

But I also tell my students that you use the tool that’s right for the job, and I am not blind : it works, and it works well enough for the majority of the people out there.

I just happen to be performance-minded, and nothing about the standard MySQL-PHP-HTTP-HTML-CSS-JavaScript pipeline of delivering stuff to my eyeballs is exactly optimized for that. Sure, individually, these nodes have come a long, long way, but as soon as you start passing data along the chain, you stack up transformation and iteration penalties very quickly.

The point

It so happens that I wanted to build a prototype displaying isothermic-like areas on a map, completely dynamic, based on roughly 10k points recomputed whenever you move the camera a bit relative to the map you’re looking at.

Basically, 10 000 × 3 numbers (latitude, longitude, and temperature) would transit from a DB to a map on a cell phone every time you moved the map by a significant amount. Web rendering on the phone was quickly abandoned, as you can imagine. So web service it is.

Because I’m not a web developer, and fairly lazy to boot, I went with something even I could manage writing in : Silex (I was briefly tempted by Kitura, but it wasn’t yet ready for production use against huge databases).

Everyone has told me since forever that SOAP (and XML) is too verbose and resource-intensive to use. It’s true. I kinda like the built-in capability for data verification, though. But never you mind, I went with JSON like everyone else.

JSON is kind of anathema to me. It represents everything someone who’s not a developer thinks data should look like :

  • there are 4 types that cover everything (dictionary, array, float, and string)
  • it’s human readable
  • it’s compact enough
  • it’s text

The 4 types thing, combined with the lack of metadata, means that it’s no big deal for any of the pieces in the chain to swap between 1, 1.000, “1”, and “1.000”, which, to a computer, are three very distinct types with hugely different behaviors.
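A quick way to see it from Swift : feed those four spellings to Foundation’s JSONSerialization and look at what comes back (the concrete number types you get differ between Darwin and Linux, which rather proves the point) :

import Foundation

// The same "one" written four ways; what comes back depends on the decoder and
// the platform (NSNumber subclasses on Darwin, Int/Double/String on Linux).
let payload = "[1, 1.000, \"1\", \"1.000\"]".data(using: .utf8)!
do {
    if let values = try JSONSerialization.jsonObject(with: payload) as? [Any] {
        for value in values {
            print(type(of: value), "->", value)
        }
    }
} catch {
    print("decoding failed:", error)
}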

But in practice, for my needs, it meant that a decimal number like, say, a latitude of 48.8616138, gobbles up a magnificent 10 bytes of data instead of 4 (8 if you’re using doubles). And that’s only the start. Because of the structure of the JSON, you must have colons and commas and quotes and keys. So for 3 floats (12 bytes, or 24 for doubles), I must use :

{lat:48.8616138,lng:2.4442788,w:0.7653901}

That’s the shortest possible form (and not really human-readable anymore once you have 10k of those), and it takes 41 bytes. That’s almost four times as much.

Furthermore

Well, for starters, the (mostly correct) assumption that if you have a web browser currently capable of loading a URL, you probably have the necessary bandwidth to load the thing – or at least that the user will understand page load times – fails miserably on a mobile app, where you have battery and cell coverage issues to deal with.

But even putting that aside, the JSON decoding of such a large dataset was using 35% of my CPU cycles. Four times the size, plus a 35% performance penalty?

Most people who write webservices don’t have datasets large enough to really understand the cost of transcoding data. The server has a 4 × 2.8 GHz CPU with gazillions of bytes of RAM, and it doesn’t really impact them unless they run specific tests.

At this point, I was looking longingly at my old way of running CGI stuff in C when I discovered the pack() function in PHP. Yep. Binary packing. “Normal”-sized data.
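On the receiving end, unpacking that stream is short enough. Here’s a sketch of the client side in Swift, assuming the server sends back-to-back little-endian 32-bit floats in (lat, lng, weight) order, which is one plausible way to lay it out with pack() :

import Foundation

// A sketch of the client side: each sample is three little-endian 32-bit
// floats (lat, lng, weight), 12 bytes total, with no delimiters or keys.
struct Sample {
    let lat: Float
    let lng: Float
    let weight: Float
}

func readFloat(_ data: Data, at offset: Int) -> Float {
    var bits: UInt32 = 0
    _ = withUnsafeMutableBytes(of: &bits) { destination in
        data.copyBytes(to: destination, from: offset..<(offset + 4))
    }
    return Float(bitPattern: UInt32(littleEndian: bits))
}

func decode(_ data: Data) -> [Sample] {
    var samples: [Sample] = []
    var offset = 0
    while offset + 12 <= data.count {
        samples.append(Sample(lat: readFloat(data, at: offset),
                              lng: readFloat(data, at: offset + 4),
                              weight: readFloat(data, at: offset + 8)))
        offset += 12
    }
    return samples
}

Twelve bytes per point instead of forty-odd, and not a single string-to-number conversion in the loop.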

Conclusion

Because of the large datasets and somewhat complex queries I run, I use PostgreSQL rather than MySQL or its infinite variants. It’s not as hip, but it’s rock-solid and predictable. I get my data in binary form. And my app now runs at twice the speed, without any optimization on the mobile app side (yet).

It’s not that JSON and other text formats are bad in themselves, just that they use a ton more resources than I’m ready to spend on “just” getting 30k numbers. For the other endpoints (like authentication and submitting new data), where performance isn’t really critical, they make sense. As much sense as possible given their natural handicap anyways.

But using the right tool for the right job goes both ways. I am totally willing to simplify backend development and make it more easily maintainable. But computers work the same way they always have. Having 8 layers of interpretation between your code and the CPU may be acceptable sometimes, but remember that the olden ways of doing computer stuff, in binary, hex, etc., also provide a way to fairly easily improve performance : fewer layers, less transcoding, more CPU cycles for things that actually matter to your users.

  

You Are Not Alone

Employers need their star programmers to be leaders – to help junior developers, review code, perform interviews, attend more meetings, and in many cases to help maintain the complex legacy software we helped build. All of this is eminently reasonable, but it comes, subtly, at the expense of our knowledge accumulation rate.

From Ben Northrop’s blog

It should come as no surprise that I am almost in complete agreement with Ben in his piece. If I may nuance it a bit, though, it is on two separate points : the “career” and some of the knowledge that is decaying.

On the topic of career, my take as a 100% self-employed developer is of course different from Ben’s. The hiring process is subtly different, and the payment model highly divergent. I don’t stay relevant through certifications, but through “success rate”, and clients may like buzzwords, but ultimately they only care about their idea becoming real with the least amount of time and effort poured in (for v1 at least). While reading up and staying current on the latest flash-in-the-pan, as Ben puts it, allows someone like me to understand what it is the customer wants, it is by no means a requirement for a successful mission. In that, I must say I sympathize, but I look at it the same way I sometimes look at my students who get excited about an arcane new language : it’s a mixture of “this too shall pass” and “what makes it tick?”.

Ultimately, and that’s my second nuance to Ben’s essay, I think there aren’t many fundamentally different paradigms in programming either. You look at a new language and put it in the procedural, object-oriented, or functional category (or a mixture of those). You note its resemblances to and differences from things you know, and you kind of assume from the get-go that 90% of the time, should you need to, you could learn it very quickly. That’s the kind of knowledge you will probably never lose : the basic bricks of writing a program are fairly static, at least until we switch to quantum computing (i.e. not anytime soon). However fancy a language or a framework looks, the program you write ends up running roughly the same way your old programs used to. They add, shift, and xor bytes; they branch and jump; they read and store data.

To me, that’s what makes an old programmer worth it, not that the general population will notice, mind you : experience, and sometimes a math or theoretical CS education, brought us the same kind of arcane knowledge that exists in every profession. You can kind of “feel” that an algorithm is wrong, or how to orient your development before you start writing code. You have a good guesstimate, a good instinct, that keeps getting honed through the evolution of our field. And we stagnate only if we let it happen. Have I forgotten most of my ASM programming skills? Sure. Can I still read it and understand what it does and how, with much less of a sweat than people who’ve never even looked at low-level debugging? Mhm.

So, sure, it’ll take me a while to get the syntax of the new fad, as opposed to some fresh, unencumbered mind. I am willing to bet, though, that in the end, what I write will be more efficient. How much value does that have? It depends on the customer / manager. But with Moore’s law coming to a grinding halt, mobile development with its set of old (very old) constraints on the rise, and quantum computing pretty far on the horizon, I will keep on saying that what makes a good programmer good isn’t really how current they are, but what lessons they learn.

  

Blind Faith

“the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%”

(from PNAS)

“The death in May of Joshua Brown, 40, of Canton, Ohio, was the first known fatality in a vehicle being operated by computer systems.”

( from NYT)

It’s not the fact that buggy software is a reality that strikes me; it’s that people think either that software is magic (for users) and therefore requires no attention, or that it will get patched soon™ enough (for “powerusers” and devs). The problem is that beta-testing, which is the new 1.0, shouldn’t be optional or amateurish.

  