Web Services And Data Exchange

I may not write web code for a living (not much backend, certainly, and definitely no front-end stuff, as you can see around here), but do I interact with web services to the point of sometimes having to “fix” or “enhance” them? Often enough to have an Opinion.

There is a very strong divide between web development and more “traditional” heavy client/app development: most of the time, I tell the people I write code for that these are two very distinct ways of looking at code in general, and at user interaction in particular. I have strong reservations about the current way webapps are rendered and interacted with on my screen, but I cannot deny the visual and overall usage quality of some of them. When I look at what is involved in displaying this blog in my browser window, from the server resources it takes to the hundreds of megabytes of RAM needed to run a couple of paltry JS scripts in the window, the dinosaur that I am reels back in disgust, I won’t deny it.

But I also tell my students that you use the tool that’s right for the job, and I am not blind: it works, and it works well enough for the majority of the people out there.

I just happen to be performance-minded, and nothing about the standard MySQL-PHP-HTTP-HTML-CSS-JavaScript pipeline for delivering stuff to my eyeballs is exactly geared toward performance. Sure, individually, these nodes have come a long, long way, but as soon as you start passing data along the chain, you stack up transformation and iteration penalties very quickly.

The Point

It so happens that I wanted to build a prototype displaying isotherm-like areas on a map, completely dynamic, based on roughly 10k points, recomputed whenever you move the camera a bit relative to the map you’re looking at.

Basically, 10,000 × 3 numbers (latitude, longitude, and temperature) would transit from a DB to a map on a cell phone every time you moved the map by a significant amount. Web rendering on the phone was quickly abandoned, as you can imagine. So web service it is.

Because I’m not a web developer, and fairly lazy to boot, I went with something that even I could manage writing in: Silex (I was briefly tempted by Kitura, but it’s not yet ready for production use when huge databases are involved).

Everyone has told me since forever that SOAP (and XML) is too verbose and resource-intensive to use. It’s true. I kind of like the built-in capability for data verification, though. But never you mind, I went with JSON like everyone else.

JSON is kind of anathema to me. It represents everything someone who’s not a developer thinks data should look like:

  • there are 4 types that cover everything (dictionary, array, number, and string, if you don’t count booleans and null)
  • it’s human readable
  • it’s compact enough
  • it’s text

The 4-types thing, combined with the lack of metadata, means that it’s not a big deal for any of the pieces in the chain to swap between 1, 1.000, “1”, and “1.000”, which, to a computer, are three very distinct types with hugely different behaviors.
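To make that concrete, here is a quick illustration (a hand-written payload, nothing project-specific) of what PHP’s json_decode() makes of those four spellings of “one”:

<?php
// The same conceptual value comes back as three different PHP types,
// depending on how the producer happened to serialize it.
$decoded = json_decode('{"a": 1, "b": 1.000, "c": "1", "d": "1.000"}', true);

var_dump($decoded);
// array(4) {
//   ["a"]=> int(1)
//   ["b"]=> float(1)
//   ["c"]=> string(1) "1"
//   ["d"]=> string(5) "1.000"
// }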

But in practice, for my needs, it meant that my decimal numbers, like, say, a latitude of 48.8616138, gobble up a magnificent 10 bytes of data each, instead of 4 (8 if you’re using doubles). That’s only the start. Because of the structure of the JSON, you must have colons and commas and quotes and keys. So for 3 floats (12 bytes, or 24 bytes for doubles), I must use:

{lat:48.8616138,lng:2.4442788,w:0.7653901}

That’s the shortest possible form (and not really human-readable anymore when you have 10k of those), it isn’t even valid JSON since keys are supposed to be quoted, and it already takes 42 bytes. Add the mandatory quotes and you’re at 48 bytes: four times as much as the raw floats.

Furthermore

Well, for starters, the (mostly correct) assumption that if you have a web browser currently capable of loading a URL, you probably have the necessary bandwidth to load the thing – or at least that the user will understand page load times – fails miserably on a mobile app, where you have battery and cell coverage issues to deal with.

But even putting that aside, the JSON decoding of such a large dataset was using 35% of my CPU cycles. Four times the size, plus a 35% performance penalty?
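(If you want to get a feel for it yourself, here is a throwaway micro-benchmark. It is server-side PHP rather than the phone where I measured that 35%, so take the numbers as an order of magnitude, nothing more.)

<?php
// Build a 10k-point payload and time json_decode() on it.
$points = array_fill(0, 10000, ['lat' => 48.8616138, 'lng' => 2.4442788, 'w' => 0.7653901]);
$json   = json_encode($points);

$t0      = microtime(true);
$decoded = json_decode($json, true);
$elapsed = (microtime(true) - $t0) * 1000;

printf("payload: %d bytes, decoded %d points in %.1f ms\n",
       strlen($json), count($decoded), $elapsed);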

Most people who write web services don’t have datasets large enough to really understand the cost of transcoding data. The server has a 4×2.8GHz CPU with gazillions of bytes of RAM, and it doesn’t really impact them, unless they run specific tests.

At this point, I was longingly looking at my old way of running CGI stuff in C when I discovered the pack() function in PHP. Yep. Binary packing. “Normal” sized data.
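A minimal sketch of the idea (the field order and the choice of 32-bit floats are just my own convention here; the client simply has to agree on them):

<?php
// Three 32-bit floats per point: 12 bytes flat, no keys, no quotes, no commas.
$lat = 48.8616138;
$lng = 2.4442788;
$w   = 0.7653901;

// 'g' = 32-bit float, little-endian (PHP 7.0.15+); 'f' uses machine byte order,
// 'e' gives you little-endian 64-bit doubles instead.
$bin = pack('g3', $lat, $lng, $w);

echo strlen($bin), "\n";          // 12
print_r(unpack('g3point', $bin)); // the same three floats, back out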

Conclusion

Because of the large datasets and somewhat complex queries I run, I use PostgreSQL rather than MySQL or its infinite variants. It’s not as hip, but it’s rock-solid and predictable. I get my data in binary form. And my app now runs at twice the speed, without any optimization on the mobile app side (yet).
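For what it’s worth, the whole chain fits in a short Silex route. This is a hypothetical sketch, not my actual endpoint: the table and column names (measures, lat/lng/w), the bounding-box parameters, and the route path are made up for the example, and it assumes the DoctrineServiceProvider is registered so the connection lives in $app['db'].

<?php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

// $app is the Silex\Application instance.
$app->get('/points', function (Request $request) use ($app) {
    // Fetch only the points inside the requested bounding box.
    $rows = $app['db']->fetchAll(
        'SELECT lat, lng, w FROM measures
          WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?',
        [
            (float) $request->get('latMin'), (float) $request->get('latMax'),
            (float) $request->get('lngMin'), (float) $request->get('lngMax'),
        ]
    );

    // 12 bytes per point instead of ~48 bytes of JSON.
    $body = '';
    foreach ($rows as $row) {
        $body .= pack('g3', (float) $row['lat'], (float) $row['lng'], (float) $row['w']);
    }

    return new Response($body, 200, ['Content-Type' => 'application/octet-stream']);
});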

It’s not that JSON and other text formats are bad in themselves, just that they use a ton more resources than I’m ready to spend on “just” getting 30k numbers. For the other endpoints (like authentication and submitting new data), where performance isn’t really critical, they make sense. As much sense as possible given their natural handicap, anyway.

But using the right tool for the right job means it goes both ways. I am totally willing to simplify backend development and make it more easily maintainable. But computers work the same way they always have. Having 8 layers of interpretation between your code and the CPU may be acceptable sometimes, but remember that the olden ways of doing computer stuff, in binary, hex, etc., also provide a fairly easy way to improve performance: fewer layers, less transcoding, and more CPU cycles for the things that actually matter to your users.

  

Time? Time? Who’s Got The Time?

As I sat down at my desk with the clear objective of updating WordPress here, I had a few revelations:

1/ I can’t update WordPress. The PHP version on that server is too old. Half an hour wasted. And each time I try doing something with this corner of the World Wide Web, I realize it’s been almost ten years since I decided to do something worthwhile with it, and I never quite found the time to do so.

Of course, I want to project a decent image as a technophile (which I am), a cool guy (which I may be), and a somewhat successful business guy (results may vary). I also obviously want to be able to disclose information, in long form (as I’ve done here for a while, with sporadic energy), or in short form (mostly via Twitter). Combining that with information about what I could do for potential customers and what I have done for low- and high-profile customers is something you have to do in the current Web 2.0 bonanza, and something most companies, and even indies, have no trouble doing.

But I find myself facing a conundrum: I want complete control (so I need to be able to edit the pages, the CRM, and whatever else might be on the site), and I am utterly incompetent at deciding what’s best.

This is why, every time I decide to spend half a day looking around for technical solutions to my “needs”, I end up lost in unfamiliar territory. There are three dozen CRM systems, all requiring a boatload of dependencies. There is a plethora of CSS and whatnot design styles, none of which I could fine-tune if my life depended on it (and it doesn’t).

Sure, there are services that would do it all for me, but it would be worse, to my mind: I wouldn’t understand how it does what it does, which to any self-styled developer feels extremely weird.

At least I understand the underpinnings of WordPress pretty well. I can modify it to suit my needs (and have). But now that this PHP requirement prevents me from upgrading, I will have to do another weird “let me play!” session. And the result will probably be the same in the end, which is a little depressing.

Over the years, a bunch of people have asked me why I am so reluctant about web interfaces, and truth be told, I don’t know for sure. That might be the topic of another post sometime in the future.

Which brings us to…

2/ I have 5 blog drafts I have started and never finished. That’s right. 5 pieces of 500-1000 words I have written but was never satisfied enough with to hit the publish button.

Some geeky analysis of a book, a piece of reusable code I’m proud of, and a somewhat heartfelt eulogy that have been sitting in my (virtual) out-tray for 6 months.

I guess that speaks volumes about a definite flaw: when it concerns only me, I am not a closing kind of guy. Making prototypes and brainstorming commando-style? Pschhhhhh, easy. Spending 4h navigating ARM assembly to find a bug that prevents some obscure piece of software from working? A stroll in the park. Spending the necessary 4h for a personal project of mine to be finished and released? Apparently it can’t be done… If you guys have any motivational tips and tricks for that, I’m all ears.

3/ However…

I am currently working on 2 projects that I really, really want to see through. Unfortunately, they are time-consuming, at roughly the same level as my paid gigs. That means every hour I’m working on these pet projects is an hour I’m stealing from a customer. And given that I’m at ~120% capacity, due mostly to “not pressuring the customers enough to give me what I need to move forward”, these stolen hours are just really hard to pull off.

But I’m still announcing that these two projects (an iOS application that interfaces with RT for bug/incident tracking, and a podcast with Dam and Marc) will see the light of day before the end of Q1 2012. Because we all know the end of the world, at the hands of aliens who think we are pathetic, comes around the 2012 equinox or some such.

Happy holidays to everyone!