[CoreData] Duplicating an object

As any of you know, duplicating an object in Core Data is just a nightmare: you basically have to start afresh each and every single time for each object, then iterate over attributes and relationships.

It so happens I have to do that often in one of my projects: I have to duplicate objects except for a couple of attributes and relationships, and there are 20 of each on average (I didn’t come up with the model, OK?).

So, I came up with this code. Feel free to use it, just say hi in the comments, via mail, or any other way if you do!

@implementation NSManagedObject (Duplication)
+ (BOOL) duplicateAttributeValuesFrom:(NSManagedObject*)source To:(NSManagedObject*)dest ignoringKeys:(NSArray*)ignore {
    if(source == nil || dest == nil) return NO;
    if(![[source entity] isEqual:[dest entity]]) return NO;
 
    for(NSString *attribKey in [[[source entity] attributesByName] allKeys]) {
        if([ignore containsObject:attribKey]) continue;
 
        [dest setValue:[source valueForKey:attribKey] forKey:attribKey];
    }
 
    return YES;
}
 
+ (BOOL) duplicateRelationshipsFrom:(NSManagedObject*)source To:(NSManagedObject*)dest ignoringKeys:(NSArray*)ignore {
    if(source == nil || dest == nil) return NO;
    if(![[source entity] isEqual:[dest entity]]) return NO;
 
    NSDictionary *relationships = [[source entity] relationshipsByName];
    for(NSString *attribKey in [relationships allKeys]) {
        if([ignore containsObject:attribKey]) continue;
 
        if([((NSRelationshipDescription*)[relationships objectForKey:attribKey]) isToMany]) {
            [dest setValue:[NSSet setWithSet:[source valueForKey:attribKey]] forKey:attribKey];
 
        } else {
            [dest setValue:[source valueForKey:attribKey] forKey:attribKey];
        }
 
    }
 
    return YES;
}
 
@end
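
For the record, here’s how I typically call it. A minimal usage sketch, assuming a hypothetical “Book” entity with a “creationDate” attribute and an “owner” relationship I don’t want carried over (replace those names with your own):

// somewhere with a valid context and an existing "source" managed object
NSManagedObjectContext *moc = [source managedObjectContext];
NSManagedObject *copy = [NSEntityDescription insertNewObjectForEntityForName:[[source entity] name]
                                                      inManagedObjectContext:moc];

[NSManagedObject duplicateAttributeValuesFrom:source To:copy
                                 ignoringKeys:[NSArray arrayWithObject:@"creationDate"]];
[NSManagedObject duplicateRelationshipsFrom:source To:copy
                               ignoringKeys:[NSArray arrayWithObject:@"owner"]];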
  

Who-What-When-Why

Now that I’m almost done with my despaghettification, and for reasons that will become obvious as you read on, I’m kind of wondering about teamwork.

I’ve had good and bad relationships, projects that made it through despite the huge odds (think “organizing events”) or projects that went down the drain pretty fast for unforeseen reasons. I think that despite the common asocial image associated with computer geeks, working with unforgiving machines makes us highly sensitive in some areas of human relationships. To wit, we have to follow a clear set of rules all day long, and it can be really hard to have an interaction in which the rules change constantly, or aren’t completely explainable. I think that deeply ingrained in our daily life is the assumption that given enough time and “debugging”, everything’s 100% understandable.

Most “normal” people, as some of my friends will call them, don’t think like that. They keep adding stuff to the rulebook, such as “when someone higher up in the hierarchy calls, I pick up the phone, no matter what”, or “you don’t say to your spouse that you’d like to go out with your buddies because you want a taste of normalcy once in a while”. True, it’s a primitive form of trial and error-by-being-punished, but they are stupid rules that should be replaced by more generic and sensible ones. You pick up the phone because they have some business with you (but should still be ready to say “Look I have work to do, can I call you back?”), or you behave in such a way that it is normal to spend some time apart from your spouse, and just as normal as it is to spend some time with him/her. Basically, you don’t spring up a new rule on somebody else because you don’t like them sprung on you.

Back to teamwork, I find it hard to work with people who have a different set of rules than mine, especially when we are trying to build something together. I’ve been kind of documenting this for a long while and am curious as to what you guys think about it.

To me, doing a task can always be summed up in a sentence such as this:

This person wants me to do this Nth step like this because of that

And in my decade or so of working, I’ve seen various people focus primarily on this or that part of the sentence. I find I have a better personal relationship with people who share my focus, but can work with them more easily if I “get” what their focus is. If I don’t understand what they are leaning towards, it’s a one-way ticket to hell.

Who

For obvious reasons, who’s doing the asking matters a lot. As a somewhat social species, we don’t feel like putting the same effort into a task for a personal friend as for a complete stranger. It is unequivocally true in personal relationships (hence the word personal), but in professional tasks, the who should be the gatekeeper, not some kind of thermostat that continuously adjusts how much effort goes in.

If your boss asks you to do something, and you agreed to do it, then it has no more value than if it’s a friend who does the asking, in my mind. The who deals with accepting the task, not the realization of it.

When I have to deal with co-workers for whom the person doing the asking defines the whole of the effort put into the task, I shiver. Yes, this person is your biggest customer. Yes, this person is your boss and could reassign you to toilet duty. Yes, this client is the cutest person on Earth. But you have agreed to other tasks as well. It actually serves the relationship to state that you’ll do your task the way you think is best, rather than the way this person thinks it should be done.

What

Each and every one of us has things they like doing, and things they hate doing. We tend to procrastinate on things we don’t like to do and perform admirably on things we love. And for everything in between, mileage may vary. Same thing as the who: it’s natural and literally built into our genes. We learn through painful mistakes not to put our fingers in the flames. It gives us most of the “one-shot rules” I was talking about earlier.

But the thing is, it’s like the who of the previous section. It acts as a filter for accepting the task or not. Once you’ve agreed to perform the task… well, it has to be done.

One of the people who was supposed to “help me” (aka “do it for me”) with my accounting and stuff was exactly like that. Give him a book on his favorite topic or strike up a conversation about it, and absolutely no other work would get done. And I know I have that tendency myself, one I try to overrule all the time: I like debugging and optimizing and looking at the teeny tiny details, rather than cutting up the PSDs to size. But hey, I accepted the task.

When

Each task has a history, and has a place in a process. Being a developer, most of the tasks I get assigned are the end-of-chain kind. There’s a reason why this task occurs now, has to be done by that time, and is worded or constrained the way it is.

But rehashing the timeline over and over and over again bores me. If I have to plug this into that, I don’t really care about all the steps that led to this particular form factor. I mean, I do care about it, but it’s a matter of perspective. I care about it to understand how to perform better, but if it’s not relevant, or I don’t have much latitude in my task, then… why bother.

Why

As you probably guessed by now, that’s what makes me tick, and it does make me tick a little bit too much sometimes. I want to understand why it works or should work in the way it does, because it makes me feel like I’ll write less code, or write code that will integrate better in the grand scheme of things.

The main trouble with focusing on that part of the task is that progress comes in discrete steps. There’s an inordinate amount of analysis and thought that goes into it before a step is taken, then another kind of long pause, etc… It can be hugely frustrating for people who work with me because they’d see nothing moving till all the pieces fit together, and then it all works, as if by magic. And even though I’d be able to explain in great detail why I did things that way, we rarely have time for it. So it stays “a kind of magic” that was long-ish to come, and works in mysterious ways.

Kind of conclusion

Being a developer means interacting all day long with an alien way of doing things, and we’re kind of stuck between “regular” humans and unforgiving computers. That means that the human interaction part of work will mostly be evaluated in a similar way: how should I say/write this (syntax), so that the other person (compiler) does what I’d like them to (algorithm). Man, that sounds so nerdy, but I genuinely think that it’s somewhere deep in most work-related conversations I’ve had over the years.

And so, based on that experience and these thoughts, I realized that it takes a huge effort to find out what makes other people tick, but once it’s understood, it makes every interaction a whole lot easier. But if the focus keeps changing, the amount of time spent finding out the right “syntax” to talk with somebody about what has to be done becomes too long to be a good use of time. That’s why some of my co-workers feel they can’t delegate and can’t rely on anyone to help them.

So my advice to anyone dealing with what seems to be an introverted geek, is to find out which part they are more used to dealing with (because they were educated that way or just like it) and make sure your translator is on!

  

[CoreData] Subtleties And Performance

It so happens I got a project to “polish up” that relies heavily on Core Data, and has some huge performance issues. I can’t say much about the project, but suffice it to say that a “normal” account on the app has 130+ entities, and 250,000 records in the SQLite database, for a grand total of roughly 150MB in size.

Funnily enough, during the development phase, one of the developers directly asked some people at Apple if it would turn out to be a problem, and obviously they said no, not at all. It made most of the more seasoned developers I asked slap their thighs and laugh.

The problem is basically twofold: on the one hand, the huge number of entities (and their relationships) makes any query nonatomic – it requires a lot of back-and-forth between the storage and the memory; on the other hand, the huge number of records makes most results huge.

So let’s take a few examples of things that should be anticipated.

Lots of individual requests with stuff like an ID

Not happening. Ever. You don’t do something like this:

NSMutableArray *results = [NSMutableArray arrayWithCapacity:[fetchedIDs count]];
for(NSNumber *interestingID in fetchedIDs) {
  NSFetchRequest *fr = [[NSFetchRequest alloc] init];
  [fr setEntity:[NSEntityDescription entityForName:@"Whatever" inManagedObjectContext:AppDelegate.managedObjectContext]];
  [fr setPredicate:[NSPredicate predicateWithFormat:@"uniqueID == %@", interestingID]]; // uniqueID being whatever logical ID your model has
  NSArray *localResults = [AppDelegate.managedObjectContext executeFetchRequest:[fr autorelease] error:nil];
  if(localResults.count > 0)
    [results addObjectsFromArray:localResults];
}

Why? Because in the worst-case scenario there are 2 on-disk search accesses for every object you get: one to find the correct row and then one (or a bunch, depending on Apple’s implementation) to de-fault (load most values into memory) the object. Besides, if you do that pretty much everywhere in your code, you end up actually bypassing any kind of cache Apple could have set up.

Either implement your own cache (“logical ID” ↔ NSManagedObjectID, for instance), or batch fetch stuff.
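
For instance, the loop above collapses into a single fetch. A sketch only, with uniqueID standing in for whatever logical ID your model actually has:

NSFetchRequest *fr = [[NSFetchRequest alloc] init];
[fr setEntity:[NSEntityDescription entityForName:@"Whatever" inManagedObjectContext:AppDelegate.managedObjectContext]];
// one round-trip for the whole batch instead of one per ID
[fr setPredicate:[NSPredicate predicateWithFormat:@"uniqueID IN %@", fetchedIDs]];
// fire the faults now, while we are already hitting the store
[fr setReturnsObjectsAsFaults:NO];
NSError *error = nil;
NSArray *results = [AppDelegate.managedObjectContext executeFetchRequest:[fr autorelease] error:&error];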

Lots of indirections

Looking for something like company.employees.position == "Developer" to find all the companies that have at least one developer is expensive (and doesn’t actually work as-is).

First things first: making it work. What do we want here? All the companies in which at least one employee’s position is “Developer”.

Traditionally, this is done through a subquery. A subquery is a way to split up your search with as little performance penalty as possible. Basically, you reduce part of a statement to a simple boolean. Here:

(subquery(employees, $x, $x.position == "Developer")).@count > 0

the subquery statement will iterate through employees, find the ones that have the “Developer” position, consolidate the results as an array, and give me a count. If there’s 1 or more, that statement is true.

Another way of saying the same thing, in a more natural language, would be:

ANY employees.position == "Developer"

which will do pretty much the same thing. Performance-wise, it feels like the first one is faster, but I guess it all depends on your model, the number of records, your indexes, etc.
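
In code, either form plugs into a perfectly ordinary fetch. A sketch, assuming a “Company” entity with an employees relationship as above, and moc being your managed object context:

NSFetchRequest *fr = [[NSFetchRequest alloc] init];
[fr setEntity:[NSEntityDescription entityForName:@"Company" inManagedObjectContext:moc]];
// subquery form; swap in @"ANY employees.position == %@" for the other one
[fr setPredicate:[NSPredicate predicateWithFormat:
                  @"SUBQUERY(employees, $x, $x.position == %@).@count > 0", @"Developer"]];
NSError *error = nil;
NSArray *companies = [moc executeFetchRequest:[fr autorelease] error:&error];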

Optimize your model for the most common searches

Let’s pretend I have a bunch of products that have a few requirements each, each requirement having a key I’m looking for. Imagine the list of Apple hardware products over the years, each one having a list of internal components (some of which might be in several products, like a modem, for instance), each component being available in some countries in the world, but not all.

Now let’s say that, based on this database, you have an entry point by country, which displays the Apple products available (filtered obviously by something like “all the parts in it are available in this country”). Every time you open that list, you’ll run a complex query like

"SUBQUERY(parts, $x, SUBQUERY(countries, $y, $y == %@).@count > 0).@count == parts.@count", country (untested)

Just imagine the strain… for every computer, you have to list all the parts that are allowed in a particular country and check if the count is good. That means loading each and every object and relationship just to check if it’s available.

So maybe your model computer⇄part⇄country isn’t ideal after all, for all its simplicity.

Maybe you should’ve set up a field with all the country codes in which a computer is available, updating it as you change the parts (in the willSave callback), so that the query could be something like "computer.availableIn CONTAINS %@", @"#fr#" (if availableIn is a string like #fr#us#de#it, and is obviously indexed), or anything faster but with only one indirection.
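
Here’s a sketch of what that willSave could look like, in a hypothetical Computer subclass of NSManagedObject (the parts and countries relationships and the country code attribute are assumptions about the model, adjust to yours):

// in the Computer subclass: keep availableIn in sync with the parts
- (void)willSave {
    if(![self isDeleted]) {
        NSMutableSet *codes = nil;
        for(NSManagedObject *part in [self valueForKey:@"parts"]) {
            // country codes reachable from this part, e.g. {"fr", "us", ...}
            NSSet *partCodes = [[part valueForKey:@"countries"] valueForKey:@"code"];
            if(codes == nil) codes = [NSMutableSet setWithSet:partCodes];
            else [codes intersectSet:partCodes]; // available only where *every* part is
        }

        NSArray *sorted = [[codes allObjects] sortedArrayUsingSelector:@selector(compare:)];
        NSString *newValue = ([sorted count] == 0) ? @"" :
            [NSString stringWithFormat:@"#%@#", [sorted componentsJoinedByString:@"#"]];

        // the equality check matters: setting the value re-triggers willSave,
        // and without it we would loop until the app dies
        if(![newValue isEqualToString:[self valueForKey:@"availableIn"]])
            [self setValue:newValue forKey:@"availableIn"];
    }
    [super willSave];
}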

Kind of conclusion

As with everything else in computer science, the quality of an algorithm unfortunately has to be measured with the worst-case scenario in the back of the mind. It’s all well and good to see in small-scale tests that the whole database can be loaded up in RAM, speeding things up a treat, but in real-world situations, the worst case is that you’ll have to access on-disk stuff all the time. And on mobile platforms, that’s a huge bottleneck. Also, the simulator is a tool that doesn’t simulate very well, apart from graphically: my iPhone doesn’t have 8 cores and 16GB of RAM. Running basic performance tests on the worst targeted device should be part of the development cycle.
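
Even something as crude as timing your most common fetches on the device goes a long way. A minimal sketch, with moc and fr being a context and a fetch request like in the earlier snippets:

CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
NSError *error = nil;
NSArray *results = [moc executeFetchRequest:fr error:&error];
NSLog(@"fetch returned %lu objects in %.3fs",
      (unsigned long)[results count], CFAbsoluteTimeGetCurrent() - start);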

  

The Bane Of Reality

Fiction is not enough. Apparently the masses want reality. The superheroes and master spies have to be explained and “fit into” the real world (by the way, thanks to the people who did Iron Sky, it was a welcome breath of absurdity and laughter).

In software terms, it gave us (drum roll) skeuomorphism, the art of mimicking real objects to help us poor humans deal with apparently complex functions.

The latest one to date: the Podcasts application from Apple, and it looks like a tape deck. Seriously. Man, I mastered the art of obscure VCR controls a long time ago… And now you want to simplify my life with an analogue of a defunct technology?

Don’t get me wrong, I really think interfaces should be thought through and self-explanatory, but really? Who uses a binder these days? So what’s the point of the spirals on the left of your writing interface? I’ve actually never used a paper agenda that I can recall, so why give me that faux-leather look?

Some ideas are not based in the real world, but they quickly become THE way to do it, like pull-for-refresh, for instance, or pinch to zoom in and out. What’s the real world equivalent of those? Do we need any equivalent?

I guess I’m not a natural target for software anyway: when I take a look at a program, I want to know what it does for me. Let’s say I want an app that gives me remote control of my coffee maker. That way, I’m heading back home after a tiring day, and I want a coffee that’s strong (more coffee in it) and has been finished 5 minutes before I get home (because coffee has to cool down a little bit). Do I want to drag and drop the number of spoons from one half of the screen to the next to simulate the amount to pour in? Do I want the same kind of clumsy timers-with-arrows that exists already on these machines? Nope.

But I want to know if the coffee maker can make me coffee (because it’s all washed up and ready to go), the amount of coffee left in the reservoir, as well as the water level; I want to set up the amount in as few movements as possible while being totally reliable; and I want to be able to just say “ready 5 minutes before I’m in” and let the location manager deal with it (one man can dream, right?).

There is a history behind physical controls. Some designers, ergonomists, and engineers took the time to fine-tune them for daily use (with mixed results), and the ones that stayed with us for 20 years or more stayed because people “got” them, not because they liked them or thought they were a good analogy to whatever they were using before. Thank goodness we’re not driving cars with rein analogues, or bicycle-horn analogues.

It’s time to do the same with software. Until we have 3D-manipulation interfaces, we’re stuck in Flatland. And that means that any control that was built for grabbing with multiple fingers at several depths is out (you hear me, rotating-dial analogue?).

If you want your users to feel comfortable with your software, make sure its function is clear to the intended audience. Then prettify it with the help of a designer. Different world, different job.

  

Happy Birthday Alan

Alan Turing is considered to be the father of computing (at least by those who don’t believe in Mayan computers, secret alien infiltrations, or Atlantis). He would have turned 100 this year.

Computers are everywhere nowadays, and pretty much anyone can learn very quickly to use one. But you have to remember that up until the fifties, people were paid to do calculations. In the case of all the complicated operations for astronomical charts and stuff, the post of calculator was held in high regard, and the fastest (and most accurate) one could name his price.

Machines have been around for a long time, but there was no adaptability to them: the intelligence was in the hands of the user. Complicated clockwork machinery could perform very delicate stuff, but not by itself. And repurposing one of these machines to do something it wasn’t built for was close to impossible.

Basically that’s what Turing pioneered: a machine that could be repurposed (reprogrammed) all the time, and could modify its own behavior (program) based on its inputs.

Before Turing, what you had was an input -> tool -> output model for pretty much everything.
After him (and we can’t help but smile when seeing how pervasive these things are today — even my TV!), the model switched to input + memory -> tool -> output + modified memory (+ modified tool).

Meaning that two consecutive uses of the same tool might yield different results with the exact same inputs. Of course, it’s better if the modification is intentional and not the result of a bug, but he opened a whole new field of possibilities.

So happy birthday Alan, and thanks for giving an outlet to my skills. If you hadn’t been around, there would have been precious few other ways for me to whore my faulty brain!

  

Research vs Development

It has been true throughout the history of “practical” science: there seems to be a very strong border between “pure” research (as in academia, among others) and “applied” or “empirical” research (what might arguably be inventing). I’m not sure where “innovation” fits on that scale, because it depends mostly on the goals of the person using that word.

But first, a disclaimer: I have a somewhat weird view of the field. My dad, even though he routinely dabbles in practical things, loves discussing theory and ideas. My mom, on the other hand, expresses boredom rather quickly when we digress ad nauseam on such topics. Growing up, I started with a genuine love for maths and scratching my head over theoretical problems, sometimes forgetting to even eat before I solved one of my “puzzles”; then branched out to a more practical approach when I started earning a living by writing code; before going back to pure research in biology and bio-computing, which ended badly for unrelated reasons; which led to a brute-force, pragmatic daily life for a while; which switched again when I started teaching both the theory and the practicalities of programming to my students; and now… well, I’m not exactly sure which I like most.

Writing code today is the closest thing I can think of to dabbling in physics back in the 17th century. You didn’t need a whole lot of formal education, you pretty much picked up on whatever you could grab from experience and the various articles and books from the people in your field, and submitted your theories and your inventions to some kind of public board. Some of it was government (or business) funded, to give a competitive advantage to your benefactors in military or commerce or “cultural glow” terms. Some of it came from enthusiasts who were doing other things in their spare time.

Some people would say the world was less connected back in those days, so the competition was less fierce, but the world was a lot smaller too. Most of the Asian world had peaked scientifically for religious, bureaucratic, or plain self-delusional reasons, and the American and African continents weren’t even on the scientific map, so the whole world was pretty much Europe and the Arab countries. Contrary to what most people I’ve chatted with about that period think, communication was rather reliable and completely free, if a little slow. Any shoemaker could basically go “hey, I’ve invented this in my spare time, here’s the theory behind it and what I think it does or proves” and submit it to the scientific community. True, it would sometimes take a long time to get past snobbery, but the discussion was at least relatively free. Kind of like the internet today.

Back in those days, the two driving forces behind research were competition (my idea is better than yours, or I was the first to figure it out) and reputation (which attracted money and power). Our scientific giants sometimes did morally and ethically questionable things (like Galileo grabbing the telescope to make a quick buck, albeit in order to finance his famous research, or Newton ruthlessly culling papers and conferences to stay in power at the head of the Royal Society), but to my knowledge, they never intentionally prevented any kind of progress.

That’s where the comparison kind of falls short with today’s research and development. First of all, the gap between pure research and practical research has widened considerably. No one with less than 10 years of studying a particular field is going to be granted a research post. That’s both because the amount of knowledge required to build on all that we know is simply humongous and because pure research is notably underfunded. Then there is the practical development side, which has the same kind of educational problem: the systems we deal with are complex enough with a degree, so without one… And the amount of money and effort poured by companies into these projects simply can’t tolerate failure.

That’s obviously not to say that it doesn’t exist anymore, far from it. I’ve had the chance to spend some time with the people from the ILL, a research facility devoted to neutron physics, and wow. Just wow. And obviously, from time to time we developers are involved in some cool new project that no one has done before (hush hush). But the entry barrier is a lot higher. I wouldn’t qualify for research, even though I almost started a PhD and am not entirely stupid, and however good the reviews I get on my work, I guess I’d still have to do R&D on my own before anyone gave me a big wad of bills to pay for a project of mine.

Getting back to the point, while academia doesn’t seem to have changed much in the way it operates (though it has changed a lot in the hurdles to get through), the practical side of research has changed dramatically. Global markets mean fiercer competition. In order to attract good people, a company has to pay them better than its rivals, and in order to do that, it has to make more money per employee. But to make more money per employee, there has to be either very few rivals (a monopoly) or a clear-cut quality advantage. The second strategy requires attracting the best people and taking more risks, while the first requires a better defense.

And that’s where the slant is today: it’s actually a lot cheaper and less risky to work secretly on something new, slap a couple of patents on it to get a de facto monopoly, and live off the dividends it will assuredly bring. That’s the reasoning, anyway.