These old formats we love to hate

When you write new tools for existing workflows, you generally have to handle a cryptic, “widely used” format that is usually completely dumb. I know, I know, when all’s fresh and new, you design your file format, and you just can’t foretell all the future enhancements your users will want. But still, some pitfalls seem incredibly easy to avoid.

Today’s example: EDL (edit decision list). It’s a text format widely used in video processing to represent a sequence assembled from a variety of files. When you google the format, it’s very hard to get a definitive answer, but the EDLs I have here follow this pattern:

< (1)(4 chars)> < (2)(8 chars)> < (3)(5 chars)> < (4)(8 chars)> < (5)(11 chars)> < (6)(11 chars)> < (7)(11 chars)> < (8)(11 chars)>

next line…

where (1) is the number of the clip, (2) is the name of the source, (3) is some flags, apparently starting with “V”, (4) is some flags, apparently starting with “C”, (5) is the timecode for the source in point, (6) is the timecode for the source out point, (7) is the timecode for the destination in point, (8) is the timecode for the destination out point.
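
To make the layout concrete, here is a minimal Python sketch of a parser for one such line. It assumes the eight fixed-width fields are separated by exactly one space, which is my reading of the pattern above and not something a spec confirms. The names `EdlEvent`, `flags_v`, `flags_c` and `parse_edl_line` are made up for illustration.

```python
from typing import NamedTuple

# Field widths taken from the pattern above.
# Assumption: fields are separated by exactly one space.
WIDTHS = (4, 8, 5, 8, 11, 11, 11, 11)


class EdlEvent(NamedTuple):
    number: int    # (1) clip number
    source: str    # (2) source name
    flags_v: str   # (3) flags, apparently starting with "V"
    flags_c: str   # (4) flags, apparently starting with "C"
    src_in: str    # (5) source in timecode
    src_out: str   # (6) source out timecode
    dst_in: str    # (7) destination in timecode
    dst_out: str   # (8) destination out timecode


def parse_edl_line(line: str) -> EdlEvent:
    """Slice one fixed-width line into its eight fields."""
    fields = []
    pos = 0
    for width in WIDTHS:
        fields.append(line[pos:pos + width].strip())
        pos += width + 1  # skip the assumed one-space separator
    return EdlEvent(int(fields[0]), *fields[1:])
```

Note that the parser has to know every width in advance, so any source name longer than 8 characters shifts all the fields after it.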

Now, a regular movie generally has 10 reels, each reel containing a few hundred shots. The 4 chars of the clip number allow for 9999 shots, which still seems adequate, but could overflow in some cases. The 8-char file name comes from this old MSDOS habit we all learned to loathe. Just imagine your average 3000 clips. If you want the clip number in the name of the file (which is generally a good habit if you want to get them all back), it leaves you 4 characters to actually name the file. 4! It would be just like calling every human being by their initials… I understand it was a filesystem restriction, but who could actually say under any circumstance that people will forever name their files like this? One simple solution would have been to separate fields with a tab and allow for any width… That way, if you can have 1000+ characters in a file name, it’s up to the user to warn everyone running MSDOS that they won’t be able to use the full potential of the EDL… Who edits video under MSDOS anyway?
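
Here is a rough sketch of what the tab-separated variant could look like, using Python’s standard `csv` module. This is not an existing EDL dialect, just an illustration of the idea. The helper names `write_events` and `read_events` are hypothetical.

```python
import csv
import io


def write_events(events, out):
    """Write one event per line, fields separated by tabs, any width."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for event in events:
        writer.writerow(event)


def read_events(inp):
    """Read the events back as lists of strings."""
    return [row for row in csv.reader(inp, delimiter="\t")]


# A source name far longer than 8 characters survives the round trip.
long_name = "reel03_shot0421_" + "x" * 1000 + ".mov"
events = [["421", long_name, "V", "C",
           "01:00:00:00", "01:00:05:00", "00:00:00:00", "00:00:05:00"]]
buffer = io.StringIO()
write_events(events, buffer)
buffer.seek(0)
restored = read_events(buffer)
```

No field has a width to overflow here, and the `csv` module quotes a field if it ever contains a tab itself.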

In the video world, there are so many such examples that my work is at least 40% understanding them and cursing. In one such format, there’s a text-block mechanism that is limited to 255 characters. I say, why not? How do you create blocks that are wider? You just use as many 255-character blocks as you want, and then give them the same ID code. How clever is that? Sigh… It just means that anyone who doesn’t know this will figure it’s a bug and take the first or the last block…
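
For the record, handling this correctly is not hard once you know the trick. The sketch below assumes the blocks have already been read as `(block_id, text)` pairs in file order. That representation and the name `merge_text_blocks` are mine, not the format’s.

```python
def merge_text_blocks(blocks):
    """Concatenate the text of all blocks sharing an ID, in file order.

    `blocks` is an iterable of (block_id, text) pairs; this layout is
    an assumption made for the example, not the real on-disk format.
    """
    merged = {}
    for block_id, text in blocks:
        merged[block_id] = merged.get(block_id, "") + text
    return merged


# Two blocks with ID 7 form one 265-character text.
blocks = [(7, "a" * 255), (7, "b" * 10), (9, "short note")]
merged = merge_text_blocks(blocks)
```

Taking only the first or the last block with a given ID would silently truncate the text, which is exactly the trap described above.
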
Too often, a file format just reflects the memory structure the program is using. Your program runs on MSDOS? Then you have to have 8-character names and 3-character extensions, and that’s the end of it… Well… No. Further along, there might be hardware or software evolutions that will expand the possibilities. The computer has to adapt to humans, not the other way around… Just picture people’s reaction if you drafted a law saying that kids must have a first name that’s 10 characters long tops, a mandatory middle name (I don’t have one) that’s also 10 characters long max, and a family name truncated to 20 characters or less. One word: RIOT.

Then why do we have to suffer these restrictions in file formats? Because 20 years ago it was physically impossible to do better? My dear fellow developers, please take this pain away from the next generation… Let’s not give information a maximum size, because information is meant to grow…

