[Dev Diaries] URL Shortener Style Things

UUIDs are fine, but who doesn't like a decent String instead? It's shorter and it doesn't scare the non-programmers as much!

UUIDs are 128 bits, and I want to use a 64 characters map: a to z, A-Z, and 0-9, plus - and + to round it up.

64 possibilities is equivalent to 6 bits of data, and UUIDs are made of 16 bytes (8 bits of data). What that gives me is a way to split 16 bytes into 22 sixytes (yes I invented that word. So what?)

| 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8
| 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6

Why? Because 3x8 = 6x4, same number of bits in both.

Now, we redistribute the bits around (Xs are the bits fron the bytes, Ys are the bits from the sixytes):

XXXXXX|XX XXXX|XXXX XX|XXXXXX
YYYYYY|YY YYYY|YYYY YY|YYYYYY

With some shifts and some binary or, we're down from a 36 hexadecimal character string with dashes to a 22 character with a very low probability of punctuation. Of course if you want to disambiguate the symbols like O and 0, you can change the character map, as long as your charmap stays 64 items long.

extension UUID {
    static let charmap = 
["a","b","c","d","e","f","g","h","i","j","k","l","m","n",
"o","p","q","r","s","t","u","v","w","x","y","z",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N",
"O","P","Q","R","S","T","U","V","W","X","Y","Z",
"0","1","2","3","4","5","6","7","8","9","-","+"]
    static let charmapSet =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+"

    var tinyWord : String {
        let from = self.uuid
        let bytes = [from.0, from.1, from.2,from.3,from.4,from.5,from.6,from.7,from.8,from.9,
                     from.10, from.11, from.12,from.13,from.14,from.15]
        
        // split in 6-bits ints
        var sbytes : [UInt8] = []
        for i in 0..<5 {
            let b1 = bytes[i*3]
            let b2 = bytes[i*3+1]
            let b3 = bytes[i*3+2]
            
            let sb1 = b1 >> 2
            let sb2 = (b1 & 0x03) << 4 | (b2 >> 4)
            let sb3 = (b2 & 0x0f) << 2 | (b3 >> 6)
            let sb4 = (b3 & 0x3f)
            sbytes += [sb1,sb2,sb3,sb4]
        }
        // all done but the last byte
        sbytes.append(bytes[15]>>2)
        sbytes.append(bytes[15]&0x03)
         
        var result = ""
        for i in sbytes {
            result += UUID.charmap[Int(i)]
        }
        
        return result
    }
}

The reverse procedure is a bit longer, because we have to stage the values in groups of 4 sexytes for 3 bytes, and do a couple of sanity checks.

extension UUID {
    init?(tinyWord: String) {
        if tinyWord.count != 22 || !tinyWord.allSatisfy({ UUID.charmapSet.contains($0) }) { return nil }
        var current : UInt8 = 0
        var bytes : [UInt8] = []
        for (n,c) in tinyWord.enumerated() {
            guard let idx32 = UUID.charmap.firstIndex(of: String(c)) else { return nil }
            let idx = UInt8(idx32)
            if n >= 20 { // last byte
                if n == 20 {
                    current = idx << 2
                } else {
                    current |= idx
                    bytes.append(current)
                }
            } else if n % 4 == 0 { // first in cycle
                current = idx << 2
            } else if n % 4 == 1 { // combine
                current |= idx >> 4
                bytes.append(current)
                current = (idx & 0xf) << 4
            } else if n % 4 == 2 { // combine
                current |= (idx >> 2)
                bytes.append(current)
                current = (idx & 0x3) << 6
            } else {
                current |= idx
                bytes.append(current)
                current = 0
            }
        }
        
        // double check
        if bytes.count != 16 { return nil }
        
        self.init(uuid: (bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9],
                         bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15]))
    }
}

Let's test this!

let u = UUID()
let w = u.tinyWord
print(u.uuidString+" : \(u.uuidString.count)")
print(w+" : \(w.count)")
print(UUID(tinyWord: w)!)
30A5CB6E-778F-4218-A333-3BC8B5A40B65 : 36
mkxlBNEpqHIJmZViTAqlzb : 22
30A5CB6E-778F-4218-A333-3BC8B5A40B65

Now I have a "password friendly" way to pass UUIDs around. Is it a waste of time (because I could just pass the UUIDs around, they aren't that much longer)? Who knows? It makes my shortened URLs a bit less intimidating 😁