[Dev Diaries] URL Shortener Style Things
UUIDs are fine, but who doesn't like a decent String instead? It's shorter and it doesn't scare the non-programmers as much!
UUIDs are 128 bits, and I want to use a 64-character map: a-z, A-Z, and 0-9, plus - and + to round it up.
64 possibilities is equivalent to 6 bits of data, and UUIDs are made of 16 bytes (8 bits of data each). That gives me a way to split 16 bytes into 22 sixytes (128 / 6 = 21.33, rounded up; and yes, I invented that word. So what?)
| 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8 8 8 _ | 8
| 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6 6 6 | 6 6
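Why? Because 3x8 = 4x6 = 24: three bytes and four sixytes hold the same number of bits. The 16th byte is left over and gets two sixytes of its own (6 bits, then the remaining 2).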
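A quick sanity check of that arithmetic, as a throwaway sketch (the constant names are made up for illustration and are not part of the extension further down):

```swift
// Illustrative constants only.
let uuidBits = 16 * 8          // 128 bits in a UUID
let bitsPerSymbol = 6          // 64 symbols = 2^6

// Full groups of 3 bytes (24 bits) map exactly onto 4 six-bit symbols.
let fullGroups = 16 / 3        // 5 groups, covering 15 bytes
let leftoverBytes = 16 % 3     // 1 byte left over

// The leftover byte needs 2 symbols: 6 bits, then the remaining 2 bits.
let symbolCount = fullGroups * 4 + 2

// Same answer as rounding 128 / 6 up.
let roundedUp = (uuidBits + bitsPerSymbol - 1) / bitsPerSymbol

print(symbolCount, roundedUp)  // 22 22
```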
Now, we redistribute the bits around (Xs are the bits from the bytes, Ys are the bits from the sixytes):
XXXXXX|XX XXXX|XXXX XX|XXXXXX
YYYYYY|YY YYYY|YYYY YY|YYYYYY
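As a standalone sketch of that redistribution, here is one 3-byte group turned into 4 sixytes. The function name `splitIntoSixytes` is hypothetical, chosen just for this example:

```swift
// Hypothetical helper: packs 3 bytes (24 bits) into 4 six-bit values.
func splitIntoSixytes(_ b1: UInt8, _ b2: UInt8, _ b3: UInt8) -> [UInt8] {
    let s1 = b1 >> 2                         // top 6 bits of b1
    let s2 = ((b1 & 0x03) << 4) | (b2 >> 4)  // low 2 bits of b1 + top 4 of b2
    let s3 = ((b2 & 0x0f) << 2) | (b3 >> 6)  // low 4 bits of b2 + top 2 of b3
    let s4 = b3 & 0x3f                       // low 6 bits of b3
    return [s1, s2, s3, s4]
}

// 0x30, 0xA5, 0xCB are the first three bytes of the UUID in the test output.
print(splitIntoSixytes(0x30, 0xA5, 0xCB))  // [12, 10, 23, 11]
```

Every value lands in 0...63, so each one can index straight into a 64-entry character map.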
With some shifts and some binary ORs, we're down from a 36-character hexadecimal string with dashes to a 22-character string where each character has only about a 2-in-64 chance of being punctuation. Of course, if you want to disambiguate symbols like O and 0, you can change the character map, as long as your charmap stays 64 items long.
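If you do swap symbols, it's worth checking that the map still has 64 distinct entries, since a duplicate would make decoding ambiguous. A minimal sketch, where `isValidCharmap` and the replacement symbols `_` and `~` are arbitrary picks for illustration:

```swift
// Hypothetical check: a usable charmap has exactly 64 distinct characters.
func isValidCharmap(_ map: String) -> Bool {
    return map.count == 64 && Set(map).count == 64
}

let original = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+"

// Example variant: "O" and "0" replaced by "_" and "~" (any unused symbols would do).
let unambiguous = String(original.map { (c: Character) -> Character in
    switch c {
    case "O": return "_"
    case "0": return "~"
    default: return c
    }
})

print(isValidCharmap(original), isValidCharmap(unambiguous))  // true true
print(isValidCharmap(original + "!"))                         // false
```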
import Foundation

extension UUID {
    // 64 symbols, so each one encodes exactly 6 bits
    static let charmap =
        ["a","b","c","d","e","f","g","h","i","j","k","l","m","n",
         "o","p","q","r","s","t","u","v","w","x","y","z",
         "A","B","C","D","E","F","G","H","I","J","K","L","M","N",
         "O","P","Q","R","S","T","U","V","W","X","Y","Z",
         "0","1","2","3","4","5","6","7","8","9","-","+"]
    // same symbols as a String, used for validation when decoding
    static let charmapSet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+"

    var tinyWord : String {
        let from = self.uuid
        let bytes = [from.0, from.1, from.2, from.3, from.4, from.5, from.6, from.7, from.8, from.9,
                     from.10, from.11, from.12, from.13, from.14, from.15]
        // split in 6-bits ints: 5 groups of 3 bytes -> 4 sixytes each
        var sbytes : [UInt8] = []
        for i in 0..<5 {
            let b1 = bytes[i*3]
            let b2 = bytes[i*3+1]
            let b3 = bytes[i*3+2]
            let sb1 = b1 >> 2
            let sb2 = (b1 & 0x03) << 4 | (b2 >> 4)
            let sb3 = (b2 & 0x0f) << 2 | (b3 >> 6)
            let sb4 = (b3 & 0x3f)
            sbytes += [sb1, sb2, sb3, sb4]
        }
        // all done but the last byte: top 6 bits, then the low 2 bits
        sbytes.append(bytes[15] >> 2)
        sbytes.append(bytes[15] & 0x03)

        var result = ""
        for i in sbytes {
            result += UUID.charmap[Int(i)]
        }
        return result
    }
}
The reverse procedure is a bit longer, because we have to stage the values in groups of 4 sixytes for 3 bytes, and do a couple of sanity checks.
extension UUID {
    init?(tinyWord: String) {
        // sanity check: right length, only known symbols
        if tinyWord.count != 22 || !tinyWord.allSatisfy({ UUID.charmapSet.contains($0) }) { return nil }
        var current : UInt8 = 0
        var bytes : [UInt8] = []
        for (n, c) in tinyWord.enumerated() {
            guard let idx32 = UUID.charmap.firstIndex(of: String(c)) else { return nil }
            let idx = UInt8(idx32)
            if n >= 20 { // last byte
                if n == 20 {
                    current = idx << 2
                } else {
                    // the final sixyte only carries 2 bits; anything bigger is not a valid word
                    if idx > 0x3 { return nil }
                    current |= idx
                    bytes.append(current)
                }
            } else if n % 4 == 0 { // first in cycle
                current = idx << 2
            } else if n % 4 == 1 { // combine
                current |= idx >> 4
                bytes.append(current)
                current = (idx & 0xf) << 4
            } else if n % 4 == 2 { // combine
                current |= (idx >> 2)
                bytes.append(current)
                current = (idx & 0x3) << 6
            } else { // last in cycle
                current |= idx
                bytes.append(current)
                current = 0
            }
        }

        // double check
        if bytes.count != 16 { return nil }
        self.init(uuid: (bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9],
                         bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15]))
    }
}
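Stripped of the loop bookkeeping, the core of that initializer is the 4-to-3 step. Here it is in isolation, as a sketch (`joinSixytes` is a made-up name; the initializer above does the same work one character at a time):

```swift
// Hypothetical helper: rebuilds 3 bytes from 4 six-bit values (each in 0...63).
func joinSixytes(_ s1: UInt8, _ s2: UInt8, _ s3: UInt8, _ s4: UInt8) -> [UInt8] {
    let b1 = (s1 << 2) | (s2 >> 4)           // all 6 bits of s1 + top 2 of s2
    let b2 = ((s2 & 0x0f) << 4) | (s3 >> 2)  // low 4 bits of s2 + top 4 of s3
    let b3 = ((s3 & 0x03) << 6) | s4         // low 2 bits of s3 + all 6 of s4
    return [b1, b2, b3]
}

// "m", "k", "x", "l" sit at indices 12, 10, 23, 11 in the charmap.
print(joinSixytes(12, 10, 23, 11))  // [48, 165, 203], i.e. 0x30 0xA5 0xCB
```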
Let's test this!
let u = UUID()
let w = u.tinyWord
print(u.uuidString+" : \(u.uuidString.count)")
print(w+" : \(w.count)")
print(UUID(tinyWord: w)!)
On one run, this printed:
30A5CB6E-778F-4218-A333-3BC8B5A40B65 : 36
mkxlBNEpqHIJmZViTAqlzb : 22
30A5CB6E-778F-4218-A333-3BC8B5A40B65
Now I have a "password friendly" way to pass UUIDs around. Is it a waste of time (because I could just pass the UUIDs around, they aren't that much longer)? Who knows? It makes my shortened URLs a bit less intimidating 😁