The Engineer's Triangle

Fast, cheap, or good, pick two
(an unknown genius)

It's a well-known mantra in many fields, including, believe it or not, the project manager's handbook. Except they don't like such trivial terms, so they use schedule, cost, and scope instead.

So, why do a lot of developers feel like this doesn't apply to their work? Is it because with the wonders of CI/CD, fast and cheap are a given, and good will eventually happen on its own? But enough of the rant, let's look at the innards of computers to see why you can't write a program that ignores the triangle either.

Fast

The performance of our CPUs has more or less plateaued. We can expand the number of cores, but by and large, a single process will not be done in half the time two years from now unless the developer spends some time honing its performance. GPUs have a little more legroom, but only in very specific areas, which are intrinsically linked to the number of cores. And the user won't (or maybe even can't) wait a few minutes for a process anymore. Gotta shave those milliseconds, friend.

Cheap

In terms of CS, the cost of a program is about the resources it uses. Does running your program forbid any other process from doing anything at the same time? Does it use 4 GB of RAM just to sort the keys of a JSON file? Does it occupy 1TB on the drive? Does it max out the number of threads, opened files and sockets and ports that are available? Performance ain't just measured in units of time.

Good

This is about correctness and completeness. Does your software gracefully handle all the edge cases? Does it crash under load? Does it destroy valuable user data? Does it succumb to a poor rounding error, or a size overflow? Is it safe?

Pick the right tool for the right job

And so, it's a very very very hard thing to get all three in a finite amount of time, especially in the kind of timescales we work under. Sometimes, we're lucky to get even one of those.

It's important to identify as soon as possible the cases you want to pursue:

  • Cheap and fast: almost nothing except maybe tools for perfectly mastered workflows (where the edge cases and the rounding errors are left to the user to worry about)
  • Fast and good: games, machine learning, scientific stuff
  • Good and cheap: pro tools (dev tools, design tools, 3d modelers, etc) where the user is informed enough to wait for a good result

[BETA] Fun With Combine

I'm an old fart, that's not in any way debatable. But being an old fart, I have done old things, like implementing a databus-like system in a program before. So when I saw Combine, I thought I'd have fun with re-implementing a databus with it.

First things first

Why would I need a databus?

If you've done some complex mobile programming, you have probably passed notifications around to signal stuff from one leaf of your logic tree to another: for instance, a network task that ran in the background and signalled it was done, even though the view controller that spawned it died a long time ago.

Databuses solve that problem, in a way. You have a stream of "stuff", with multiple listeners on it that want to react to, say, the network went down or up, the user changed a crucial global setting, etc. And you also have multiple publishers that generate those events.

That's why we used to use notifications. Once fired, every observer would receive it, independently of their place in the logic (or visual) tree.

The goal

I wanted to have a databus that could do two things:

  • allow someone to subscribe to certain events or all of them
  • allow replaying the events (for debug, log, or recovery purposes)

I also decided I wanted to have operators that reminded me of C++ for some reason.

The base

Of course, for replay purposes, you need a buffer, and for a buffer, you need a type (this is Swift after all)

public protocol Event {
    
}

public final class EventBus {
    fileprivate var eventBuffer : [Event] = []
    fileprivate var eventStream = PassthroughSubject<Event,Never>()

PassthroughSubject allows me to avoid implementing my own Publisher, and does what it says on the tin. It passes Event objects around, and fails Never.

Now, because I want to replay but not remember everything (old fart, remember), I decided to impose a maximum length to the replay buffer.

    public var bufferLength = 5 {
        didSet {
            truncate()
        }
    }
    
    fileprivate func truncate() {
        while eventBuffer.count > bufferLength {
            eventBuffer.remove(at: 0)
        }
    }

It's a standard FIFO, oldest in front latest at the back. I will just pile them in, and truncate when necessary.

Replaying is fairly easy: you just pick the last x elements of a certain type of Event and process them again. The only issue is reversing twice: once for the cutoff, and once for the replay itself. But since it's going to be seldom used, I figured it was not a big deal.

    public func replay<T>(count: UInt = UInt.max, handler: @escaping (T) -> Void) {
        var b = [T]()
        for e in eventBuffer.reversed() {
            if b.count >= count { break }
            if let e = e as? T {
                b.append(e)
            }
        }
        for e in b.reversed() {
            handler(e)
        }
    }

Yes, I could have done it more efficiently, but part of the audience is new to Swift (or any other kind of) programming, and, again, it's for demonstration purposes.
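
For comparison, here is a minimal sketch of a more compact replay selection. It is not the code used in this post: the function name latest and the two event types are made up for the example, and the Event protocol is redeclared so the snippet stands alone. It filters with compactMap and keeps the tail with suffix, which avoids the double reversal at the cost of walking the whole buffer (irrelevant for a buffer of 5).

```swift
// Stand-in for the post's Event protocol, redeclared so the sketch is self-contained.
protocol Event {}

// Hypothetical event types, for illustration only.
struct NetworkChanged: Event { let isUp: Bool }
struct SettingChanged: Event { let key: String }

// Returns the last `count` events of type T, oldest first.
func latest<T>(_ type: T.Type, in buffer: [Event], count: Int) -> [T] {
    let matching = buffer.compactMap { $0 as? T }   // keep only events of type T
    return Array(matching.suffix(count))            // keep the most recent ones
}

let buffer: [Event] = [
    NetworkChanged(isUp: true),
    SettingChanged(key: "theme"),
    NetworkChanged(isUp: false),
    NetworkChanged(isUp: true)
]

let recent = latest(NetworkChanged.self, in: buffer, count: 2)
print(recent.map { $0.isUp })   // [false, true]
```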

Sending an event

That's the easy part: you just make a new event and send it. It has to conform to the Event protocol, so there's that. Oh and I added the << operator.

    public func send(_ event: Event) {
        eventBuffer.append(event)
        truncate()
        eventStream.send(event)
    }
    static public func << (_ bus: EventBus, _ event: Event) {
        bus.send(event)
    }

From now on, I can do bus << myEvent and the event is propagated.

Receiving an event

I wanted to be able to filter at subscription time, so I used the compactMap operator on the stream, which works exactly like its Array counterpart: if the transformation result is nil, it's not included in the output. Oh and I added the >> operator.

    public func subscribe<T:Event>(_ handler: @escaping (T) -> Void) {
        eventStream.compactMap { $0 as? T }.sink(receiveValue: handler)
    }
    static public func >><T:Event> (_ bus: EventBus, handler: @escaping (T) -> Void) {
        bus.subscribe(handler)
    }

The idea is that you define what kind of event you want from the block's input, and Swift should (hopefully) infer what to do.

I can now write something like

bus >> { (e : EventSubType) in
    print("We haz receifed \(e)")
}

EventSubType implements the Event protocol, and the generic type is correctly inferred.

The End (?)

It was actually super simple to write and test (with very high volumes too), but I'm guessing there would be memory retention issues as I can't figure out a way to properly unsubscribe from the bus, especially if you have self references in the block.

Then again it's a beta and this is a toy sample. I will need to dig deeper in the memory management stuff, but at first glance, it looks like the lifetime of the blocks is exactly the lifetime of the bus, which makes it impractical in real cases. Fun stuff, though.
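
A minimal sketch of one way the unsubscribe problem can be approached, assuming the released Combine API rather than the beta discussed here. In the released framework, sink(receiveValue:) returns an AnyCancellable, and releasing that token cancels the subscription. Listener, Recorder and Ping are hypothetical names, not part of the EventBus above.

```swift
import Combine

// Stand-in for the post's Event protocol, plus a hypothetical event type.
protocol Event {}
struct Ping: Event { let id: Int }

// Hypothetical helper that records what was received.
final class Recorder { var ids: [Int] = [] }

// The listener owns its subscription tokens; when it is deallocated,
// the tokens are released and the subscription is cancelled.
final class Listener {
    private var subscriptions = Set<AnyCancellable>()

    init(stream: PassthroughSubject<Event, Never>, recorder: Recorder) {
        stream
            .compactMap { $0 as? Ping }
            .sink { recorder.ids.append($0.id) }
            .store(in: &subscriptions)
    }
}

let stream = PassthroughSubject<Event, Never>()
let recorder = Recorder()
var listener: Listener? = Listener(stream: stream, recorder: recorder)

stream.send(Ping(id: 1))   // delivered
listener = nil             // releases the tokens, which cancels the subscription
stream.send(Ping(id: 2))   // nobody is listening anymore

print(recorder.ids)        // [1]
```

With this shape, the lifetime of a block is tied to whoever holds the token rather than to the bus.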

[Talk] Combine+DSLs=SwiftUI

This week I gave a talk at Cocoaheads in their traditional "Back From WWDC" session. The format is short - 10 minutes - and is supposed to be about something I learned during the conference.

Now, it's not a secret that, while I am fine with SwiftUI being a thing, I can't say I'm impressed or excited about it. I don't think it saves time (compared to other means of achieving UI in either code or with the graphical UI builder in Xcode), I don't think it improves performance, and I definitely don't find it fun to write HTML-CSS-like code to draw pixels on a screen. Buuuuuuut I will grant that it might help newcomers get over the "I can't do it it's different" mentality, and that can't be a bad thing.

However, in order for SwiftUI to exist, Apple had to add some really cool things in Swift that do have a lot of potential.

Combine

The new official face of pubsub in Swift is called Combine. It allows for a fairly performant and clear way to implement event-driven mechanics into your app. I won't get into details, but basically, it obsoletes a lot of things we used to do with other means that were never completely satisfactory:

  • Notifications: that was a baaaaaad way to move messages around. They are fairly inconsistent in terms of delivery, and cause a ton of leaks if you aren't careful. The second part might not be solved yet, but the first one definitely is.
  • Bindings: for us old farts who remember the way we used to do things (it was never ported to iOS). They were the undead of UI tricks on the Mac, and it's fairly certain this will put them out of their misery.
  • KVO: this was always very badly supported in Swift, it always felt hacky and weird, because it relied on the amorphous nature of objects in the Objective-C runtime. After several weird proposals to bring them back in some way in Swift, it seems like we have ourselves a winner.

All in all, while neither revolutionary nor extra-cool, Combine seems to be the final answer from Apple to the age-old question "how in hell do I monitor the changes of a variable?". React and other similar frameworks hacked something together that worked fairly well, and this framework seems to be the simplest and cleanest way to achieve the same thing.
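
As an illustration of the "monitor the changes of a variable" use case, here is a minimal sketch using CurrentValueSubject from the released Combine framework. The variable names are made up for the example.

```swift
import Combine

// A subject that holds a current value and publishes every change.
let volume = CurrentValueSubject<Int, Never>(3)

var seen: [Int] = []

// Subscribing delivers the current value first, then every later change.
let token = volume.sink { seen.append($0) }

volume.value = 7    // assigning to `value` publishes the new value
volume.send(11)     // so does send(_:)

token.cancel()      // stop observing
volume.send(99)     // no longer recorded

print(seen)         // [3, 7, 11]
```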

DSLs

Now this gets me excited. We've all had to generate text in specific formats using string interpolation and concatenation. Codable takes care of JSON, and there are some clever XML and whatnot APIs out there.

But allowing for DSLs directly in the language allows people to avoid writing stuff like

"""
SELECT \(columns.sqlFormat) FROM \(table)
WHERE deleted = false AND \(query)
"""

And replace it with something like

Select(from: table) {
  Column(columns.sqlFormat)
  Where {
    table.deleted == false && query
  }
}

How's that better?

  • Syntax checking is done at compile time. No more "awwwww I forgot to close the parenthesis!"
  • You can have control code (if, for, etc.) directly inside of your "query"
  • Type safety is enforceable and enforced
  • The generator can optimize the output on the fly (or minify, obfuscate, or format it)
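
To make the points above concrete, here is a minimal sketch of such a DSL. It assumes Swift 5.4 or later, where the feature is spelled @resultBuilder, and every type in it (Clause, Column, Where, QueryBuilder, Select) is hypothetical and much simpler than the Select example above: conditions are plain strings, so only the structure is checked at compile time.

```swift
// Hypothetical mini-DSL; none of these names come from a real library.
protocol Clause {}

struct Column: Clause {
    let names: [String]
    init(_ names: String...) { self.names = names }
}

struct Where: Clause {
    let condition: String
    init(_ condition: String) { self.condition = condition }
}

@resultBuilder
enum QueryBuilder {
    // Every statement in the block is first wrapped in an array...
    static func buildExpression(_ clause: Clause) -> [Clause] { [clause] }
    // ...and the arrays are then flattened into one list of clauses.
    static func buildBlock(_ parts: [Clause]...) -> [Clause] { parts.flatMap { $0 } }
    // Support for `if` without `else`, and for `if`/`else`.
    static func buildOptional(_ part: [Clause]?) -> [Clause] { part ?? [] }
    static func buildEither(first: [Clause]) -> [Clause] { first }
    static func buildEither(second: [Clause]) -> [Clause] { second }
}

struct Select {
    let table: String
    let clauses: [Clause]

    init(from table: String, @QueryBuilder _ content: () -> [Clause]) {
        self.table = table
        self.clauses = content()
    }

    // The generator decides how the text is laid out.
    var sql: String {
        let columns = clauses.compactMap { $0 as? Column }.flatMap { $0.names }
        let conditions = clauses.compactMap { ($0 as? Where)?.condition }
        let columnList = columns.isEmpty ? "*" : columns.joined(separator: ", ")
        var text = "SELECT \(columnList) FROM \(table)"
        if !conditions.isEmpty {
            text += " WHERE " + conditions.joined(separator: " AND ")
        }
        return text
    }
}

let includeDeleted = false
let query = Select(from: "users") {
    Column("id", "name")
    if !includeDeleted {          // control flow directly inside the "query"
        Where("deleted = false")
    }
}
print(query.sql)   // SELECT id, name FROM users WHERE deleted = false
```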

SwiftUI is just the beginning

The underpinning tech that makes SwiftUI tick is way cooler than SwiftUI itself. I can't wait to sink my teeth into it and spit out some really nifty tricks with it. And sorry for the image.

A Little Bit Of Math #1

Aaaaaah the math tricks! When you know 'em, you love 'em, and when you don't, you pay for extra computing resources.

Today's math trick has to do with averages. Averages are easy, right? You take all the numbers in the list, you sum them, and then you divide by the count... Pft, that's no trick!

A_{list} = \frac{\sum_{i=0}^{list.size - 1} list[i]}{list.size}

Except... there is a little something called overflow. Let's take the case of integers, and let's assume we're working with UInt8 objects. What's the average of [233,212]? It is 222.5, which gets rounded to 223. But our good ol' summation doesn't work:

 1> let v1 : UInt8 = 233
 2> let v2 : UInt8 = 212
 3> let sum = v1 + v2

EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

Depending on who you ask, 233+212 either wraps around or causes an error. 255 is the maximum value, after which there is nothing. Either way, we wouldn't be happy with the wrap-around either: 233+212 is 445, which wraps to 445 - 256 = 189, giving an average of 94 when divided by 2.
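
A minimal sketch of both behaviours, using Swift's wrapping operator &+ and addingReportingOverflow, which make the overflow observable without crashing. The widened version at the end is only there as a reference value.

```swift
let v1: UInt8 = 233
let v2: UInt8 = 212

// &+ is wrapping addition: the bits that don't fit in a UInt8 are discarded.
let wrapped = v1 &+ v2
print(wrapped)        // 189
print(wrapped / 2)    // 94

// addingReportingOverflow returns the wrapped value and an overflow flag.
let (partial, didOverflow) = v1.addingReportingOverflow(v2)
print(partial, didOverflow)   // 189 true

// Reference value, computed in a wider type (integer division drops the .5).
let reference = (UInt16(v1) + UInt16(v2)) / 2
print(reference)      // 222
```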

Musical Interlude : "Zino, dear, I don't care, I can use BIGGER numbers!"

Yes, you can, up to a point. Most languages have a maximum integer width, and sure, you can probably find unbounded implementations of integers for your language. In Swift, you can check this out, it's really nice. BUT while it's technically possible to handle arbitrary precision, you start hitting all sorts of issues with storing that data ("I love using blobs in my database"), converting it for practical use ("My users can remember 200+ digit numbers easyyyyyyyyy"), etc. Plus, you generally don't want to replace every single Int use in your code with something coming from an external dependency, with all the headache that implies, just for the sake of type safety.

Enter the Maths (royal fanfare ♫♩🎺)

Let's start with a basic observation:

v1+v2 = ({\frac{v1}{2} + \frac{v2}{2}})*2

If I divide by two, then multiply by two, I've done nothing. In the case of integers, it's not quite true, as integer division drops the remainder, but for big numbers, it's not that bad. But what does that give us?

Well...

\frac{v1}{2} + \frac{v2}{2} \lt Int_{max}

The sum of the two halves will fit in an integer, because each is guaranteed to be smaller than half the maximum. Right? Then we can multiply the result by 2 to get the sum, maybe. But it might overflow. Good thing we are trying to get the average, because we were about to divide by two, which cancels out the multiplication.

A_{(233,212)} = ((\frac{233}{2}+\frac{212}{2})*2)/2 = \frac{233}{2}+\frac{212}{2}

Musical Interlude: "Mind => Blown"

Sure, we kind of lose some precision: integer division truncates, so 233/2 becomes 116 and the calculated average is 222 instead of 222.5, but that is what 222.5 could easily have been rounded down to at some point anyway.
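
A minimal sketch of the halving trick on UInt8. The second function is an addition that is not part of the reasoning above: when both inputs are odd, each division drops a half, so adding back a & b & 1 recovers the lost unit. Both function names are made up for the example.

```swift
// The trick as described: halve first, then add. Never overflows.
func halvedMean(_ a: UInt8, _ b: UInt8) -> UInt8 {
    a / 2 + b / 2
}

// Variant that adds 1 back when both inputs are odd (two dropped halves).
func correctedMean(_ a: UInt8, _ b: UInt8) -> UInt8 {
    a / 2 + b / 2 + (a & b & 1)
}

print(halvedMean(233, 212))      // 222 (116 + 106)
print(halvedMean(233, 211))      // 221, although the exact mean is 222
print(correctedMean(233, 211))   // 222
print(correctedMean(255, 255))   // 255, still no overflow
```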

Anyways... Onward and upward! What can we do with a big list of numbers? We could use the same trick, and just divide wholesale. The major issue is that we would severely compound the rounding errors. Imagine we're still playing with UInt8 elements and you have 200 of them. Any of them divided by 200 would result in 0 or maybe 1. Your average wouldn't look very good.

Cue the Return of the Maths (royal fanfare ♫♩🎺)

{\sum_{i=0}^{list.size - 1} list[i]} = {\sum_{i=0}^{list.size - 2} list[i]} + list[list.size - 1]

(As in (x+y+z+t) = (x+y+z) + t)

  • Okay, and?
  • Let's divide by list.size, and we get the average

A_{list} = \frac{\sum_{i=0}^{list.size - 2} list[i] + list[list.size-1]}{list.size}

The top-left part looks familiar, it's almost as if it was the average of the list minus the last element... 😬

All we would need to do is to divide by list.size - 1... But if we multiply and divide by the same thing... 🤔

\frac{(list.size-1)*\frac{\sum_{i=0}^{list.size - 2} list[i]}{list.size-1} + list[list.size-1]}{list.size}

Which is

A_{list} = \frac{(list.size-1)*A_{list-last} + list[list.size-1]}{list.size}
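
As a quick sanity check of the formula, here is a worked example on a small list chosen for illustration, (2, 4, 9), whose plain average is 15/3 = 5:

```latex
A_{(2,4)} = \frac{2 + 4}{2} = 3

A_{(2,4,9)} = \frac{(3-1) * A_{(2,4)} + 9}{3} = \frac{2 * 3 + 9}{3} = \frac{15}{3} = 5
```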

Musical Interlude: Smells like recursion

So... The code will basically look like this:

  • If the list is empty (because we're good programmers and handle edge cases), the result is 0
  • If the list contains one element, the average is easy
  • If the list contains two elements, we can use the divide by two trick, the rounding error shouldn't be that bad
  • If the list contains more elements, we do the average by aggregate, and hope the rounding errors will be somewhat contained.

Side note on the rounding errors: the bigger the divisor, the higher the rounding error (potentially). But by doing a rolling average, we have a rounding error that worsens as we go through the list rather than being bad at every step. It's not ideal, but it's still better.

So, let's set the stage up: I have a list of big numbers I want the average of.

2988139172152746883
4545331521850540616
5693938727954663282
5884889191787885217
3111881160526182838
8720326064806005009
8427311181199404053
7983003740783657027
2965909035096967706
1211883882534796072
5703029716464526164
8424273336993151821
774296368044414872
14130533330426236
2230589047337383318
8337015733785964014
9153431205551083918
3249272057022384528
8254667294021634003
6758234862357239854

They are all Int64 integers, the widest native signed integer type available (Int128 has been coming since 2017). They come from a PostgreSQL database that stores big numbers for a very good reason I won't get into.

Now, if I plug these numbers into an unbounded calculator, the average should be 5221577691680052871.55 or so I'm told.

My recursive Swift function looks like this:

func sumMean(_ input: [Int64]) -> Int64 {
    if input.count == 0 { // uninteresting
        return 0
    }
    if input.count == 1 { // easy
        return input[0]
    }
    
    // general trick : divide by two (will introduce rounding errors)
    if input.count == 2 {
        let i1 = input[0] / 2
        let i2 = input[1] / 2
        let mean = (i1+i2) // (/2, then *2)
        return mean
    }
    
    let depth = Int64(input.count) - 1
    // rolling average formula
    let last = input.last!
    let rest = Array(input.dropLast())
    
    let restMean = sumMean(rest)
    // should be (depth * restMean + last) / (depth + 1), but overflow...
    let num = (restMean/2) + ((last/2)/depth)
    let res = (num / (depth+1)) * depth * 2
    return res
}

The reason for why num and res exist is left as an exercise.

Here's the calling code and the output:

var numbers : [Int64] = [
    2988139172152746883,
    4545331521850540616,
    5693938727954663282,
    5884889191787885217,
    3111881160526182838,
    8720326064806005009,
    8427311181199404053,
    7983003740783657027,
    2965909035096967706,
    1211883882534796072,
    5703029716464526164,
    8424273336993151821,
    774296368044414872,
    14130533330426236,
    2230589047337383318,
    8337015733785964014,
    9153431205551083918,
    3249272057022384528,
    8254667294021634003,
    6758234862357239854
]

print(sumMean(numbers))
5221577691680052740

As expected, we have rounding errors creeping in. This isn't the exact mean, but it's close enough: the difference is 131.55, which is a whopping 0.0000000000000025193534936693344360660675977627565% deviation.

As a side note, ordering matters:

  • unordered and sorted in ascending order yield the same error
  • ordered reversed yields a 169.55 error margin

Given the scale, it's not a big deal, but keep in mind that this trick is only useful for fairly large numbers in a fairly large list, not for the extremes.
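
For comparison, here is a sketch of a different technique, not the rolling average described above: split each value into a quotient and a remainder with respect to the count, and sum the two parts separately. Since v = (v / n) * n + (v % n), the sum of the quotients never exceeds the maximum for non-negative inputs, and the sum of the remainders stays below n * n. The function name exactMean is made up, and the sketch assumes non-negative values and a list short enough that n * n fits in an Int64.

```swift
// Truncated mean without ever computing the full sum.
// Assumption: values are non-negative and the list is not absurdly long.
func exactMean(_ values: [Int64]) -> Int64 {
    guard !values.isEmpty else { return 0 }
    let n = Int64(values.count)
    var quotients: Int64 = 0    // sum of v / n, bounded by the largest value
    var remainders: Int64 = 0   // sum of v % n, bounded by n * n
    for v in values {
        quotients += v / n
        remainders += v % n
    }
    return quotients + remainders / n
}

print(exactMean([1, 2, 4]))                               // 2 (7 / 3, truncated)
print(exactMean([Int64.max, Int64.max - 2]) == Int64.max - 1)   // true
```

The price is two divisions per element instead of one, and it does nothing for floating point data.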

[Dev Diaries] Tasks in parallel

Context

Back in the days of the old blog, I had posted a way to manage my asynchronous tasks by grouping them and basically having the kernel of something akin to promises in other languages/frameworks.

It was mostly based on operation queues and locks, and basically handled only the equivalent of "guarantees" in promise parlance.

A few months ago, John Sundell published a task based system that tickled me greatly. I immediately proceeded to forget about it until I got an Urge again to optimize my K-Means implementation. I tweaked his code to use my terminology and avoid rewriting everything where I used the concurrency stuff, as well as added a bunch of things I needed. Without further ado, here is some code and some comments.

Core feature: perform on queue

First, an alias that is inherited from the past and facilitates some operations further down the line:

public typealias constantFunction = (() throws -> ())

Then the main meat of the system: the Task class and ancillary utilities. My code was already fairly close to Sundell's, so I mostly adopted his style.

public class Task {
    // MARK: Ancillary stuff
    public enum TaskResult {
        case success
        case failure(Error)
    }
    
    public struct TaskManager {
        fileprivate let queue : DispatchQueue
        fileprivate let handler : (TaskResult) -> Void
        
        func finish() {
            handler(.success)
        }
        
        func fail(with error: Error) {
            handler(.failure(error))
        }
    }
    public typealias TaskFunction = (TaskManager) -> Void
    
    //MARK: Init
    private let closure: TaskFunction
    
    public init( _ closure: @escaping TaskFunction) {
        self.closure = closure
    }
    
    public convenience init(_ f: @escaping constantFunction) {
        self.init { manager in
            do {
                try f()
                manager.finish()
            } catch {
                manager.fail(with: error)
            }
        }
    }
    
    //MARK: Core
    public func perform(on queue: DispatchQueue = .global(),
                 handler: @escaping (TaskResult) -> Void) {
        queue.async {
            let manager = TaskManager(
                queue: queue,
                handler: handler
            )
            
            self.closure(manager)
        }
    }
}

In order to understand the gist of it, I really recommend reading the article, but in essence, it's "just" something that executes a function, then signals the manager that the task is complete.

I added an initializer that allows me to write my code like this, for backwards compatibility and stylistic reasons:

Task {
    print("Hello")
}.perform { result in
    switch result {
    case .success:
        break // do something
    case .failure(let err):
        print(err) // here too, probably
    }
}

It's important to note that the block passed to Task must take no argument and return nothing. But it doesn't prevent it from doing block stuff:

var nt = 0
Task {
    nt += 42
}.perform { result in
    switch result {
    case .success: print(nt)
    case .failure: break // not probable
    }
}

Outputs: 42

Of course, the block can throw, in which case we'll end up in the .failure case.
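
To make the throwing path concrete without repeating the whole Task class, here is a condensed stand-in for what the convenience initializer and perform do together. The names run, Outcome and DemoError are made up for this sketch, and the semaphore is only there so the script waits for the result.

```swift
import Foundation

// Condensed stand-ins for TaskResult and an arbitrary error type.
enum Outcome {
    case success
    case failure(Error)
}
struct DemoError: Error {}

// Runs `work` on a queue and reports whether it threw.
func run(on queue: DispatchQueue = .global(),
         _ work: @escaping () throws -> Void,
         handler: @escaping (Outcome) -> Void) {
    queue.async {
        do {
            try work()
            handler(.success)
        } catch {
            handler(.failure(error))   // any thrown error ends up here
        }
    }
}

let done = DispatchSemaphore(value: 0)
var summary = ""

run({ throw DemoError() }) { outcome in
    switch outcome {
    case .success:
        summary = "success"
    case .failure(let error):
        summary = "failure: \(type(of: error))"
    }
    done.signal()
}

done.wait()        // block until the handler has run
print(summary)     // failure: DemoError
```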

Sequence stuff

The task sequencing mechanism wasn't of any particular interest to my project, but I decided to treat it the same way I did the parallel one. Sundell's code is perfectly fine, I just wanted operators to have some syntactic sugar:

//MARK: Sequential
// FROM: https://www.swiftbysundell.com/posts/task-based-concurrency-in-swift
// replaces "then"
infix operator •: MultiplicationPrecedence
extension Task {
    static func sequence(_ tasks: [Task]) -> Task {
        var index = 0
        
        func performNext(using controller: TaskManager) {
            guard index < tasks.count else {
                // We’ve reached the end of our array of tasks,
                // time to finish the sequence.
                controller.finish()
                return
            }
            
            let task = tasks[index]
            index += 1
            
            task.perform(on: controller.queue) { outcome in
                switch outcome {
                case .success:
                    performNext(using: controller)
                case .failure(let error):
                    // As soon as an error has occurred, we’ll
                    // fail the entire sequence.
                    controller.fail(with: error)
                }
            }
        }
        
        return Task(performNext)
    }
    
    // Task • Task
    static func •(_ t1:  Task, _ t2 : Task ) -> Task {
        return Task.sequence([t1,t2])
    }
}

The comments say it all: we take the tasks one by one, essentially having a Task(Task(Task(...))) system that handles failure gracefully. I wanted to have an operator because I like writing code like this:

(Task {
    print("Hello")
    } • Task {
        print("You")
    } • Task {
        print("Beautiful")
    } • Task {
        print("Syntax!")
    }
).perform { (_) in
        print("done")
}

Outputs:

Hello
You
Beautiful
Syntax!
done

Because of the structure of the project I'm using parallelism in, I tend to manipulate [Task] objects a lot, so I added an operator on the array manipulation as well:

// [Task...]••
postfix operator ••
extension Array where Element:Task {
    var sequenceTask : Task { return Task.sequence(self) }
    static postfix func ••(_ f: Array<Element>) -> Task {
        return f.sequenceTask
    }
}

This allows me to write code like this:

var tasks = [Task]()
for i in 1..<10 {
    tasks.append(Task {
        print(i)
    })
}
tasks••.perform { _ in
    // this space for rent
}

Outputs the numbers from 1 to 9 sequentially. It is, admittedly, a fairly useless feature to be able to create tasks in a loop that will execute one after the other, instead of "just" looping in a more regular fashion, but I tend to like symmetry, which leads me to the main meat of the code.

Parallelism

Similarly to the sequence way of doing things, Sundell's approach is pitch perfect, and much more efficient than my own, especially with regard to error handling, so I modified my code to follow his recommendations.

Before reading the code, there are two things you should be aware of:

  • DispatchGroup allows for aggregate synchronization of work. It's a fairly unknown tool that you should read about
  • Sundell's code did not include a mechanism for waiting for the group's completion. I included a DispatchSemaphore that optionally lets me wait for the group to be done (nil by default, meaning the syntactic sugar does not wait for completion)
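
For readers who have never met these two tools, here is a minimal sketch of how they fit together, independent of the Task class: enter and leave bracket each piece of work, notify fires once the group is empty, and the semaphore turns that notification into something a caller can block on. Collector is a hypothetical helper used only to gather results safely.

```swift
import Foundation

// Hypothetical thread-safe collector, only there to observe the results.
final class Collector: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [Int] = []

    func add(_ value: Int) {
        lock.lock()
        storage.append(value)
        lock.unlock()
    }

    var values: [Int] {
        lock.lock()
        defer { lock.unlock() }
        return storage
    }
}

let collector = Collector()
let group = DispatchGroup()
let finished = DispatchSemaphore(value: 0)
let queue = DispatchQueue.global()

for i in 1...4 {
    group.enter()                 // one more piece of work in flight
    queue.async {
        collector.add(i * i)
        group.leave()             // this piece is done
    }
}

// Called once every enter() has been balanced by a leave().
group.notify(queue: queue) {
    finished.signal()
}

finished.wait()                   // block the caller until the group is done
print(collector.values.sorted())  // [1, 4, 9, 16]
```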

// MARK: Parallel
infix operator |: AdditionPrecedence
extension Task {
    // Replaces "enqueue"
    static func group(_ tasks: [Task], semaphore: DispatchSemaphore? = nil) -> Task {
        return Task { controller in
            let group = DispatchGroup()
            
            // From: https://www.swiftbysundell.com/posts/task-based-concurrency-in-swift
            // To avoid race conditions with errors, we set up a private
            // queue to sync all assignments to our error variable
            let errorSyncQueue = DispatchQueue(label: "Task.ErrorSync")
            var anyError: Error?
            
            for task in tasks {
                group.enter()
                
                // It’s important to make the sub-tasks execute
                // on the same DispatchQueue as the group, since
                // we might cause unexpected threading issues otherwise.
                task.perform(on: controller.queue) { outcome in
                    switch outcome {
                    case .success:
                        break
                    case .failure(let error):
                        errorSyncQueue.sync {
                            anyError = anyError ?? error
                        }
                    }
                    
                    group.leave()
                }
            }
            
            group.notify(queue: controller.queue) {
                if let error = anyError {
                    controller.fail(with: error)
                } else {
                    controller.finish()
                }
                if let semaphore = semaphore {
                    semaphore.signal()
                }
            }
        }
    }
    
    // Task | Task
    static func |(_ t1:  Task, _ t2 : Task ) -> Task {
        return Task.group([t1,t2])
    }
}

Just like with the sequential code, it allows me to write:

(Task {
    print("Hello")
    } | Task {
        print("You")
    } | Task {
        print("Beautiful")
    } | Task {
        print("Syntax!")
    }
).perform { (_) in
        print("done")
}

Note that even though the tasks are marked as being parallel, because of the way operators work, you end up grouping the tasks for parallel execution two by two, which is fairly useless in general. The above code outputs (sometimes):

Syntax!
Beautiful
Hello
You
done

This highlights the point I was making: the first "pair" to be executed is the first three tasks together, in parallel with the last one. Since the latter finishes early in comparison to a group of tasks, it is output first. But I included this operator for symmetry (and because I can).
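
The grouping comes from associativity: AdditionPrecedence is left-associative, so a chain of | is folded from the left. Here is a minimal sketch that makes the nesting visible, using a hypothetical Node type that records how it was combined instead of running anything.

```swift
// Hypothetical type that only remembers how it was combined.
struct Node {
    let label: String

    // Reuses the standard | operator, which has AdditionPrecedence.
    static func | (lhs: Node, rhs: Node) -> Node {
        Node(label: "(\(lhs.label) | \(rhs.label))")
    }
}

let a = Node(label: "A")
let b = Node(label: "B")
let c = Node(label: "C")
let d = Node(label: "D")

let tree = a | b | c | d
print(tree.label)   // (((A | B) | C) | D)
```

The outermost pair is "everything so far" against the last operand, which matches the output order seen above.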

Much more interestingly, grouping tasks in an array performs them all in parallel, and is the only way to have them work in a way that resembles the instinct you probably have regarding parallel tasks:

// [Task...]||
postfix operator ||
extension Array where Element:Task {
    var groupTask : Task { return Task.group(self) }
    static postfix func ||(_ f: Array<Element>) -> Task {
        return f.groupTask
    }
}

This allows me to write:

var tasks = [Task]()
for i in 1..<10 {
    tasks.append(Task {
        // for simulation purposes
        usleep(UInt32.random(in: 100...500))
        print(i)
    })
}
tasks||.perform { _ in
    // this space for rent
}

This outputs the following text after the longest task is done:

2
5
1
8
3
4
6
7
9

I'd like to include a ternary operator to wait for the group to be finished, but it's not currently possible in Swift (in the same way an n-ary operator is currently impossible). This means a fairly sad syntax:

infix operator ~~
extension Array where Element:Task {
    static func ~~(_ f: Array<Element>, _ s: DispatchSemaphore) -> Task {
        let g = Task.group(f,semaphore: s)
        return g
    }
}

The following test code works:

var tasks = [Task]()
let s = DispatchSemaphore(value: 0)
for i in 1..<10 {
    tasks.append(Task {
        usleep(UInt32.random(in: 100...5500))
        print(i)
    })
}
(tasks~~s).perform { _ in
    // this space for rent
}
s.wait()

Sadly, we now need to parenthesize tasks~~s, which is why I'm bothered. But at least my code can be synchronous or asynchronous, as needed.

One last thing

Because I played a lot with syntactic stuff and my algorithms, I decided to make a sort of meta function that handles a lot of things in one go:

  • it allows me to collect the output of the functions in an array
  • it works like a group
  • it is optionally synchronous

//MARK: Syntactic sugar
extension Task {
    static func parallel(handler: @escaping (([Any], Error?)->()), wait: Bool = false, functions: (() throws -> Any)...) {
        var group = [Task]()
        var result = [Any]()
        let lock = NSLock()
        for f in functions {
            let t = Task {
                let r = try f()
                lock.lock()
                result.append(r)
                lock.unlock()
            }
            group.append(t)
        }
        // `wait` only decides whether a semaphore is attached and waited on;
        // the completion handling is the same in both modes.
        let sem: DispatchSemaphore? = wait ? DispatchSemaphore(value: 0) : nil
        Task.group(group, semaphore: sem).perform { (local) in
            switch local {
            case .success:
                handler(result, nil)
            case .failure(let e):
                handler(result, e)
            }
        }
        sem?.wait()
    }
}

And it can be used like this:

var n = 0
Task.parallel(handler: { (result, error) in
    print(result)
    print(error?.localizedDescription ?? "no error")
}, functions: {
    throw TE()
}, {
    n += 1
    return n
}, {
    n += 1
    return n
}, {
    n += 1
    return n
}, {
    n += 1
    return n
},...
)

And it will output something like:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 45, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 34, 46]
The operation couldn’t be completed.

Of course, you can wait for all the tasks to be complete by using the wait parameter:

Task.parallel(handler: { (result, error) in
    print(result)
    print(error)
}, wait: true, functions: {
    throw TE()
}, {
    n += 1
    return n
},...
)

Conclusion

Thanks to John Sundell's excellent write-up, I refactored my code and made it more efficient and a fair bit less convoluted than it was before.

I also abstained from using OperationQueue, which has some quirks on Linux, whereas this implementation works just fine.