Description
While investigating method chaining (e.g. `sub_`) in my other issue, I realized that a significant amount of time is spent instantiating zero-filled Arrays, and very often these are intermediate results that never see the light of day, e.g.:
`let v123 = v1 - v2 - v3` (here the result of `v1 - v2` is such a throwaway intermediate)
This takes much longer than using `sub_()`, which works in place and doesn't create a new array.
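To make the cost concrete, here is a minimal sketch using plain `[Float]`. It assumes an allocating `-` that zero-fills its result and an in-place `sub_`; both are hypothetical stand-ins written for illustration, not BaseMath's actual implementation. The chained form creates two result arrays (one of them the throwaway `v1 - v2`), while the `sub_` form makes a single copy of `v1` and mutates it.

```swift
extension Array where Element == Float {
    // Hypothetical allocating subtraction: every call creates a new,
    // zero-filled array and then overwrites every element of it.
    static func - (lhs: [Float], rhs: [Float]) -> [Float] {
        precondition(lhs.count == rhs.count, "length mismatch")
        var result = [Float](repeating: 0, count: lhs.count)  // zero-fill that is immediately overwritten
        for i in lhs.indices { result[i] = lhs[i] - rhs[i] }
        return result
    }

    // Hypothetical in-place subtraction: overwrites self, no new array.
    mutating func sub_(_ other: [Float]) {
        precondition(count == other.count, "length mismatch")
        for i in indices { self[i] -= other[i] }
    }
}

let v1: [Float] = [10, 20, 30]
let v2: [Float] = [1, 2, 3]
let v3: [Float] = [4, 5, 6]

// Chained style: allocates an array for v1 - v2, then another for the final result.
let chained = v1 - v2 - v3

// In-place style: one copy of v1, then two mutations without further allocation.
var inPlace = v1
inPlace.sub_(v2)
inPlace.sub_(v3)
```

Both paths compute the same values; the difference is only in how many arrays are allocated and zero-filled along the way.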
I did a quick benchmark (release build):
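```swift
// `len` is the vector length under test (defined elsewhere).
benchmark(title: "array.create", num_trials: 10) {
    let a = [Float](repeating: 0, count: len)  // allocates and zero-fills
}
benchmark(title: "array.create faster", num_trials: 10) {
    let p = UnsafeMutablePointer<Float>.allocate(capacity: len)  // allocates only, memory left uninitialized (and never deallocated here)
}
```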
Results:
array.create: 9.340 ms
array.create faster: 0.005 ms
That's a huge speed difference! I strongly suspect one could use just the raw pointer for intermediate results, and wrap it in an Array only when the result is returned "outside" of BaseMath.
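One possible way to get an Array without the zero-fill is the standard library's `Array(unsafeUninitializedCapacity:initializingWith:)` (Swift 5.1+), which hands you the array's uninitialized storage to write into directly. The sketch below assumes a hypothetical helper name, `subtractUninitialized`, and is not how BaseMath currently does it. Note also that the allocate-only benchmark never touches its memory, so a real intermediate, whose results must still be written, will likely not see the full 9.340 ms vs 0.005 ms gap; what it saves is the redundant zeroing pass.

```swift
// Hypothetical helper: writes a[i] - b[i] straight into an Array's
// uninitialized storage, so no zero-fill pass happens first.
func subtractUninitialized(_ a: [Float], _ b: [Float]) -> [Float] {
    precondition(a.count == b.count, "length mismatch")
    return [Float](unsafeUninitializedCapacity: a.count) { buffer, initializedCount in
        // `buffer` is uninitialized memory; Float is a trivial type, so
        // assigning through the subscript is enough to initialize each slot.
        for i in 0..<a.count {
            buffer[i] = a[i] - b[i]
        }
        // Tell the Array how many elements are now initialized.
        initializedCount = a.count
    }
}
```

Because the closure is responsible for setting `initializedCount`, every slot up to that count must be written; leaving any of them unset would expose garbage values.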
For example, with the pure Accelerate API, where a destination pointer is passed in, that destination only needs `.allocate`; zeroing it first isn't necessary.
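A minimal sketch of that pattern, using Accelerate's `vDSP_vsub` (which computes `C = A - B` and, notably, takes `B` as its first argument). The helper name `subtractInto` is hypothetical, and the `#if canImport(Accelerate)` fallback loop is only there so the sketch also runs on platforms without Accelerate, such as Linux.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

// Hypothetical helper: writes a[i] - b[i] into `dest`, which only needs
// to have been allocated, not zeroed.
func subtractInto(_ a: UnsafePointer<Float>, _ b: UnsafePointer<Float>,
                  dest: UnsafeMutablePointer<Float>, count: Int) {
#if canImport(Accelerate)
    // vDSP_vsub(B, strideB, A, strideA, C, strideC, N) computes C = A - B.
    vDSP_vsub(b, 1, a, 1, dest, 1, vDSP_Length(count))
#else
    // Portable fallback where Accelerate is unavailable.
    for i in 0..<count { dest[i] = a[i] - b[i] }
#endif
}

let a: [Float] = [10, 20, 30]
let b: [Float] = [1, 2, 3]

// Allocate the destination without any initialization pass.
let dest = UnsafeMutablePointer<Float>.allocate(capacity: a.count)
subtractInto(a, b, dest: dest, count: a.count)  // arrays bridge to UnsafePointer implicitly

// Copy out (or keep using the pointer for further intermediates), then free it.
let result = Array(UnsafeBufferPointer(start: dest, count: a.count))
dest.deallocate()
```

The destination is fully overwritten by the subtraction, which is why zeroing it beforehand would be wasted work.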
Just a thought for a possible optimization. This library is looking very good. As an experiment, I was able to create my own library that depends on BaseMath but uses the Accelerate API in place of explicit pointer loops, and I really like the concise, clean syntax that reads like natural Swift code. The only thing that bothers me is the initializer that fills arrays with zeros, which is not necessary for Accelerate (and probably elsewhere).