Skip to content

Conversation

@selmanozleyen
Copy link
Member

@selmanozleyen selmanozleyen commented Nov 12, 2025

New numba kernels that speed up distances that are based on the mean.

Here are my local benchmark results. The pairwise_mean doesn't speed-up with numba much but it's more memory efficient so I will keep it that way. We can maybe get into discussion about why sklearn mean pairwise is faster in a topic. Here is the benchmark script: https://gist.github.com/selmanozleyen/f78e2f87661348615bad03485935fcf0

Update: with fastmath its faster


--- n_samples=500, n_features=50 ---
Edistance:
  pertpy:      0.98 ms ± 0.05
  sklearn:     4.00 ms ± 0.47
  Speedup: 4.1x
  Results match: True
MeanPairwiseDistance:
  pertpy:      0.42 ms ± 0.02
  sklearn:     0.61 ms ± 0.02
  Speedup: 1.5x
  Results match: True

--- n_samples=1000, n_features=50 ---
Edistance:
  pertpy:      3.44 ms ± 0.14
  sklearn:    13.74 ms ± 1.64
  Speedup: 4.0x
  Results match: True
MeanPairwiseDistance:
  pertpy:      1.90 ms ± 0.22
  sklearn:     2.88 ms ± 0.53
  Speedup: 1.5x
  Results match: True

--- n_samples=2000, n_features=50 ---
Edistance:
  pertpy:     13.52 ms ± 0.72
  sklearn:    50.16 ms ± 0.53
  Speedup: 3.7x
  Results match: True
MeanPairwiseDistance:
  pertpy:      6.25 ms ± 0.46
  sklearn:     9.44 ms ± 0.39
  Speedup: 1.5x
  Results match: True

--- n_samples=5000, n_features=50 ---
Edistance:
  pertpy:     82.86 ms ± 1.96
  sklearn:   368.52 ms ± 9.17
  Speedup: 4.4x
  Results match: True
MeanPairwiseDistance:
  pertpy:     38.79 ms ± 1.85
  sklearn:    66.18 ms ± 0.62
  Speedup: 1.7x
  Results match: True

@selmanozleyen selmanozleyen changed the title init Speedup For EDistance like Distances Nov 12, 2025
@selmanozleyen selmanozleyen changed the title Speedup For EDistance like Distances Speedup For EDistance-like distances Nov 12, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 69.62963% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.88%. Comparing base (12897e1) to head (cbfac60).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
pertpy/tools/_distances/_distances.py 68.93% 41 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #880      +/-   ##
==========================================
- Coverage   73.54%   71.88%   -1.67%     
==========================================
  Files          48       48              
  Lines        5613     5733     +120     
==========================================
- Hits         4128     4121       -7     
- Misses       1485     1612     +127     
Files with missing lines Coverage Δ
pertpy/tools/_mixscape.py 85.29% <100.00%> (ø)
pertpy/tools/_distances/_distances.py 86.29% <68.93%> (-4.29%) ⬇️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Zethson Zethson marked this pull request as draft November 12, 2025 10:46
@selmanozleyen
Copy link
Member Author

Ok I found out why the test fail. I checked myself and when I directly run this cell on the main branch I get Distance.precompute_distances() got an unexpected keyword argument 'verbose'

import matplotlib.pyplot as plt
import numpy as np
import pertpy as pt
import scanpy as sc
from seaborn import clustermap
adata = pt.dt.distance_example()
obs_key = "perturbation"  # defines groups to test

Turns out the notebook was giving this invalid kwarg but it didn't fail because the older implementation would precompute the distances and it wouldn't hit.

if f"{self.obsm_key}_{self.cell_wise_metric}_predistances" not in adata.obsp:
     self.precompute_distances(adata, n_jobs=n_jobs, **kwargs)

while it hits in newer implementation because it doesn't precompute the whole distance matrices in this cell anymore

distance = pt.tl.Distance(metric="euclidean", obsm_key="X_pca")
df = distance.pairwise(adata, groupby=obs_key)

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating. That's why I am against kwargs and even if they are used all of the items in there should be checked if they are being passed or not.

@selmanozleyen selmanozleyen marked this pull request as ready for review November 30, 2025 08:43
@Zethson
Copy link
Member

Zethson commented Nov 30, 2025

That's why I am against kwargs and even if they are used all of the items in there should be checked if they are being passed or not.

Yes, I agree with you. We can happily change that.

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating.

Are you willing to make this change or do you want me to do it? Ideally, we'd include the update commit in this PR.

Thank you!

Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very very much for your work!

I really appreciate that you took the time to make the code much more explicit and clearer.

@Zethson Zethson merged commit c36eb42 into scverse:main Dec 11, 2025
17 checks passed
@selmanozleyen selmanozleyen deleted the speedup/edistance branch December 11, 2025 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants