
Conversation

@tayheau (Contributor) commented Dec 18, 2025

Following #4251, this PR adds a few new sampling methods to `random_spikes_selection`:

  • percentage cap
  • maximum_rate cap
  • temporal_bins sampling for a better temporal representation of the original population; I tried to fully vectorize it by recasting the temporal bin sampling problem as a subsorting problem.

I also added the possibility for `get_segment_durations` to accept `segment_indices=None`.
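As a rough illustration of how the two cap-style methods above could translate into a number of spikes to keep, here is a toy sketch. The parameter names and their exact semantics (fraction of a unit's spikes for the percentage cap, average spikes/s times duration for the rate cap) are assumptions for illustration, not the PR's actual implementation:

```python
import numpy as np

# Toy figures for one unit; all values are made up.
rng = np.random.default_rng(42)
n_unit_spikes = 10_000       # spikes recorded for this unit
total_duration_s = 600.0     # total recording duration in seconds

percentage = 0.05            # assumed: keep at most 5% of the unit's spikes
maximum_rate = 2.0           # assumed: keep at most 2 spikes/s on average

cap_pct = int(n_unit_spikes * percentage)        # 500
cap_rate = int(total_duration_s * maximum_rate)  # 1200

# A cap then amounts to a uniform random subselection of that many spikes.
n_keep = min(cap_pct, n_unit_spikes)
selected = np.sort(rng.choice(n_unit_spikes, size=n_keep, replace=False))
```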

@alejoe91 alejoe91 added the core Changes to core module label Dec 18, 2025
@tayheau tayheau marked this pull request as draft December 18, 2025 14:00
@tayheau tayheau marked this pull request as ready for review December 18, 2025 21:03
@alejoe91 alejoe91 requested a review from yger December 19, 2025 08:54
@samuelgarcia samuelgarcia added this to the 0.104.0 milestone Dec 19, 2025
@samuelgarcia (Member)

Hi @tayheau.
I am OK with this change. Could you put a bit more explanation of the strategy for this new "temporal_bins" method in the code and docstring, please?
I am not sure I entirely follow the logic myself.

@tayheau (Contributor, Author) commented Dec 22, 2025

Hey @samuelgarcia, yeah, of course. Just to summarize quickly here:

The method is temporal bin sampling, and I expressed it as a subsampling problem.

  • `bin_index = unit_spikes["sample_index"] // bin_size_freq` allows me to allocate a bin index to every spike in the unit, but `bin_index` resets for each segment since we use a concatenated spike vector.

  • `group_values = np.stack((segment_index, bin_index), axis=1)` allows me to create a (bin, segment) key that shares the same indices as `unit_spikes`, so that the (bin, segment) key of the spike found at `spikes[index]` can be found at `group_values[index]`.

  • `_, group_keys = np.unique(group_values, return_inverse=True, axis=0)` just allows me to have a single integer key rather than a (bin, segment) key.

  • I then assign a random value (score) to each spike in the unit. Instead of building bins and drawing k samples uniformly at random from each bin, we go through each (bin, segment) subgroup, randomly sort its samples, and take the first k. So now we are working with 3 main pieces of data:

    • `spikes`, the concatenated spike vector of the current unit
    • a `group_keys` vector, which maps the spike at a given index to a (bin, segment) group identified by an integer
    • a `score` vector, which maps the spike at a given index to a random value
  • `order = np.lexsort((score, group_keys))` is an array of indices that orders spikes first by `group_keys` and then by `score`, i.e. it gives a concatenated array of spikes in ascending score per group (so per bin, per segment).

  • Our final goal is to take the first `k_per_bin` spikes of each group, which is equivalent to a uniform random pick per temporal bin. To do so, we compute the relative rank of each spike inside its own group based on the previously computed `order`; this requires each group's starting point and length, respectively `group_start` and `counts`. `ranks` would look something like `[0,1,2,3,4,0,0,0,1,2,0,1,0,1,2,...]`.

  • We then apply a mask to `ranks` to select the top `k_per_bin` for each group, i.e. for each bin per segment, and apply the resulting mask to the ordered spikes vector.
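The steps above can be sketched in NumPy on toy data. Variable names follow the description (`bin_index`, `group_keys`, `score`, `order`, `group_start`, `counts`, `ranks`); the sizes and bin parameters are made up, so treat this as an illustration of the technique rather than the PR's actual code:

```python
import numpy as np

# Toy spike data for one unit; all sizes are made up for illustration.
rng = np.random.default_rng(0)
n_spikes = 1_000
sample_index = rng.integers(0, 30_000, size=n_spikes)  # stand-in for unit_spikes["sample_index"]
segment_index = rng.integers(0, 2, size=n_spikes)      # 2 segments
bin_size = 5_000                                       # bin size in samples
k_per_bin = 3                                          # spikes to keep per (bin, segment)

# 1. Temporal bin of each spike (note: resets per segment)
bin_index = sample_index // bin_size

# 2. One (segment, bin) key per spike, collapsed to a single integer key
group_values = np.stack((segment_index, bin_index), axis=1)
_, group_keys = np.unique(group_values, return_inverse=True, axis=0)

# 3. Random score per spike; sorting by (group, score) shuffles within each group
score = rng.random(n_spikes)
order = np.lexsort((score, group_keys))
sorted_keys = group_keys[order]

# 4. Rank of each spike inside its own group: 0,1,2,... restarting at each group
counts = np.bincount(sorted_keys)                      # group lengths
group_start = np.concatenate(([0], np.cumsum(counts)[:-1]))
ranks = np.arange(n_spikes) - np.repeat(group_start, counts)

# 5. Keeping ranks < k_per_bin is a uniform random pick of k_per_bin spikes per bin
selected = np.sort(order[ranks < k_per_bin])           # indices into the unit's spikes
```

Because every spike's score is i.i.d. uniform, taking the `k_per_bin` lowest-scored spikes in each group is exactly a uniform random subsample of that group, and the whole selection runs without a Python loop over bins.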

I'm not sure my explanations are very clear, which is why I prefer to write them here first to get some feedback.

@tayheau (Contributor, Author) commented Dec 23, 2025

By the way @samuelgarcia, @alejoe91, perhaps we can add the temporal_bins explanation to the docs and just reference it in the docstring, so that we get a better layout and readability. What do you think?

@alejoe91 (Member)

> By the way @samuelgarcia, @alejoe91, perhaps we can add the temporal_bins explanation to the docs and just reference it in the docstring, so that we get a better layout and readability. What do you think?

Yes, it definitely deserves some more explanation! :)

@alejoe91 (Member)

I don't think we have a specific place yet, but maybe it's worth adding a Core Analyzer Extensions section here?

https://spikeinterface.readthedocs.io/en/latest/modules/core.html#sortinganalyzer
