Skip to content

Conversation

@CascadingRadium
Copy link
Member

@CascadingRadium CascadingRadium commented Dec 26, 2025

  • Remove the obsolete faiss_SearchParametersIVF_new_with_sel API and consolidate eligible-vector handling through bitmap-based selectors.
  • Replace faiss_get_lists_for_keys with faiss_count_ivf_list_vectors, which efficiently iterates the eligible vector bitmap (IDSelectorBitmap) to count per-IVF-list vector membership, avoiding key materialization and enabling direct centroid eligibility computation in the storage layer.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR re-architects vector search by replacing key-based list retrieval with bitmap-based vector counting for IVF indexes. It removes the obsolete faiss_SearchParametersIVF_new_with_sel and faiss_get_lists_for_keys APIs, replacing them with count_ivf_list_vectors which efficiently counts vectors per IVF list using bitmap iteration instead of key materialization.

Key Changes:

  • New count_ivf_list_vectors method that iterates through IDSelectorBitmap to count vectors per IVF list
  • Removal of get_lists_for_keys and faiss_SearchParametersIVF_new_with_sel APIs
  • Type signature change for faiss_Search_closest_eligible_centroids from FaissIndex* to const FaissIndexIVF*

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
faiss/IndexIVF.h Adds declaration and documentation for new count_ivf_list_vectors method; removes get_lists_for_keys declaration
faiss/IndexIVF.cpp Implements count_ivf_list_vectors with bitmap iteration logic; removes get_lists_for_keys implementation
c_api/IndexIVF_c_ex.h Updates C API: removes faiss_SearchParametersIVF_new_with_sel and faiss_get_lists_for_keys; adds faiss_count_ivf_list_vectors; updates faiss_Search_closest_eligible_centroids signature
c_api/IndexIVF_c_ex.cpp Implements C API wrapper for count_ivf_list_vectors; removes implementations of deleted functions; updates faiss_Search_closest_eligible_centroids to use const cast

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

FAISS_THROW_MSG("count_ivf_list_vectors supports only IDSelectorBitmap");
}
const uint8_t* bitmap = bitmap_sel->bitmap;
const size_t nbytes = bitmap_sel->n;
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a semantic mismatch between how the bitmap size is interpreted. According to IDSelectorBitmap's documentation and constructor (line 107 in IDSelector.h), the parameter 'n' represents "size of the bitmap array" (in bytes). However, on line 953, 'nbytes' is assigned from 'bitmap_sel->n', treating it as the number of bytes to iterate.

Looking at the is_member implementation in IDSelector.cpp line 120, it checks "if ((i >> 3) >= n)" which confirms that 'n' is indeed the number of bytes in the bitmap array. This means the implementation is correct.

However, there's potential confusion: if a caller constructs an IDSelectorBitmap with a bitmap sized for ntotal IDs (requiring ceil(ntotal/8) bytes), but accidentally passes a different value for n, this function would either under-iterate (missing valid IDs) or over-iterate (checking bits beyond the intended range). The safety check at line 966 mitigates over-iteration but doesn't detect under-iteration.

Consider adding an assertion or check that verifies: nbytes >= (ntotal + 7) / 8 to ensure the bitmap is large enough to cover all vectors in the index.

Suggested change
const size_t nbytes = bitmap_sel->n;
const size_t nbytes = bitmap_sel->n;
const size_t min_bytes = (ntotal + 7) / 8;
FAISS_ASSERT_MSG(
nbytes >= min_bytes,
"IDSelectorBitmap too small for index ntotal; bitmap size (bytes) "
"must be >= ceil(ntotal/8)");

Copilot uses AI. Check for mistakes.
* @param list_counts - Output array of size list_counts_size (must be == nlist).
* On return, list_counts[i] contains the number
* of selected vectors assigned to IVF list i.
* Must be initialized to zero by the caller.
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation comment for the 'list_counts' parameter states "Must be initialized to zero by the caller." However, the implementation uses the '++' operator (line 973 in IndexIVF.cpp) which increments the existing value. This means the assertion should verify that the array is zero-initialized, or the documentation should be updated to clarify that this is a requirement. Without verifying this precondition, if the caller passes an uninitialized or non-zero array, the results will be incorrect.

Suggested change
* Must be initialized to zero by the caller.
* Precondition: every element of list_counts must
* be initialized to 0 by the caller before
* calling this function; otherwise the resulting
* per-list counts will be incorrect.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +42
@param list_counts - Output array of size index->nlist.
On return, list_counts[i] contains the number
of selected vectors assigned to IVF list i.
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "Must be initialized to zero by the caller" but there is no corresponding note in the C API documentation. The C API documentation at lines 40-42 should also mention this requirement: "Output array of size index->nlist. Must be initialized to zero by the caller. On return, list_counts[i] contains the number of selected vectors assigned to IVF list i."

Suggested change
@param list_counts - Output array of size index->nlist.
On return, list_counts[i] contains the number
of selected vectors assigned to IVF list i.
@param list_counts - Output array of size index->nlist. Must be
initialized to zero by the caller. On return,
list_counts[i] contains the number of selected
vectors assigned to IVF list i.

Copilot uses AI. Check for mistakes.
*/
int faiss_Search_closest_eligible_centroids(
FaissIndex* index,
const FaissIndexIVF* index,
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature changed the index parameter from FaissIndex* to const FaissIndexIVF*. This is a breaking API change that will cause compilation errors for existing code that passes a FaissIndex*. While the change makes sense (the function specifically needs an IVF index and doesn't modify it), this should be clearly documented as a breaking change in the PR description or release notes to warn API users.

Suggested change
const FaissIndexIVF* index,
const FaissIndex* index,

Copilot uses AI. Check for mistakes.
Comment on lines +939 to 976
void IndexIVF::count_ivf_list_vectors(
idx_t* list_counts,
size_t list_counts_size,
const faiss::SearchParameters* params) const {
FAISS_ASSERT(list_counts != nullptr && list_counts_size > 0 && list_counts_size == nlist);
FAISS_ASSERT(params != nullptr && params->sel != nullptr);
FAISS_ASSERT(direct_map.type != DirectMap::NoMap);
const IDSelector* sel = params->sel;
// Optimized for bitmap selectors only
const IDSelectorBitmap* bitmap_sel = dynamic_cast<const IDSelectorBitmap*>(sel);
if (!bitmap_sel) {
FAISS_THROW_MSG("count_ivf_list_vectors supports only IDSelectorBitmap");
}
const uint8_t* bitmap = bitmap_sel->bitmap;
const size_t nbytes = bitmap_sel->n;
// Iterate over bitmap bytes
for (size_t byte_idx = 0; byte_idx < nbytes; ++byte_idx) {
uint8_t byte = bitmap[byte_idx];
if (byte == 0) {
continue; // fast skip
}
// Iterate over bits in the byte
for (uint8_t bit = 0; bit < 8; ++bit) {
if ((byte & (1 << bit)) == 0) {
continue;
}
idx_t id = (byte_idx << 3) + bit;
if (id >= ntotal) {
continue; // Safety check: skip invalid ids
}
uint64_t list_no = lo_listno(direct_map.get(id));
if (list_no >= nlist) {
continue; // Safety check: skip invalid list numbers
}
list_counts[list_no]++;
}
}
}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new function count_ivf_list_vectors lacks test coverage. Given that the repository has comprehensive testing (e.g., test_lowlevel_ivf.cpp, test_ivf_index.cpp), this new functionality should be tested to ensure it correctly counts vectors per IVF list when using IDSelectorBitmap, handles edge cases (empty bitmaps, invalid IDs, etc.), and validates the preconditions (direct_map set, proper bitmap size, etc.).

Copilot uses AI. Check for mistakes.
idx_t* list_counts,
size_t list_counts_size,
const faiss::SearchParameters* params) const {
FAISS_ASSERT(list_counts != nullptr && list_counts_size > 0 && list_counts_size == nlist);
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion combines multiple conditions with logical AND, making it harder to debug when it fails. If any of these conditions fails, the user won't know which specific condition caused the assertion. Consider splitting this into separate assertions with descriptive messages:

FAISS_ASSERT(list_counts != nullptr);
FAISS_ASSERT(list_counts_size > 0);
FAISS_ASSERT(list_counts_size == nlist);

Or at minimum, use FAISS_THROW_IF_NOT_MSG for better error messages.

Suggested change
FAISS_ASSERT(list_counts != nullptr && list_counts_size > 0 && list_counts_size == nlist);
FAISS_ASSERT(list_counts != nullptr);
FAISS_ASSERT(list_counts_size > 0);
FAISS_ASSERT(list_counts_size == nlist);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants