fix: address code review findings for performance and quality #14

zircote · 2025-12-24T10:41:42Z

Summary

Performance improvements (PERF-001 through PERF-007): Batch git operations, batch embeddings, model prewarming, struct caching
Quality improvements (QUAL-001, QUAL-002): Function naming clarity, specific exception handling
Security improvements (SEC-001, SEC-002): Input length limits for ReDoS prevention
Test updates: All fixtures and assertions updated for new batch operations

Test plan

All 1806 tests passing
89.29% code coverage maintained
make quality passes (format, lint, typecheck, security)
Batch operations verified in test_sync.py
Exception handling verified in test_context_builder.py

🤖 Generated with Claude Code

CHANGES: - ServiceRegistry: Remove unused methods (register_factory, has, remove, reset_factories, get_registered_types) and unnecessary class variables - ServiceRegistry: Reduce from 235 lines to 120 lines (-48%) - Simplify CaptureService.get_default_service() to remove factory abstraction - Update SyncService.get_sync_service() to use simplified registry API RATIONALE: ServiceRegistry had premature optimization with unused factory registration pattern and close handlers that were never invoked. Kept the three methods that are actually used: get(), register(), and reset(). This improves clarity and maintainability while preserving all functionality. TESTING: - All 157 core service tests pass (capture, recall, sync) - Full test suite: 715/716 pass (1 pre-existing unrelated failure) - No behavior changes, functionality fully preserved

Performance improvements (PERF-001 through PERF-007): - Add show_notes_batch() for batch git notes retrieval using git cat-file --batch - Add embed_batch() for batch embedding generation - Add prewarm() method for eager model loading - Add @lru_cache for struct format caching in index.py - Implement batch hydration in recall.py using show_notes_batch() - Use batch operations in sync.py for index synchronization Quality improvements (QUAL-001, QUAL-002): - Rename _read_input to _read_input_with_fallback for clarity - Replace broad Exception catches with specific (MemoryIndexError, OSError) Security improvements (SEC-001, SEC-002): - Add input length limit in signal_detector.py for ReDoS prevention Test updates: - Update test fixtures to mock batch operations - Update test assertions for renamed functions and specific exceptions All 1806 tests passing with 89.29% coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ST-001, TEST-002) New test coverage: - tests/test_hook_utils.py: 59 tests for shared hook utilities - setup_timeout/cancel_timeout (SIGALRM handling) - validate_file_path (security validation) - read_json_input (input size limits) - get_hook_logger (logger caching) - setup_logging (debug/warning configuration) - tests/test_session_analyzer.py: 48 tests for session transcript analysis - parse_transcript (JSONL and plain text formats) - analyze/analyze_content (signal detection) - has_uncaptured_content (detection) - Novelty checking integration All 1913 tests passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…-003, DOC-005, DOC-008) README.md: - Add comprehensive API Reference section with services, models, operations - Document all environment variables with descriptions and defaults - Add Hook Configuration section with all hook-specific variables CLAUDE.md: - Add Code Intelligence (LSP) section with hook documentation - Add Navigation & Understanding guidance - Add Verification Workflow and Pre-Edit Checklist commands/*.md: - Add consistent --help support with man-page style output - Add help_check sections for early help detection Addresses: DOC-003 (API Reference), DOC-005 (environment variables), DOC-008 (Related Commands) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This pull request implements performance optimizations and quality improvements addressing code review findings. The changes introduce batch operations for git and embedding operations (PERF-001 through PERF-007), improve exception handling specificity (QUAL-001, QUAL-002), and add security safeguards against ReDoS attacks (SEC-001, SEC-002). A new ServiceRegistry class centralizes singleton management, replacing module-level global variables. All 1806 tests pass with 89.29% coverage maintained.

Key changes:

Batch git operations via git cat-file --batch reduce subprocess overhead when fetching multiple notes
Batch embedding generation processes multiple texts in one model call for better throughput
ServiceRegistry provides centralized singleton management with clean reset mechanism for testing

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
src/git_notes_memory/git_ops.py	Adds show_notes_batch() for bulk note retrieval via git cat-file; adds path sanitization to error messages
src/git_notes_memory/sync.py	Updates collect_notes(), reindex(), and verify_consistency() to use batch operations; migrates to ServiceRegistry
src/git_notes_memory/recall.py	Implements batch hydration with grouped git operations; optimizes token estimation with generator expression
src/git_notes_memory/index.py	Adds struct format caching for embedding serialization to avoid repeated format parsing
src/git_notes_memory/embedding.py	Adds prewarm() method for eager model loading; migrates to ServiceRegistry
src/git_notes_memory/capture.py	Extracts validation and front matter building into separate methods; migrates to ServiceRegistry
src/git_notes_memory/registry.py	New centralized singleton registry with type-safe get/register/reset methods
src/git_notes_memory/hooks/stop_handler.py	Renames _read_input to _read_input_with_fallback for clarity
src/git_notes_memory/hooks/signal_detector.py	Adds 100KB input length limit to prevent ReDoS attacks
src/git_notes_memory/hooks/context_builder.py	Replaces bare Exception with specific MemoryIndexError and OSError catches
tests/test_sync.py	Updates all test mocks for batch operations (show_notes_batch, embed_batch)
tests/test_hook_handlers.py	Updates imports and calls for renamed _read_input_with_fallback function
tests/test_context_builder.py	Updates exception mocking to use specific exception types
tests/conftest.py	Major refactor: uses ServiceRegistry.reset() instead of direct module access; adds comprehensive fixtures for git repos, mocks, and models

Comments suppressed due to low confidence (1)

tests/test_sync.py:800

The test mock setup may not match the actual behavior. In the implementation (sync.py:316), parser.parse_many(content) is called with a single content string, but the mock returns [record] (one record). However, parse_many can return multiple records from a single note. In the test at line 794, mock_note_parser.parse_many.return_value = [record] is set once, but parse_many is called twice (once for commit1 and once for commit2). This means the second call will also return the same single record, which is correct for this test but doesn't reflect that parse_many is called multiple times. The test expects 2 insertions (line 800), but only mocks one record being returned each time parse_many is called, which would work only if each note contains exactly one memory. This is fine for the test, but the comment should clarify this assumption.

        record = make_note_record()
        mock_note_parser.parse_many.return_value = [record]

        result = sync_service.reindex()

        assert result == 2
        assert mock_index.insert.call_count == 2

src/git_notes_memory/git_ops.py

src/git_notes_memory/index.py

src/git_notes_memory/hooks/stop_handler.py

tests/test_sync.py

src/git_notes_memory/git_ops.py

src/git_notes_memory/hooks/signal_detector.py

src/git_notes_memory/recall.py

src/git_notes_memory/registry.py

src/git_notes_memory/embedding.py

src/git_notes_memory/git_ops.py

- Archive 2025-12-20 code review files to docs/code-review/2025/12/20/ - Archive 2025-12-24 code review files to docs/code-review/2025/12/24/ - Add .claude/settings.local.json to .gitignore Organizes code review history for future reference while keeping root directory clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Changes made based on PR review comments: 1. git_ops.py: Improve path sanitization for SEC-002 - Handle Windows paths (C:\...), home-relative (~), and traversals (..) - More precise "missing" check using line.endswith pattern - Fix sha_index sync on malformed headers 2. index.py: Reduce lru_cache maxsize from 8 to 1 - Embedding dimensions are constant (384), maxsize=1 is sufficient 3. stop_handler.py: More robust empty input detection - Check for "empty input" pattern instead of just "empty" 4. signal_detector.py: Move MAX_TEXT_LENGTH to module level - Constant should not be redefined on every method call 5. registry.py: Reject kwargs on existing instances - Raise ValueError if kwargs passed when instance already exists - Prevents silently ignoring configuration parameters 6. embedding.py: Use error level for prewarm failures - Configuration issues should be visible to users Responds to Copilot review comments on PR #14. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot · 2025-12-24T10:56:02Z

@zircote I've opened a new pull request, #15, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot

Pull request overview

Copilot reviewed 28 out of 33 changed files in this pull request and generated 4 comments.

src/git_notes_memory/git_ops.py

tests/test_session_analyzer.py

tests/test_hook_utils.py

- Remove redundant logging import in test_hook_utils.py - Use types_found variable in assertion (test_session_analyzer.py) - Add assertion for result variable (test_session_analyzer.py) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use UTF-8 encoding for accurate byte counting when parsing git cat-file --batch output. The previous implementation used len(str) which counts characters, not bytes, causing incorrect content boundary detection for multi-byte UTF-8 characters. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Changes made based on PR review comments: 1. git_ops.py: Improve path sanitization for SEC-002 - Handle Windows paths (C:\...), home-relative (~), and traversals (..) - More precise "missing" check using line.endswith pattern - Fix sha_index sync on malformed headers 2. index.py: Reduce lru_cache maxsize from 8 to 1 - Embedding dimensions are constant (384), maxsize=1 is sufficient 3. stop_handler.py: More robust empty input detection - Check for "empty input" pattern instead of just "empty" 4. signal_detector.py: Move MAX_TEXT_LENGTH to module level - Constant should not be redefined on every method call 5. registry.py: Reject kwargs on existing instances - Raise ValueError if kwargs passed when instance already exists - Prevents silently ignoring configuration parameters 6. embedding.py: Use error level for prewarm failures - Configuration issues should be visible to users Responds to Copilot review comments on PR #14. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 32 out of 37 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

tests/test_sync.py:1

These parameters are marked as unused with # noqa: ARG001 but the comment says 'Required by signature'. If they are truly required by the signature but not used in the function body, consider if the function signature itself needs refactoring. If this is a callback or interface requirement, add a more specific comment explaining why these parameters must match a specific signature.
tests/test_sync.py:1
The fixture is marked as unused with # noqa: ARG002 and noted as 'Required fixture'. If the fixture is truly required (e.g., for setup/teardown side effects), the comment should explain what side effect is needed. If it's not actually required, remove it from the function signature.

zircote and others added 2 commits December 24, 2025 03:54

Copilot AI review requested due to automatic review settings December 24, 2025 10:41

Copilot started reviewing on behalf of zircote December 24, 2025 10:42 View session

zircote and others added 2 commits December 24, 2025 05:44

Copilot AI reviewed Dec 24, 2025

View reviewed changes

zircote and others added 2 commits December 24, 2025 05:47

Copilot AI review requested due to automatic review settings December 24, 2025 10:55

Copilot AI mentioned this pull request Dec 24, 2025

No changes needed - path sanitization improvements already implemented #15

Closed

Copilot started reviewing on behalf of zircote December 24, 2025 10:56 View session

Copilot AI reviewed Dec 24, 2025

View reviewed changes

src/git_notes_memory/git_ops.py Show resolved Hide resolved

tests/test_session_analyzer.py Show resolved Hide resolved

tests/test_session_analyzer.py Outdated Show resolved Hide resolved

tests/test_hook_utils.py Outdated Show resolved Hide resolved

zircote and others added 3 commits December 24, 2025 06:00

chore(release): bump version 0.8.0 → 0.9.0

31a4f77

Copilot AI review requested due to automatic review settings December 24, 2025 11:07

Copilot started reviewing on behalf of zircote December 24, 2025 11:07 View session

zircote merged commit 0d673cd into main Dec 24, 2025
11 checks passed

Copilot AI reviewed Dec 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: address code review findings for performance and quality #14

fix: address code review findings for performance and quality #14

Uh oh!

zircote commented Dec 24, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Dec 24, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: address code review findings for performance and quality #14

fix: address code review findings for performance and quality #14

Uh oh!

Conversation

zircote commented Dec 24, 2025

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Dec 24, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants