Skip to content

Conversation

@zircote
Copy link
Owner

@zircote zircote commented Dec 24, 2025

Summary

  • Performance improvements (PERF-001 through PERF-007): Batch git operations, batch embeddings, model prewarming, struct caching
  • Quality improvements (QUAL-001, QUAL-002): Function naming clarity, specific exception handling
  • Security improvements (SEC-001, SEC-002): Input length limits for ReDoS prevention
  • Test updates: All fixtures and assertions updated for new batch operations

Test plan

  • All 1806 tests passing
  • 89.29% code coverage maintained
  • make quality passes (format, lint, typecheck, security)
  • Batch operations verified in test_sync.py
  • Exception handling verified in test_context_builder.py

🤖 Generated with Claude Code

zircote and others added 2 commits December 24, 2025 03:54
CHANGES:
- ServiceRegistry: Remove unused methods (register_factory, has, remove,
  reset_factories, get_registered_types) and unnecessary class variables
- ServiceRegistry: Reduce from 235 lines to 120 lines (-48%)
- Simplify CaptureService.get_default_service() to remove factory abstraction
- Update SyncService.get_sync_service() to use simplified registry API

RATIONALE:
ServiceRegistry had premature optimization with unused factory registration
pattern and close handlers that were never invoked. Kept the three methods
that are actually used: get(), register(), and reset(). This improves
clarity and maintainability while preserving all functionality.

TESTING:
- All 157 core service tests pass (capture, recall, sync)
- Full test suite: 715/716 pass (1 pre-existing unrelated failure)
- No behavior changes, functionality fully preserved
Performance improvements (PERF-001 through PERF-007):
- Add show_notes_batch() for batch git notes retrieval using git cat-file --batch
- Add embed_batch() for batch embedding generation
- Add prewarm() method for eager model loading
- Add @lru_cache for struct format caching in index.py
- Implement batch hydration in recall.py using show_notes_batch()
- Use batch operations in sync.py for index synchronization

Quality improvements (QUAL-001, QUAL-002):
- Rename _read_input to _read_input_with_fallback for clarity
- Replace broad Exception catches with specific (MemoryIndexError, OSError)

Security improvements (SEC-001, SEC-002):
- Add input length limit in signal_detector.py for ReDoS prevention

Test updates:
- Update test fixtures to mock batch operations
- Update test assertions for renamed functions and specific exceptions

All 1806 tests passing with 89.29% coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 24, 2025 10:41
zircote and others added 2 commits December 24, 2025 05:44
…ST-001, TEST-002)

New test coverage:
- tests/test_hook_utils.py: 59 tests for shared hook utilities
  - setup_timeout/cancel_timeout (SIGALRM handling)
  - validate_file_path (security validation)
  - read_json_input (input size limits)
  - get_hook_logger (logger caching)
  - setup_logging (debug/warning configuration)

- tests/test_session_analyzer.py: 48 tests for session transcript analysis
  - parse_transcript (JSONL and plain text formats)
  - analyze/analyze_content (signal detection)
  - has_uncaptured_content (detection)
  - Novelty checking integration

All 1913 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-003, DOC-005, DOC-008)

README.md:
- Add comprehensive API Reference section with services, models, operations
- Document all environment variables with descriptions and defaults
- Add Hook Configuration section with all hook-specific variables

CLAUDE.md:
- Add Code Intelligence (LSP) section with hook documentation
- Add Navigation & Understanding guidance
- Add Verification Workflow and Pre-Edit Checklist

commands/*.md:
- Add consistent --help support with man-page style output
- Add help_check sections for early help detection

Addresses: DOC-003 (API Reference), DOC-005 (environment variables), DOC-008 (Related Commands)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements performance optimizations and quality improvements addressing code review findings. The changes introduce batch operations for git and embedding operations (PERF-001 through PERF-007), improve exception handling specificity (QUAL-001, QUAL-002), and add security safeguards against ReDoS attacks (SEC-001, SEC-002). A new ServiceRegistry class centralizes singleton management, replacing module-level global variables. All 1806 tests pass with 89.29% coverage maintained.

Key changes:

  • Batch git operations via git cat-file --batch reduce subprocess overhead when fetching multiple notes
  • Batch embedding generation processes multiple texts in one model call for better throughput
  • ServiceRegistry provides centralized singleton management with clean reset mechanism for testing

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/git_notes_memory/git_ops.py Adds show_notes_batch() for bulk note retrieval via git cat-file; adds path sanitization to error messages
src/git_notes_memory/sync.py Updates collect_notes(), reindex(), and verify_consistency() to use batch operations; migrates to ServiceRegistry
src/git_notes_memory/recall.py Implements batch hydration with grouped git operations; optimizes token estimation with generator expression
src/git_notes_memory/index.py Adds struct format caching for embedding serialization to avoid repeated format parsing
src/git_notes_memory/embedding.py Adds prewarm() method for eager model loading; migrates to ServiceRegistry
src/git_notes_memory/capture.py Extracts validation and front matter building into separate methods; migrates to ServiceRegistry
src/git_notes_memory/registry.py New centralized singleton registry with type-safe get/register/reset methods
src/git_notes_memory/hooks/stop_handler.py Renames _read_input to _read_input_with_fallback for clarity
src/git_notes_memory/hooks/signal_detector.py Adds 100KB input length limit to prevent ReDoS attacks
src/git_notes_memory/hooks/context_builder.py Replaces bare Exception with specific MemoryIndexError and OSError catches
tests/test_sync.py Updates all test mocks for batch operations (show_notes_batch, embed_batch)
tests/test_hook_handlers.py Updates imports and calls for renamed _read_input_with_fallback function
tests/test_context_builder.py Updates exception mocking to use specific exception types
tests/conftest.py Major refactor: uses ServiceRegistry.reset() instead of direct module access; adds comprehensive fixtures for git repos, mocks, and models
Comments suppressed due to low confidence (1)

tests/test_sync.py:800

  • The test mock setup may not match the actual behavior. In the implementation (sync.py:316), parser.parse_many(content) is called with a single content string, but the mock returns [record] (one record). However, parse_many can return multiple records from a single note. In the test at line 794, mock_note_parser.parse_many.return_value = [record] is set once, but parse_many is called twice (once for commit1 and once for commit2). This means the second call will also return the same single record, which is correct for this test but doesn't reflect that parse_many is called multiple times. The test expects 2 insertions (line 800), but only mocks one record being returned each time parse_many is called, which would work only if each note contains exactly one memory. This is fine for the test, but the comment should clarify this assumption.
        record = make_note_record()
        mock_note_parser.parse_many.return_value = [record]

        result = sync_service.reindex()

        assert result == 2
        assert mock_index.insert.call_count == 2

zircote and others added 2 commits December 24, 2025 05:47
- Archive 2025-12-20 code review files to docs/code-review/2025/12/20/
- Archive 2025-12-24 code review files to docs/code-review/2025/12/24/
- Add .claude/settings.local.json to .gitignore

Organizes code review history for future reference while keeping
root directory clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes made based on PR review comments:

1. git_ops.py: Improve path sanitization for SEC-002
   - Handle Windows paths (C:\...), home-relative (~), and traversals (..)
   - More precise "missing" check using line.endswith pattern
   - Fix sha_index sync on malformed headers

2. index.py: Reduce lru_cache maxsize from 8 to 1
   - Embedding dimensions are constant (384), maxsize=1 is sufficient

3. stop_handler.py: More robust empty input detection
   - Check for "empty input" pattern instead of just "empty"

4. signal_detector.py: Move MAX_TEXT_LENGTH to module level
   - Constant should not be redefined on every method call

5. registry.py: Reject kwargs on existing instances
   - Raise ValueError if kwargs passed when instance already exists
   - Prevents silently ignoring configuration parameters

6. embedding.py: Use error level for prewarm failures
   - Configuration issues should be visible to users

Responds to Copilot review comments on PR #14.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 24, 2025 10:55
Copy link

Copilot AI commented Dec 24, 2025

@zircote I've opened a new pull request, #15, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 33 changed files in this pull request and generated 4 comments.

zircote and others added 3 commits December 24, 2025 06:00
- Remove redundant logging import in test_hook_utils.py
- Use types_found variable in assertion (test_session_analyzer.py)
- Add assertion for result variable (test_session_analyzer.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use UTF-8 encoding for accurate byte counting when parsing git cat-file
--batch output. The previous implementation used len(str) which counts
characters, not bytes, causing incorrect content boundary detection for
multi-byte UTF-8 characters.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings December 24, 2025 11:07
@zircote zircote merged commit 0d673cd into main Dec 24, 2025
11 checks passed
zircote added a commit that referenced this pull request Dec 24, 2025
Changes made based on PR review comments:

1. git_ops.py: Improve path sanitization for SEC-002
   - Handle Windows paths (C:\...), home-relative (~), and traversals (..)
   - More precise "missing" check using line.endswith pattern
   - Fix sha_index sync on malformed headers

2. index.py: Reduce lru_cache maxsize from 8 to 1
   - Embedding dimensions are constant (384), maxsize=1 is sufficient

3. stop_handler.py: More robust empty input detection
   - Check for "empty input" pattern instead of just "empty"

4. signal_detector.py: Move MAX_TEXT_LENGTH to module level
   - Constant should not be redefined on every method call

5. registry.py: Reject kwargs on existing instances
   - Raise ValueError if kwargs passed when instance already exists
   - Prevents silently ignoring configuration parameters

6. embedding.py: Use error level for prewarm failures
   - Configuration issues should be visible to users

Responds to Copilot review comments on PR #14.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 37 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

tests/test_sync.py:1

  • These parameters are marked as unused with # noqa: ARG001 but the comment says 'Required by signature'. If they are truly required by the signature but not used in the function body, consider if the function signature itself needs refactoring. If this is a callback or interface requirement, add a more specific comment explaining why these parameters must match a specific signature.
    tests/test_sync.py:1
  • The fixture is marked as unused with # noqa: ARG002 and noted as 'Required fixture'. If the fixture is truly required (e.g., for setup/teardown side effects), the comment should explain what side effect is needed. If it's not actually required, remove it from the function signature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants