Merged
docs/getting-started/configuration.md (+38 −0)
@@ -192,6 +192,44 @@ OPERATION_TIMEOUT=300
LARGE_DOCUMENT_TIMEOUT=600
```

### Ingestion Pipelines

Document ingestion now uses [LlamaIndex ingestion pipelines](https://docs.llamaindex.ai/) with pluggable connectors, transformations, and writers. The service ships with three pipelines (`manual_input`, `file`, `directory`), and you can override or extend them from configuration by providing a JSON-style mapping in your `.env` file:

```bash
INGESTION_PIPELINES='{
"file": {
"transformations": [
{
"class_path": "llama_index.core.node_parser.SimpleNodeParser",
"kwargs": {"chunk_size": 256, "chunk_overlap": 20}
},
{
"class_path": "codebase_rag.services.knowledge.pipeline_components.MetadataEnrichmentTransformation",
"kwargs": {"metadata": {"language": "python"}}
}
]
},
"git": {
"connector": {
"class_path": "my_project.pipeline.GitRepositoryConnector",
"kwargs": {"branch": "main"}
},
"transformations": [
{
"class_path": "my_project.pipeline.CodeBlockParser",
"kwargs": {"max_tokens": 400}
}
],
"writer": {
"class_path": "codebase_rag.services.knowledge.pipeline_components.Neo4jKnowledgeGraphWriter"
}
}
}'
```

Each entry is merged with the defaults. This means you can change chunking behaviour, add metadata enrichment steps, or register new data sources by publishing your own connector class. At runtime the knowledge service builds and reuses the configured pipeline instances, so configuration changes take effect after a service restart.
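Under the hood, each `class_path` string must be resolved to a class and instantiated with its `kwargs`. A minimal sketch of how such resolution could work (the `resolve_component` helper here is hypothetical, not the service's actual code):

```python
import importlib
from typing import Any, Dict


def resolve_component(spec: Dict[str, Any]) -> Any:
    """Instantiate a component from a {"class_path": ..., "kwargs": ...} entry."""
    module_path, _, class_name = spec["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**spec.get("kwargs", {}))
```

Any importable class with a matching constructor works, which is what makes third-party connectors and transformations pluggable.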

### Neo4j Performance Tuning

For large repositories:
src/codebase_rag/config/settings.py (+5 −1)
@@ -7,7 +7,7 @@

from pydantic_settings import BaseSettings
from pydantic import Field
-from typing import Optional, Literal
+from typing import Optional, Literal, Dict, Any


class Settings(BaseSettings):
@@ -99,6 +99,10 @@ class Settings(BaseSettings):
# Document Processing Settings
max_document_size: int = Field(default=10 * 1024 * 1024, description="Maximum document size in bytes (10MB)")
max_payload_size: int = Field(default=50 * 1024 * 1024, description="Maximum task payload size for storage (50MB)")
ingestion_pipelines: Dict[str, Dict[str, Any]] = Field(
default_factory=dict,
description="Optional ingestion pipeline overrides",
)

# API Settings
cors_origins: list = Field(default=["*"], description="CORS allowed origins")