diff --git a/src/docs/EDGE_TYPES_VISUAL.md b/src/docs/EDGE_TYPES_VISUAL.md
new file mode 100644
index 00000000..c51473c6
--- /dev/null
+++ b/src/docs/EDGE_TYPES_VISUAL.md
@@ -0,0 +1,625 @@
+# Visual Guide to iSamples Edge Types
+
+This document provides visual representations of the iSamples property graph structure using diagrams and charts.
+
+## Table of Contents
+
+1. [Complete Entity Relationship Diagram](#complete-entity-relationship-diagram)
+2. [Edge Type Matrix](#edge-type-matrix)
+3. [Sample-Centric View](#sample-centric-view)
+4. [Event-Centric View](#event-centric-view)
+5. [Graph Traversal Examples](#graph-traversal-examples)
+6. [Edge Type Heatmap](#edge-type-heatmap)
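+7. [Storage Structure Diagram](#storage-structure-diagram)
+8. [Predicate Usage Patterns](#predicate-usage-patterns)
+9. [Multi-Hop Traversal Map](#multi-hop-traversal-map)
+10. [Entity Type Connectivity](#entity-type-connectivity)
+11. [The 14 Sentence Types (Grammar Summary)](#the-14-sentence-types-grammar-summary)
+12. [Cross-Domain Comparison](#cross-domain-comparison)
+13. [Graph Query Complexity Chart](#graph-query-complexity-chart)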
+
+---
+
+## Complete Entity Relationship Diagram
+
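+This diagram shows all 8 entity types and the 14 edge types connecting them. The 14 edge types use 12 distinct predicate names, because `has_context_category` and `responsibility` each appear with two different subject types.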
+
+```mermaid
+graph TB
+ MSR[MaterialSampleRecord<br/>Sample]
+ Event[SamplingEvent<br/>Collection Event]
+ Site[SamplingSite<br/>Named Location]
+ Coords[GeospatialCoordLocation<br/>Coordinates]
+ Concept[IdentifiedConcept<br/>Vocabulary Term]
+ Agent[Agent<br/>Person/Organization]
+ Curation[MaterialSampleCuration<br/>Repository Info]
+ Relation[SampleRelation<br/>Sample Links]
+
+ MSR -->|produced_by| Event
+ MSR -->|has_material_category| Concept
+ MSR -->|has_context_category| Concept
+ MSR -->|has_sample_object_type| Concept
+ MSR -->|keywords| Concept
+ MSR -->|registrant| Agent
+ MSR -->|curation| Curation
+ MSR -->|related_resource| Relation
+
+ Event -->|sampling_site| Site
+ Event -->|sample_location| Coords
+ Event -->|has_context_category| Concept
+ Event -->|responsibility| Agent
+
+ Site -->|site_location| Coords
+
+ Curation -->|responsibility| Agent
+
+ classDef core fill:#e1f5ff,stroke:#0077be,stroke-width:3px
+ classDef event fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ classDef location fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+ classDef vocab fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
+ classDef supporting fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+
+ class MSR core
+ class Event event
+ class Site,Coords location
+ class Concept vocab
+ class Agent,Curation,Relation supporting
+```
+
+**Legend:**
+- **MaterialSampleRecord (blue):** The physical sample - central entity
+- **SamplingEvent (orange):** When/how the sample was collected
+- **SamplingSite (green):** Named locations (e.g., "Çatalhöyük")
+- **GeospatialCoordLocation (green):** Latitude/longitude coordinates
+- **IdentifiedConcept (purple):** Controlled vocabulary terms
+- **Agent (pink):** People and organizations
+- **MaterialSampleCuration (pink):** Repository/archive information
+- **SampleRelation (pink):** Links between related samples
+
+---
+
+## Edge Type Matrix
+
+This table shows which entity types (subjects) connect to which entity types (objects) via which predicates.
+
+| **Subject Type** | **Predicate** | **Object Type** | **Multivalued** | **Required** |
+|------------------|---------------|-----------------|-----------------|--------------|
+| MaterialSampleRecord | `produced_by` | SamplingEvent | No | Yes |
+| MaterialSampleRecord | `has_material_category` | IdentifiedConcept | Yes | No |
+| MaterialSampleRecord | `has_context_category` | IdentifiedConcept | Yes | No |
+| MaterialSampleRecord | `has_sample_object_type` | IdentifiedConcept | Yes | No |
+| MaterialSampleRecord | `keywords` | IdentifiedConcept | Yes | No |
+| MaterialSampleRecord | `registrant` | Agent | No | No |
+| MaterialSampleRecord | `curation` | MaterialSampleCuration | No | No |
+| MaterialSampleRecord | `related_resource` | SampleRelation | Yes | No |
+| SamplingEvent | `sampling_site` | SamplingSite | No | No |
+| SamplingEvent | `sample_location` | GeospatialCoordLocation | No | No |
+| SamplingEvent | `has_context_category` | IdentifiedConcept | Yes | No |
+| SamplingEvent | `responsibility` | Agent | Yes | No |
+| SamplingSite | `site_location` | GeospatialCoordLocation | No | No |
+| MaterialSampleCuration | `responsibility` | Agent | Yes | No |
+
+**Total:** 14 edge types forming the complete iSamples grammar
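+
+The matrix can also be written down as data, which makes it easy to validate edges before loading them. The sketch below is a minimal Python illustration; the `EdgeType` record, `SCHEMA` index, and `is_valid_edge` helper are hypothetical names, not part of any iSamples or PQG library. It also shows that the 14 edge types use 12 distinct predicate names.
+
+```python
+from collections import namedtuple
+
+# Hypothetical encoding of the edge type matrix:
+# (subject type, predicate, object type, multivalued, required)
+EdgeType = namedtuple("EdgeType", "subject predicate obj multivalued required")
+
+EDGE_TYPES = [
+    EdgeType("MaterialSampleRecord", "produced_by", "SamplingEvent", False, True),
+    EdgeType("MaterialSampleRecord", "has_material_category", "IdentifiedConcept", True, False),
+    EdgeType("MaterialSampleRecord", "has_context_category", "IdentifiedConcept", True, False),
+    EdgeType("MaterialSampleRecord", "has_sample_object_type", "IdentifiedConcept", True, False),
+    EdgeType("MaterialSampleRecord", "keywords", "IdentifiedConcept", True, False),
+    EdgeType("MaterialSampleRecord", "registrant", "Agent", False, False),
+    EdgeType("MaterialSampleRecord", "curation", "MaterialSampleCuration", False, False),
+    EdgeType("MaterialSampleRecord", "related_resource", "SampleRelation", True, False),
+    EdgeType("SamplingEvent", "sampling_site", "SamplingSite", False, False),
+    EdgeType("SamplingEvent", "sample_location", "GeospatialCoordLocation", False, False),
+    EdgeType("SamplingEvent", "has_context_category", "IdentifiedConcept", True, False),
+    EdgeType("SamplingEvent", "responsibility", "Agent", True, False),
+    EdgeType("SamplingSite", "site_location", "GeospatialCoordLocation", False, False),
+    EdgeType("MaterialSampleCuration", "responsibility", "Agent", True, False),
+]
+
+# (subject type, predicate) identifies exactly one edge type.
+SCHEMA = {(e.subject, e.predicate): e for e in EDGE_TYPES}
+
+
+def is_valid_edge(subject_type, predicate, object_type):
+    """Return True if the matrix allows subject_type --predicate--> object_type."""
+    edge_type = SCHEMA.get((subject_type, predicate))
+    return edge_type is not None and edge_type.obj == object_type
+
+
+print(len(EDGE_TYPES))                         # 14 edge types
+print(len({e.predicate for e in EDGE_TYPES}))  # 12 distinct predicate names
+print(is_valid_edge("SamplingSite", "site_location", "GeospatialCoordLocation"))  # True
+print(is_valid_edge("SamplingSite", "produced_by", "SamplingEvent"))  # False
+```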
+
+---
+
+## Sample-Centric View
+
+This diagram focuses on relationships emanating from a MaterialSampleRecord (the core entity).
+
+```mermaid
+graph LR
+ Sample[MaterialSampleRecord<br/>'Pottery Sherd 42']
+
+ Sample -->|produced_by<br/>REQUIRED| Event[SamplingEvent<br/>'Excavation Layer 3']
+ Sample -->|has_material_category| Mat[IdentifiedConcept<br/>'Ceramic']
+ Sample -->|has_context_category| Ctx[IdentifiedConcept<br/>'Archaeological']
+ Sample -->|has_sample_object_type| Type[IdentifiedConcept<br/>'Pottery']
+ Sample -->|keywords| KW1[IdentifiedConcept<br/>'Neolithic']
+ Sample -->|keywords| KW2[IdentifiedConcept<br/>'Painted']
+ Sample -->|registrant| Reg[Agent<br/>'J. Smith']
+ Sample -->|curation| Cur[MaterialSampleCuration<br/>'Museum Archive']
+ Sample -->|related_resource| Rel[SampleRelation<br/>'Parent Sample Link']
+
+ classDef sample fill:#e1f5ff,stroke:#0077be,stroke-width:4px
+ classDef required fill:#ffebee,stroke:#c62828,stroke-width:3px
+ classDef optional fill:#f5f5f5,stroke:#757575,stroke-width:1px
+
+ class Sample sample
+ class Event required
+ class Mat,Ctx,Type,KW1,KW2,Reg,Cur,Rel optional
+```
+
+**Key observations:**
+- **Only `produced_by` is required** - every sample MUST link to a SamplingEvent
+- **Multiple keywords** can be assigned (multivalued)
+- **IdentifiedConcept is the target of 4 different predicates** (material, context, object type, keywords), which enables rich categorization
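+
+The cardinality rules behind these observations can be checked mechanically. A minimal sketch, assuming a sample's outgoing edges are available as `(predicate, [object ids])` pairs; `SAMPLE_RULES` and `check_sample` are hypothetical names, and the rules are copied from the edge type matrix.
+
+```python
+# predicate -> (multivalued, required), copied from the edge type matrix
+SAMPLE_RULES = {
+    "produced_by": (False, True),
+    "has_material_category": (True, False),
+    "has_context_category": (True, False),
+    "has_sample_object_type": (True, False),
+    "keywords": (True, False),
+    "registrant": (False, False),
+    "curation": (False, False),
+    "related_resource": (True, False),
+}
+
+
+def check_sample(edges):
+    """Return a list of problems found in one sample's outgoing edges."""
+    problems = []
+    counts = {}
+    for predicate, objects in edges:
+        if predicate not in SAMPLE_RULES:
+            problems.append(f"unknown predicate: {predicate}")
+            continue
+        # Several edge rows with the same predicate add up to one total.
+        counts[predicate] = counts.get(predicate, 0) + len(objects)
+    for predicate, (multivalued, required) in SAMPLE_RULES.items():
+        n = counts.get(predicate, 0)
+        if required and n == 0:
+            problems.append(f"missing required {predicate}")
+        if not multivalued and n > 1:
+            problems.append(f"{predicate} allows one object, found {n}")
+    return problems
+
+
+pottery = [
+    ("produced_by", ["event_excavation_001"]),
+    ("keywords", ["keyword_neolithic"]),
+    ("keywords", ["keyword_pottery"]),
+]
+print(check_sample(pottery))  # []
+print(check_sample([("keywords", ["keyword_neolithic"])]))  # ['missing required produced_by']
+```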
+
+---
+
+## Event-Centric View
+
+This diagram shows how SamplingEvent acts as a bridge between samples and location/collector information.
+
+```mermaid
+graph TB
+ subgraph Samples
+ S1[Sample 1]
+ S2[Sample 2]
+ S3[Sample 3]
+ end
+
+ subgraph Event Context
+ Event[SamplingEvent<br/>'2023-06-15 Excavation']
+ end
+
+ subgraph Location
+ Site[SamplingSite<br/>'Çatalhöyük']
+ Coords1[GeospatialCoordLocation<br/>'Event Location']
+ Coords2[GeospatialCoordLocation<br/>'Site Centroid']
+ end
+
+ subgraph People
+ Agent1[Agent<br/>'Dr. Smith']
+ Agent2[Agent<br/>'Lab Tech']
+ end
+
+ subgraph Classification
+ Context[IdentifiedConcept<br/>'Archaeological']
+ end
+
+ S1 -->|produced_by| Event
+ S2 -->|produced_by| Event
+ S3 -->|produced_by| Event
+
+ Event -->|sampling_site| Site
+ Event -->|sample_location| Coords1
+ Event -->|responsibility| Agent1
+ Event -->|responsibility| Agent2
+ Event -->|has_context_category| Context
+
+ Site -->|site_location| Coords2
+
+ classDef event fill:#fff4e1,stroke:#ff8c00,stroke-width:3px
+ classDef samples fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
+ classDef location fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+ classDef people fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+ classDef vocab fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
+
+ class Event event
+ class S1,S2,S3 samples
+ class Site,Coords1,Coords2 location
+ class Agent1,Agent2 people
+ class Context vocab
+```
+
+**Key observations:**
+- **Multiple samples** can share the same SamplingEvent (batch collection)
+- **Two paths to coordinates:** Event location (specific) vs Site location (general)
+- **Multiple agents** can be responsible for an event (multivalued)
+- **Event bridges samples to context** - who, when, where
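+
+Because many samples can point at one event, event-level facts (site, coordinates, agents) are stored once and shared by the whole batch. The hypothetical sketch below inverts `produced_by` edge rows to list each event's samples; the sample and event ids are made up for illustration.
+
+```python
+from collections import defaultdict
+
+# Hypothetical produced_by edge rows: (subject sample, [object events]).
+produced_by = [
+    ("sample_1", ["event_2023_06_15"]),
+    ("sample_2", ["event_2023_06_15"]),
+    ("sample_3", ["event_2023_06_15"]),
+    ("sample_4", ["event_2023_07_02"]),
+]
+
+
+def samples_by_event(edges):
+    """Invert produced_by so each event lists the samples it produced."""
+    batches = defaultdict(list)
+    for sample, events in edges:
+        for event in events:
+            batches[event].append(sample)
+    return dict(batches)
+
+
+batches = samples_by_event(produced_by)
+print(batches["event_2023_06_15"])  # ['sample_1', 'sample_2', 'sample_3']
+```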
+
+---
+
+## Graph Traversal Examples
+
+### Example 1: Find Sample Coordinates (2-hop traversal)
+
+```mermaid
+graph LR
+ A[MaterialSampleRecord] -->|1. produced_by| B[SamplingEvent]
+ B -->|2. sample_location| C[GeospatialCoordLocation]
+
+ style A fill:#e1f5ff,stroke:#0077be,stroke-width:3px
+ style B fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ style C fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+```
+
+**SQL Pattern:**
+```sql
+SELECT sample.*, coords.*
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+```
+
+### Example 2: Find Sample Site Name and Site Coordinates (3-hop traversal)
+
+```mermaid
+graph LR
+ A[MaterialSampleRecord] -->|1. produced_by| B[SamplingEvent]
+ B -->|2. sampling_site| C[SamplingSite]
+ C -->|3. site_location| D[GeospatialCoordLocation]
+
+ style A fill:#e1f5ff,stroke:#0077be,stroke-width:3px
+ style B fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ style C fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+ style D fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+```
+
+**SQL Pattern:**
+```sql
+SELECT sample.*, site.label AS site_name, coords.*
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sampling_site'
+JOIN pqg AS site ON site.row_id = ANY(edge2.o)
+JOIN pqg AS edge3 ON edge3.s = site.row_id AND edge3.p = 'site_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge3.o)
+```
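+
+The two coordinate paths are often combined: use the event's `sample_location` when present and fall back to the site's `site_location` otherwise. In SQL this is typically done with `LEFT JOIN`s plus `COALESCE`; the Python sketch below shows the same decision logic over a hypothetical in-memory `EDGES` map keyed by `(subject, predicate)`.
+
+```python
+# Hypothetical adjacency built from edge rows: (subject, predicate) -> [objects].
+EDGES = {
+    ("sample_42", "produced_by"): ["event_a"],
+    ("event_a", "sampling_site"): ["site_catal"],
+    ("site_catal", "site_location"): ["coords_site"],
+    # event_a has no sample_location edge in this example
+}
+
+
+def follow(node, predicate):
+    """Return the object ids of the edge (node, predicate), or [] if absent."""
+    return EDGES.get((node, predicate), [])
+
+
+def best_coordinates(sample):
+    """Prefer event coordinates (2 hops); fall back to site coordinates (3 hops)."""
+    for event in follow(sample, "produced_by"):
+        event_coords = follow(event, "sample_location")
+        if event_coords:
+            return event_coords[0], "event"
+        for site in follow(event, "sampling_site"):
+            site_coords = follow(site, "site_location")
+            if site_coords:
+                return site_coords[0], "site"
+    return None, None
+
+
+print(best_coordinates("sample_42"))  # ('coords_site', 'site')
+```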
+
+### Example 3: Find Sample Collector (2-hop traversal)
+
+```mermaid
+graph LR
+ A[MaterialSampleRecord] -->|1. produced_by| B[SamplingEvent]
+ B -->|2. responsibility| C[Agent]
+
+ style A fill:#e1f5ff,stroke:#0077be,stroke-width:3px
+ style B fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ style C fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+```
+
+**SQL Pattern:**
+```sql
+SELECT sample.*, agent.*
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'responsibility'
+JOIN pqg AS agent ON agent.row_id = ANY(edge2.o)
+```
+
+### Example 4: Find Repository Curator (2-hop traversal)
+
+```mermaid
+graph LR
+ A[MaterialSampleRecord] -->|1. curation| B[MaterialSampleCuration]
+ B -->|2. responsibility| C[Agent]
+
+ style A fill:#e1f5ff,stroke:#0077be,stroke-width:3px
+ style B fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+ style C fill:#fce4ec,stroke:#e91e63,stroke-width:2px
+```
+
+**SQL Pattern:**
+```sql
+SELECT sample.*, curation.*, curator.*
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'curation'
+JOIN pqg AS curation ON curation.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = curation.row_id AND edge2.p = 'responsibility'
+JOIN pqg AS curator ON curator.row_id = ANY(edge2.o)
+```
+
+---
+
+## Edge Type Heatmap
+
+This matrix shows the "connectivity density" between entity types in the OpenContext dataset.
+
+### Actual Edge Counts (OpenContext Dataset - 11.6M total records)
+
+| **From/To** | **Material<br/>Sample<br/>Record** | **Sampling<br/>Event** | **Sampling<br/>Site** | **Geospatial<br/>Coord<br/>Location** | **Identified<br/>Concept** | **Agent** | **Material<br/>Sample<br/>Curation** | **Sample<br/>Relation** |
+|-------------|:----------------------------------:|:----------------------:|:---------------------:|:--------------------------------------:|:--------------------------:|:---------:|:------------------------------------:|:-----------------------:|
+| **MaterialSampleRecord** | - | 🔥🔥🔥<br/>1.1M | - | - | 🔥🔥🔥🔥🔥<br/>9.4M | ❄️<br/>~1K | ❄️<br/>~1K | ❄️<br/>~1K |
+| **SamplingEvent** | - | - | 🔥🔥<br/>384K | 🔥🔥🔥<br/>1.1M | 🔥🔥🔥<br/>1.1M | 🔥<br/>73K | - | - |
+| **SamplingSite** | - | - | - | 🔥🔥<br/>384K | - | - | - | - |
+| **MaterialSampleCuration** | - | - | - | - | - | ❄️<br/>~1K | - | - |
+
+**Legend:**
+- 🔥🔥🔥🔥🔥 = >5M edges (ultra-dense)
+- 🔥🔥🔥 = 1M-5M edges (very dense)
+- 🔥🔥 = 100K-1M edges (dense)
+- 🔥 = 10K-100K edges (moderate)
+- ❄️ = <10K edges (sparse)
+- `-` = 0 edges (no relationship)
+
+**Key insights:**
+1. **MaterialSampleRecord → IdentifiedConcept** is the densest relationship (9.4M edges)
+ - Includes: material categories, context categories, object types, keywords
+2. **MaterialSampleRecord → SamplingEvent** is critical infrastructure (1.1M edges)
+ - Required relationship - every sample has exactly one event
+3. **Event → Coordinates** enables geospatial queries (1.1M edges)
+4. **Curation and Relation** are rarely used in OpenContext data
+ - More common in geology (SESAR) and biology (GEOME) domains
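+
+The legend's thresholds can be applied programmatically when regenerating this table. A minimal sketch; the legend does not say where shared endpoints such as exactly 1M belong, so this sketch assumes they go to the higher range, and `density_bucket` is a hypothetical helper name.
+
+```python
+def density_bucket(edge_count):
+    """Classify an edge count using the heatmap legend's ranges.
+
+    Assumption: a shared endpoint (1M, 100K, 10K) goes to the higher range;
+    exactly 5M stays "very dense" because the top range is stated as >5M.
+    """
+    if edge_count > 5_000_000:
+        return "ultra-dense"
+    if edge_count >= 1_000_000:
+        return "very dense"
+    if edge_count >= 100_000:
+        return "dense"
+    if edge_count >= 10_000:
+        return "moderate"
+    if edge_count > 0:
+        return "sparse"
+    return "no relationship"
+
+
+# Approximate counts from the SamplingEvent row of the table above.
+for target, count in [("SamplingSite", 384_000),
+                      ("GeospatialCoordLocation", 1_100_000),
+                      ("Agent", 73_000)]:
+    print(target, density_bucket(count))
+```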
+
+---
+
+## Storage Structure Diagram
+
+This diagram shows how entities and edges are stored in the unified PQG table.
+
+```mermaid
+graph TB
+ subgraph "PQG Table (Unified Storage)"
+ subgraph "Entity Rows (otype != '_edge_')"
+ E1["row_id: 1<br/>pid: 'iSamples:...'<br/>otype: 'MaterialSampleRecord'<br/>label: 'Sample 42'<br/>description: '...'"]
+ E2["row_id: 2<br/>pid: 'iSamples:...'<br/>otype: 'SamplingEvent'<br/>label: 'Excavation 2023'<br/>event_date: '2023-06-15'"]
+ E3["row_id: 3<br/>pid: 'iSamples:...'<br/>otype: 'GeospatialCoordLocation'<br/>latitude: 37.5<br/>longitude: 32.8"]
+ end
+
+ subgraph "Edge Rows (otype = '_edge_')"
+ Edge1["row_id: 100<br/>otype: '_edge_'<br/>s: 1<br/>p: 'produced_by'<br/>o: [2]"]
+ Edge2["row_id: 101<br/>otype: '_edge_'<br/>s: 2<br/>p: 'sample_location'<br/>o: [3]"]
+ end
+ end
+
+ E1 -.->|"s=1"| Edge1
+ Edge1 -.->|"o=[2]"| E2
+ E2 -.->|"s=2"| Edge2
+ Edge2 -.->|"o=[3]"| E3
+
+ style E1 fill:#e1f5ff,stroke:#0077be,stroke-width:2px
+ style E2 fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ style E3 fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+ style Edge1 fill:#ffebee,stroke:#c62828,stroke-width:2px
+ style Edge2 fill:#ffebee,stroke:#c62828,stroke-width:2px
+```
+
+**How it works:**
+1. **Entity rows** have `otype` set to their entity type (e.g., `MaterialSampleRecord`)
+2. **Edge rows** have `otype = '_edge_'`
+3. **Edge `s` field** points to subject entity's `row_id`
+4. **Edge `p` field** contains the predicate name (e.g., `produced_by`)
+5. **Edge `o` field** is an **array** of object entity `row_id`s (supports multivalued)
+6. **Joining** requires matching `edge.s = subject.row_id` and `object.row_id = ANY(edge.o)`
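+
+The join rule in step 6 can be mimicked without a database. The sketch below loads the five rows from the diagram into plain Python dicts (a hypothetical stand-in for the PQG table, not its real loader) and resolves an edge by matching `s` and `p`, then expanding every id in the `o` array, which is what `object.row_id = ANY(edge.o)` expresses in the SQL examples.
+
+```python
+# Rows from the storage diagram, as plain dicts.
+ROWS = [
+    {"row_id": 1, "otype": "MaterialSampleRecord", "label": "Sample 42"},
+    {"row_id": 2, "otype": "SamplingEvent", "label": "Excavation 2023"},
+    {"row_id": 3, "otype": "GeospatialCoordLocation", "latitude": 37.5, "longitude": 32.8},
+    {"row_id": 100, "otype": "_edge_", "s": 1, "p": "produced_by", "o": [2]},
+    {"row_id": 101, "otype": "_edge_", "s": 2, "p": "sample_location", "o": [3]},
+]
+
+BY_ID = {row["row_id"]: row for row in ROWS}
+
+
+def objects(subject_id, predicate):
+    """Entity rows reached from subject_id via predicate.
+
+    Mirrors `edge.s = subject.row_id AND edge.p = ?` followed by
+    `object.row_id = ANY(edge.o)`: every id in the `o` array is a match.
+    """
+    return [
+        BY_ID[obj_id]
+        for row in ROWS
+        if row["otype"] == "_edge_" and row["s"] == subject_id and row["p"] == predicate
+        for obj_id in row["o"]
+    ]
+
+
+event = objects(1, "produced_by")[0]
+coords = objects(event["row_id"], "sample_location")[0]
+print(event["label"], coords["latitude"], coords["longitude"])  # Excavation 2023 37.5 32.8
+```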
+
+---
+
+## Predicate Usage Patterns
+
+This chart shows how often each predicate appears in the OpenContext dataset.
+
+```mermaid
+%%{init: {'theme':'base'}}%%
+graph LR
+ subgraph "Most Common (>1M edges each)"
+ P1["has_sample_object_type<br/>1,124,480 edges"]
+ P2["produced_by<br/>1,096,352 edges"]
+ P3["has_material_category<br/>1,095,920 edges"]
+ P4["has_context_category<br/>1,095,912 edges"]
+ P6["sample_location<br/>1,095,912 edges"]
+ P5["keywords<br/>1,070,912 edges"]
+ end
+
+ subgraph "Common (100K-1M edges)"
+ P7["sampling_site<br/>383,912 edges"]
+ P8["site_location<br/>383,912 edges"]
+ end
+
+ subgraph "Moderate (10K-100K edges)"
+ P9["responsibility (Event)<br/>72,520 edges"]
+ end
+
+ subgraph "Rare (<10K edges)"
+ P10["registrant<br/>~1,000 edges"]
+ P11["curation<br/>~500 edges"]
+ P12["responsibility (Curation)<br/>~500 edges"]
+ end
+
+ subgraph "Not Used in OpenContext"
+ P13["related_resource<br/>0 edges"]
+ end
+
+ style P1 fill:#c62828,color:#fff
+ style P2 fill:#c62828,color:#fff
+ style P3 fill:#c62828,color:#fff
+ style P4 fill:#c62828,color:#fff
+ style P5 fill:#c62828,color:#fff
+ style P6 fill:#c62828,color:#fff
+ style P7 fill:#f57c00,color:#fff
+ style P8 fill:#f57c00,color:#fff
+ style P9 fill:#fbc02d
+ style P10 fill:#aed581
+ style P11 fill:#aed581
+ style P12 fill:#aed581
+ style P13 fill:#e0e0e0
+```
+
+**Domain patterns:**
+- **OpenContext (archaeology):** Heavy use of categorization (material, context, object type)
+- **SESAR (geology):** More use of `curation` and `registrant` (institutional tracking)
+- **GEOME (biology):** Heavy use of `related_resource` (parent-child sample chains)
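+
+Because an edge row's `o` is an array, "edges per predicate" can mean either edge rows or object references, and the two diverge whenever one edge carries several objects. A minimal sketch of both counts over hypothetical edge rows (only the `p` and `o` fields are modeled):
+
+```python
+from collections import Counter
+
+# Hypothetical edge rows.
+edges = [
+    {"p": "keywords", "o": [10, 11, 12]},  # one row, three keyword objects
+    {"p": "keywords", "o": [13]},
+    {"p": "produced_by", "o": [2]},
+    {"p": "has_material_category", "o": [20]},
+]
+
+rows_per_predicate = Counter(e["p"] for e in edges)
+refs_per_predicate = Counter()
+for e in edges:
+    refs_per_predicate[e["p"]] += len(e["o"])
+
+print(rows_per_predicate["keywords"])  # 2 edge rows
+print(refs_per_predicate["keywords"])  # 4 object references
+```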
+
+---
+
+## Multi-Hop Traversal Map
+
+This diagram shows common multi-hop query patterns and their path lengths.
+
+```mermaid
+graph TB
+ MSR[MaterialSampleRecord<br/>'Start Here']
+
+ MSR -->|1 hop| Event[SamplingEvent]
+ MSR -->|1 hop| Material[Material Category]
+ MSR -->|1 hop| Context[Context Category]
+ MSR -->|1 hop| Registrant[Registrant]
+
+ Event -->|+1 = 2 hops| Coords1[Event Coordinates]
+ Event -->|+1 = 2 hops| Site[Sampling Site]
+ Event -->|+1 = 2 hops| Collector[Collector]
+ Event -->|+1 = 2 hops| EventContext[Event Context]
+
+ Site -->|+1 = 3 hops| Coords2[Site Coordinates]
+
+ MSR -->|1 hop| Curation[Curation Info]
+ Curation -->|+1 = 2 hops| Curator[Curator]
+
+ MSR -->|1 hop| Related[Related Samples]
+
+ classDef hop1 fill:#e1f5ff,stroke:#0077be,stroke-width:2px
+ classDef hop2 fill:#fff4e1,stroke:#ff8c00,stroke-width:2px
+ classDef hop3 fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
+
+ class MSR,Material,Context,Registrant,Event,Curation,Related hop1
+ class Coords1,Site,Collector,EventContext,Curator hop2
+ class Coords2 hop3
+```
+
+**Path complexity:**
+- **1-hop queries:** Direct attributes (material, context, keywords, registrant)
+- **2-hop queries:** Location, collector, site name (most common complex queries)
+- **3-hop queries:** Site coordinates (rare - usually use event coordinates instead)
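+
+The hop counts follow from a breadth-first search over the entity-type graph implied by the edge type matrix. This hypothetical sketch returns the *minimum* hops to each type, so it reports 2 hops for GeospatialCoordLocation (via the event); the site-coordinate path above is 3 hops because it deliberately goes through SamplingSite.
+
+```python
+from collections import deque
+
+# Entity-type graph derived from the edge type matrix (predicate names omitted).
+TYPE_GRAPH = {
+    "MaterialSampleRecord": ["SamplingEvent", "IdentifiedConcept", "Agent",
+                             "MaterialSampleCuration", "SampleRelation"],
+    "SamplingEvent": ["SamplingSite", "GeospatialCoordLocation",
+                      "IdentifiedConcept", "Agent"],
+    "SamplingSite": ["GeospatialCoordLocation"],
+    "MaterialSampleCuration": ["Agent"],
+}
+
+
+def min_hops(start="MaterialSampleRecord"):
+    """Breadth-first search: fewest edges needed to reach each entity type."""
+    hops = {start: 0}
+    queue = deque([start])
+    while queue:
+        node = queue.popleft()
+        for nxt in TYPE_GRAPH.get(node, []):
+            if nxt not in hops:
+                hops[nxt] = hops[node] + 1
+                queue.append(nxt)
+    return hops
+
+
+hops = min_hops()
+print(hops["SamplingSite"], hops["GeospatialCoordLocation"])  # 2 2
+```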
+
+---
+
+## Entity Type Connectivity
+
+This diagram shows how "connected" each entity type is (number of relationship types it participates in).
+
+```mermaid
+graph LR
+ subgraph "Highly Connected (Hub Nodes)"
+ MSR["MaterialSampleRecord<br/>8 outgoing edge types<br/>Centrality: HIGH"]
+ Event["SamplingEvent<br/>4 outgoing edge types<br/>Centrality: HIGH"]
+ end
+
+ subgraph "Moderately Connected"
+ Site["SamplingSite<br/>1 outgoing edge type<br/>Centrality: MEDIUM"]
+ Curation["MaterialSampleCuration<br/>1 outgoing edge type<br/>Centrality: MEDIUM"]
+ end
+
+ subgraph "Leaf Nodes (No Outgoing Edges)"
+ Concept["IdentifiedConcept<br/>0 outgoing<br/>5 incoming edge types<br/>Centrality: HIGH (target)"]
+ Agent["Agent<br/>0 outgoing<br/>3 incoming edge types<br/>Centrality: MEDIUM (target)"]
+ Coords["GeospatialCoordLocation<br/>0 outgoing<br/>2 incoming edge types<br/>Centrality: MEDIUM (target)"]
+ Relation["SampleRelation<br/>0 outgoing<br/>1 incoming edge type<br/>Centrality: LOW (target)"]
+ end
+
+ style MSR fill:#c62828,color:#fff
+ style Event fill:#f57c00,color:#fff
+ style Site fill:#fbc02d
+ style Curation fill:#fbc02d
+ style Concept fill:#9c27b0,color:#fff
+ style Agent fill:#7b1fa2,color:#fff
+ style Coords fill:#7b1fa2,color:#fff
+ style Relation fill:#aed581
+```
+
+**Key observations:**
+1. **MaterialSampleRecord** is the primary hub (8 outgoing relationship types)
+2. **SamplingEvent** is the secondary hub (4 outgoing relationship types)
+3. **IdentifiedConcept** is the most popular target (5 incoming edge types, using 4 distinct predicates)
+4. **Agent and GeospatialCoordLocation** are intermediate targets (3 and 2 incoming edge types, respectively)
+5. **SampleRelation** is rarely used (1 incoming predicate, sparse in data)
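+
+These degree counts can be derived directly from the edge type matrix. A minimal sketch; the tuple list simply restates the matrix, and the variable names are illustrative.
+
+```python
+from collections import Counter
+
+# (subject type, predicate, object type), restating the edge type matrix.
+EDGE_TYPES = [
+    ("MaterialSampleRecord", "produced_by", "SamplingEvent"),
+    ("MaterialSampleRecord", "has_material_category", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "has_context_category", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "has_sample_object_type", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "keywords", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "registrant", "Agent"),
+    ("MaterialSampleRecord", "curation", "MaterialSampleCuration"),
+    ("MaterialSampleRecord", "related_resource", "SampleRelation"),
+    ("SamplingEvent", "sampling_site", "SamplingSite"),
+    ("SamplingEvent", "sample_location", "GeospatialCoordLocation"),
+    ("SamplingEvent", "has_context_category", "IdentifiedConcept"),
+    ("SamplingEvent", "responsibility", "Agent"),
+    ("SamplingSite", "site_location", "GeospatialCoordLocation"),
+    ("MaterialSampleCuration", "responsibility", "Agent"),
+]
+
+out_types = Counter(s for s, _, _ in EDGE_TYPES)  # outgoing edge types per entity type
+in_types = Counter(o for _, _, o in EDGE_TYPES)   # incoming edge types per entity type
+concept_predicates = {p for _, p, o in EDGE_TYPES if o == "IdentifiedConcept"}
+
+print(out_types["MaterialSampleRecord"], out_types["SamplingEvent"])  # 8 4
+print(in_types["IdentifiedConcept"], len(concept_predicates))         # 5 4
+print(in_types["Agent"], in_types["GeospatialCoordLocation"])         # 3 2
+```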
+
+---
+
+## The 14 Sentence Types (Grammar Summary)
+
+Visual summary of the complete iSamples "grammar":
+
+```mermaid
+graph TB
+ subgraph "1. MaterialSampleRecord Sentences (8 types)"
+ S1["Sample --produced_by→ Event<br/>REQUIRED"]
+ S2["Sample --has_material_category→ Concept<br/>Material type"]
+ S3["Sample --has_context_category→ Concept<br/>Sampled feature"]
+ S4["Sample --has_sample_object_type→ Concept<br/>Object classification"]
+ S5["Sample --keywords→ Concept<br/>Discovery terms"]
+ S6["Sample --registrant→ Agent<br/>Who registered"]
+ S7["Sample --curation→ Curation<br/>Archive info"]
+ S8["Sample --related_resource→ Relation<br/>Sample links"]
+ end
+
+ subgraph "2. SamplingEvent Sentences (4 types)"
+ E1["Event --sampling_site→ Site<br/>Named location"]
+ E2["Event --sample_location→ Coords<br/>Exact coordinates"]
+ E3["Event --has_context_category→ Concept<br/>Event type"]
+ E4["Event --responsibility→ Agent<br/>Collector"]
+ end
+
+ subgraph "3. SamplingSite Sentences (1 type)"
+ T1["Site --site_location→ Coords<br/>Site centroid"]
+ end
+
+ subgraph "4. MaterialSampleCuration Sentences (1 type)"
+ C1["Curation --responsibility→ Agent<br/>Curator"]
+ end
+
+ style S1 fill:#c62828,color:#fff
+ style S2 fill:#e57373,color:#fff
+ style S3 fill:#e57373,color:#fff
+ style S4 fill:#e57373,color:#fff
+ style S5 fill:#e57373,color:#fff
+ style S6 fill:#e57373,color:#fff
+ style S7 fill:#e57373,color:#fff
+ style S8 fill:#e57373,color:#fff
+ style E1 fill:#f57c00,color:#fff
+ style E2 fill:#f57c00,color:#fff
+ style E3 fill:#f57c00,color:#fff
+ style E4 fill:#f57c00,color:#fff
+ style T1 fill:#4caf50,color:#fff
+ style C1 fill:#9c27b0,color:#fff
+```
+
+**Total:** 14 edge types = Complete grammar of iSamples property graphs
+
+---
+
+## Cross-Domain Comparison
+
+How different scientific domains use the 14 edge types:
+
+```mermaid
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e1f5ff'}}}%%
+graph TB
+ subgraph "All Domains Use (Core Infrastructure)"
+ Core["produced_by<br/>has_material_category<br/>has_context_category<br/>sample_location"]
+ end
+
+ subgraph "Archaeology Heavy Use (OpenContext)"
+ Arch["keywords<br/>has_sample_object_type<br/>sampling_site<br/>site_location"]
+ end
+
+ subgraph "Geology Heavy Use (SESAR)"
+ Geo["registrant<br/>curation<br/>responsibility (Curation)"]
+ end
+
+ subgraph "Biology Heavy Use (GEOME)"
+ Bio["related_resource<br/>responsibility (Event)"]
+ end
+
+ style Core fill:#4caf50,color:#fff
+ style Arch fill:#ff8c00,color:#fff
+ style Geo fill:#2196f3,color:#fff
+ style Bio fill:#9c27b0,color:#fff
+```
+
+**Why different patterns?**
+- **Archaeology:** Heavy emphasis on discovery/publication (keywords, object types)
+- **Geology:** Institutional tracking (registrants, repositories, curators)
+- **Biology:** Sample lineage (parent-child relationships via related_resource)
+- **All domains:** Need material classification and geographic coordinates
+
+---
+
+## Graph Query Complexity Chart
+
+This chart shows the complexity distribution of common queries:
+
+| **Query Type** | **Hops** | **Joins** | **Complexity** | **Example** |
+|----------------|:--------:|:---------:|:--------------:|-------------|
+| Get sample label | 0 | 0 | ⭐ | `SELECT label FROM pqg WHERE pid=?` |
+| Get material category | 1 | 2 | ⭐⭐ | Sample → Category |
+| Get sample coordinates | 2 | 4 | ⭐⭐⭐ | Sample → Event → Coords |
+| Get collector name | 2 | 4 | ⭐⭐⭐ | Sample → Event → Agent |
+| Get site name | 2 | 4 | ⭐⭐⭐ | Sample → Event → Site |
+| Get site coordinates | 3 | 6 | ⭐⭐⭐⭐ | Sample → Event → Site → Coords |
+| Get all related samples | 2-4 | 4-8 | ⭐⭐⭐⭐⭐ | Sample → Relation → Samples (recursive) |
+
+**Performance tip:** Cache 2-hop queries (coordinates, collectors) - they're the most common complex pattern.
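+
+The Joins column follows a simple rule: each hop adds one join to an edge row and one join to the object row, so joins = 2 × hops. The hypothetical `path_sql` helper below generates the self-join pattern used in the traversal examples for any predicate path. It interpolates predicate names directly, which is only acceptable because they come from the fixed edge-type vocabulary, and it keeps the `ANY(...)` array match from those examples, so adjust that clause for your SQL dialect.
+
+```python
+def path_sql(predicates):
+    """Build the self-join SQL for a predicate path, starting from row alias n0.
+
+    Each hop adds two joins: one to the edge row and one to the object row.
+    """
+    lines = ["SELECT *", "FROM pqg AS n0"]
+    for i, predicate in enumerate(predicates, start=1):
+        lines.append(
+            f"JOIN pqg AS e{i} ON e{i}.s = n{i - 1}.row_id AND e{i}.p = '{predicate}'"
+        )
+        lines.append(f"JOIN pqg AS n{i} ON n{i}.row_id = ANY(e{i}.o)")
+    return "\n".join(lines)
+
+
+sql = path_sql(["produced_by", "sampling_site", "site_location"])
+print(sql.count("JOIN"))  # 6 joins for 3 hops
+```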
+
+---
+
+## Next Steps
+
+- **SQL examples**: See [QUERYING_THE_GRAPH.md](QUERYING_THE_GRAPH.md) for detailed SQL patterns
+- **Predicate details**: See [PREDICATES_REFERENCE.md](PREDICATES_REFERENCE.md) for each relationship type
+- **Conceptual guide**: See [UNDERSTANDING_THE_GRAPH.md](UNDERSTANDING_THE_GRAPH.md) for foundations
+- **Real examples**: See [EXAMPLES_BY_DOMAIN.md](EXAMPLES_BY_DOMAIN.md) for complete YAML samples
+
+---
+
+**Last updated:** 2025-11-14
+**Part of:** iSamples Property Graph Documentation Suite
diff --git a/src/docs/EXAMPLES_BY_DOMAIN.md b/src/docs/EXAMPLES_BY_DOMAIN.md
new file mode 100644
index 00000000..8cec8ea9
--- /dev/null
+++ b/src/docs/EXAMPLES_BY_DOMAIN.md
@@ -0,0 +1,842 @@
+# iSamples Examples by Scientific Domain
+
+**Purpose:** Demonstrate how the same iSamples schema works across different scientific domains with concrete real-world examples.
+
+**Key Insight:** The iSamples model is **domain-agnostic** - the same 8 entity types and 14 edge types (built from 12 distinct predicates) work for archaeology, geology, biology, and more. **Only the values change**, not the structure.
+
+---
+
+## Table of Contents
+
+1. [Archaeology (OpenContext)](#archaeology-opencontext)
+2. [Geology (SESAR - Projected)](#geology-sesar---projected)
+3. [Biology (GEOME - Projected)](#biology-geome---projected)
+4. [Cross-Domain Comparison](#cross-domain-comparison)
+5. [Domain-Specific Patterns](#domain-specific-patterns)
+
+---
+
+## Archaeology (OpenContext)
+
+**Data Source:** OpenContext (https://opencontext.org)
+**Dataset Size:** 1,096,352 samples from archaeological excavations worldwide
+**Primary Domain:** Cultural heritage, archaeological artifacts
+
+### Sample Profile: Pottery Sherd from Çatalhöyük
+
+#### Complete Graph Structure
+
+```
+MaterialSampleRecord (Pottery Sherd)
+ ├─ produced_by ───────→ SamplingEvent (2023 Excavation)
+ │ ├─ sampling_site ───→ SamplingSite (Çatalhöyük South Area)
+ │ │ └─ site_location ───→ GeospatialCoordLocation (37.666°N, 32.827°E)
+ │ ├─ sample_location ──→ GeospatialCoordLocation (37.6665°N, 32.8274°E, depth: 3.2m)
+ │ └─ responsibility ───→ Agent (Dr. Sarah Johnson)
+ │
+ ├─ has_material_category ─→ IdentifiedConcept (Earthenware)
+ ├─ has_context_category ──→ IdentifiedConcept (Terrestrial > Archaeological)
+ ├─ has_sample_object_type ─→ IdentifiedConcept (Sherd)
+ ├─ keywords ──────────────→ IdentifiedConcept (Neolithic)
+ ├─ keywords ──────────────→ IdentifiedConcept (Pottery)
+ ├─ keywords ──────────────→ IdentifiedConcept (Red-slipped ware)
+ └─ registrant ────────────→ Agent (OpenContext Data Curator)
+```
+
+#### Full YAML Example
+
+```yaml
+# === SAMPLE NODE ===
+sample_pottery_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:IEOCH0001"
+ label: "Ceramic bowl rim fragment, Trench 5, Level 3"
+ description: >
+ Red-slipped pottery sherd with geometric incised decoration.
+ Bowl rim fragment with 15cm estimated diameter.
+ Fine-grained clay matrix with minimal tempering.
+ sample_identifier: "CATAL-2023-T5-L3-P001"
+
+# === SAMPLING EVENT NODE ===
+event_excavation_001:
+ otype: SamplingEvent
+ pid: "event:catal-2023-t5-l3"
+ label: "Çatalhöyük 2023, Trench 5, Level 3"
+ description: >
+ Systematic excavation of Neolithic domestic structure.
+ Level 3 represents occupation phase dated 6500-6400 BCE.
+ Standard archaeological excavation methodology with 3D recording.
+ result_time: "2023-07-15T14:30:00Z"
+ has_feature_of_interest: "Neolithic architectural feature: building floor"
+ project: "Çatalhöyük Research Project"
+
+# === SAMPLING SITE NODE ===
+site_catalhoyuk:
+ otype: SamplingSite
+ pid: "site:catalhoyuk-south-area"
+ label: "Çatalhöyük South Area"
+ description: >
+ Neolithic settlement mound in central Anatolia, Turkey.
+ UNESCO World Heritage Site. Occupied 7100-5950 BCE.
+ place_name:
+ - "Çatalhöyük"
+ - "Çatal Höyük"
+ - "Chatal Huyuk"
+
+# === GEOSPATIAL COORDINATE NODES ===
+coords_site:
+ otype: GeospatialCoordLocation
+ pid: "coords:catalhoyuk-site-center"
+ latitude: 37.666
+ longitude: 32.827
+ elevation: "1000 m above mean sea level"
+ obfuscated: false
+
+coords_sample:
+ otype: GeospatialCoordLocation
+ pid: "coords:catal-2023-t5-l3-p001"
+ latitude: 37.6665
+ longitude: 32.8274
+ elevation: "3.2 m below surface"
+ obfuscated: false
+
+# === IDENTIFIED CONCEPT NODES ===
+concept_earthenware:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/material/0.9/earthenware"
+ label: "Earthenware"
+ scheme_name: "iSamples Material Type Vocabulary"
+ scheme_uri: "https://w3id.org/isample/vocabulary/material/"
+
+concept_archaeological:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/sampledfeature/0.9/terrestrial_archaeological"
+ label: "Terrestrial environment > Archaeological site"
+ scheme_name: "iSamples Sampled Feature Vocabulary"
+
+concept_sherd:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/materialsampleobjecttype/0.9/sherd"
+ label: "Sherd"
+ scheme_name: "iSamples Material Sample Object Type Vocabulary"
+
+keyword_neolithic:
+ otype: IdentifiedConcept
+ pid: "keyword:neolithic"
+ label: "Neolithic"
+
+keyword_pottery:
+ otype: IdentifiedConcept
+ pid: "keyword:pottery"
+ label: "Pottery"
+
+keyword_redslipped:
+ otype: IdentifiedConcept
+ pid: "keyword:red-slipped-ware"
+ label: "Red-slipped ware"
+
+# === AGENT NODES ===
+agent_collector:
+ otype: Agent
+ pid: "https://orcid.org/0000-0002-1234-5678"
+ name: "Dr. Sarah Johnson"
+ affiliation: "University of Cambridge, McDonald Institute"
+ contact_information: "sjohnson@cam.ac.uk"
+ role: "Field Supervisor"
+
+agent_registrant:
+ otype: Agent
+ pid: "agent:opencontext-curator"
+ name: "OpenContext Data Team"
+ affiliation: "The Alexandria Archive Institute"
+ contact_information: "info@opencontext.org"
+ role: "Data Curator"
+
+# === EDGES ===
+# Sample → Event
+edge_produced_by:
+ otype: _edge_
+ s: sample_pottery_001
+ p: produced_by
+ o: [event_excavation_001]
+
+# Event → Site
+edge_sampling_site:
+ otype: _edge_
+ s: event_excavation_001
+ p: sampling_site
+ o: [site_catalhoyuk]
+
+# Site → Site Coordinates
+edge_site_location:
+ otype: _edge_
+ s: site_catalhoyuk
+ p: site_location
+ o: [coords_site]
+
+# Event → Sample Coordinates
+edge_sample_location:
+ otype: _edge_
+ s: event_excavation_001
+ p: sample_location
+ o: [coords_sample]
+
+# Event → Collector
+edge_responsibility:
+ otype: _edge_
+ s: event_excavation_001
+ p: responsibility
+ o: [agent_collector]
+
+# Sample → Material Type
+edge_material:
+ otype: _edge_
+ s: sample_pottery_001
+ p: has_material_category
+ o: [concept_earthenware]
+
+# Sample → Context
+edge_context:
+ otype: _edge_
+ s: sample_pottery_001
+ p: has_context_category
+ o: [concept_archaeological]
+
+# Sample → Object Type
+edge_object_type:
+ otype: _edge_
+ s: sample_pottery_001
+ p: has_sample_object_type
+ o: [concept_sherd]
+
+# Sample → Keywords (multivalued)
+edge_keyword_1:
+ otype: _edge_
+ s: sample_pottery_001
+ p: keywords
+ o: [keyword_neolithic]
+
+edge_keyword_2:
+ otype: _edge_
+ s: sample_pottery_001
+ p: keywords
+ o: [keyword_pottery]
+
+edge_keyword_3:
+ otype: _edge_
+ s: sample_pottery_001
+ p: keywords
+ o: [keyword_redslipped]
+
+# Sample → Registrant
+edge_registrant:
+ otype: _edge_
+ s: sample_pottery_001
+ p: registrant
+ o: [agent_registrant]
+```
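+
+Before loading a hand-written example like this, it helps to check that every edge's `s` and `o` name a defined entity node. The sketch below assumes the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`); to stay self-contained it uses a small hand-copied subset that deliberately omits `coords_site`, so the check has something to report. `dangling_references` is a hypothetical helper name.
+
+```python
+# Subset of the YAML example above, as parsed dicts keyed by node name.
+graph = {
+    "sample_pottery_001": {"otype": "MaterialSampleRecord"},
+    "event_excavation_001": {"otype": "SamplingEvent"},
+    "site_catalhoyuk": {"otype": "SamplingSite"},
+    "edge_produced_by": {"otype": "_edge_", "s": "sample_pottery_001",
+                         "p": "produced_by", "o": ["event_excavation_001"]},
+    "edge_sampling_site": {"otype": "_edge_", "s": "event_excavation_001",
+                           "p": "sampling_site", "o": ["site_catalhoyuk"]},
+    # coords_site is intentionally left out of this subset
+    "edge_site_location": {"otype": "_edge_", "s": "site_catalhoyuk",
+                           "p": "site_location", "o": ["coords_site"]},
+}
+
+
+def dangling_references(nodes):
+    """List (edge name, reference) pairs whose s or o is not a defined entity node."""
+    missing = []
+    for name, node in nodes.items():
+        if node["otype"] != "_edge_":
+            continue
+        for ref in [node["s"], *node["o"]]:
+            if ref not in nodes or nodes[ref]["otype"] == "_edge_":
+                missing.append((name, ref))
+    return missing
+
+
+print(dangling_references(graph))  # [('edge_site_location', 'coords_site')]
+```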
+
+### Archaeology-Specific Patterns
+
+**What's unique:**
+- Heavy use of **keywords** for taxonomic and cultural terms
+- **Detailed site names** (place_name with multiple spellings)
+- **Depth measurements** instead of elevation ("3.2 m below surface")
+- **Cultural periods** in keywords (Neolithic, Bronze Age, etc.)
+- **No curation information** (samples often remain at excavation sites)
+
+**Edge types used:** 10 of 14
+- ✅ produced_by, has_material_category, has_context_category, has_sample_object_type
+- ✅ keywords, registrant, sampling_site, sample_location, responsibility (Event), site_location
+- ❌ curation, related_resource, has_context_category (Event), responsibility (Curation)
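+
+The "10 of 14" tally can be reproduced from the example's edges. A minimal sketch, assuming each edge is reduced to its `(subject type, predicate)` pair, which is enough to identify one of the 14 edge types; the variable names are illustrative.
+
+```python
+# The 14 edge types, identified by (subject type, predicate).
+ALL_EDGE_TYPES = {
+    ("MaterialSampleRecord", "produced_by"),
+    ("MaterialSampleRecord", "has_material_category"),
+    ("MaterialSampleRecord", "has_context_category"),
+    ("MaterialSampleRecord", "has_sample_object_type"),
+    ("MaterialSampleRecord", "keywords"),
+    ("MaterialSampleRecord", "registrant"),
+    ("MaterialSampleRecord", "curation"),
+    ("MaterialSampleRecord", "related_resource"),
+    ("SamplingEvent", "sampling_site"),
+    ("SamplingEvent", "sample_location"),
+    ("SamplingEvent", "has_context_category"),
+    ("SamplingEvent", "responsibility"),
+    ("SamplingSite", "site_location"),
+    ("MaterialSampleCuration", "responsibility"),
+}
+
+# Edges from the pottery example, reduced to (subject type, predicate).
+pottery_edges = [
+    ("MaterialSampleRecord", "produced_by"),
+    ("SamplingEvent", "sampling_site"),
+    ("SamplingSite", "site_location"),
+    ("SamplingEvent", "sample_location"),
+    ("SamplingEvent", "responsibility"),
+    ("MaterialSampleRecord", "has_material_category"),
+    ("MaterialSampleRecord", "has_context_category"),
+    ("MaterialSampleRecord", "has_sample_object_type"),
+    ("MaterialSampleRecord", "keywords"),  # three keyword edges
+    ("MaterialSampleRecord", "keywords"),  # collapse into one
+    ("MaterialSampleRecord", "keywords"),  # edge type
+    ("MaterialSampleRecord", "registrant"),
+]
+
+used = set(pottery_edges) & ALL_EDGE_TYPES
+unused = ALL_EDGE_TYPES - used
+print(f"{len(used)} of {len(ALL_EDGE_TYPES)} edge types used")  # 10 of 14 edge types used
+for subject_type, predicate in sorted(unused):
+    print(f"unused: {predicate} ({subject_type})")
+```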
+
+---
+
+## Geology (SESAR - Projected)
+
+**Data Source:** SESAR (System for Earth Sample Registration)
+**Dataset Size:** ~1M+ rock, mineral, and sediment samples
+**Primary Domain:** Earth sciences, petrology, geochemistry
+
+### Sample Profile: Basalt Core from Mid-Ocean Ridge
+
+#### Complete Graph Structure
+
+```
+MaterialSampleRecord (Basalt Core)
+ ├─ produced_by ───────→ SamplingEvent (2023 Dredge)
+ │ ├─ sample_location ──→ GeospatialCoordLocation (45.5°N, 130.2°W, 2500 m depth)
+ │ └─ responsibility ───→ Agent (Dr. Maria Rodriguez)
+ │
+ ├─ has_material_category ─→ IdentifiedConcept (Basalt)
+ ├─ has_context_category ──→ IdentifiedConcept (Marine water body)
+ ├─ has_sample_object_type ─→ IdentifiedConcept (Core)
+ ├─ keywords ──────────────→ IdentifiedConcept (MORB - Mid-Ocean Ridge Basalt)
+ ├─ curation ──────────────→ MaterialSampleCuration (Lamont Core Repository)
+ │ └─ responsibility ───→ Agent (Core Facility Manager)
+ └─ registrant ────────────→ Agent (SESAR Data Manager)
+```
+
+#### Full YAML Example
+
+```yaml
+# === SAMPLE NODE ===
+sample_basalt_core:
+ otype: MaterialSampleRecord
+ pid: "igsn:IESEA0001"
+ label: "Basalt core from Juan de Fuca Ridge"
+ description: >
+ Fresh basalt core, 6cm diameter, 15cm length.
+ Holocrystalline texture with plagioclase and pyroxene phenocrysts.
+ Collected from pillow basalt at mid-ocean ridge spreading center.
+ sample_identifier: "JDFR-2023-DR-001-C1"
+
+# === SAMPLING EVENT NODE ===
+event_drilling:
+ otype: SamplingEvent
+ pid: "event:jdfr-2023-dredge-001"
+ label: "Juan de Fuca Ridge Dredge 001, 2023"
+ description: >
+ Rock dredge operation from R/V Thompson.
+ Dredge deployed at 2500m depth on ridge axis.
+ Standard petrological sampling protocol.
+ result_time: "2023-08-22T10:45:00Z"
+ has_feature_of_interest: "Mid-ocean ridge basalt outcrop"
+ project: "NSF OCE-2023456: Juan de Fuca Ridge Magmatic Evolution"
+
+# === GEOSPATIAL COORDINATE NODE ===
+coords_sample:
+ otype: GeospatialCoordLocation
+ pid: "coords:jdfr-2023-dr-001"
+ latitude: 45.5
+ longitude: -130.2
+ elevation: "-2500 m below sea level"
+ obfuscated: false
+
+# === IDENTIFIED CONCEPT NODES ===
+concept_basalt:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/material/0.9/basalt"
+ label: "Basalt"
+ scheme_name: "iSamples Material Type Vocabulary"
+
+concept_marine:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/sampledfeature/0.9/marinewaterbody"
+ label: "Marine water body"
+ scheme_name: "iSamples Sampled Feature Vocabulary"
+
+concept_core:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/materialsampleobjecttype/0.9/core"
+ label: "Core"
+ scheme_name: "iSamples Material Sample Object Type Vocabulary"
+
+keyword_morb:
+ otype: IdentifiedConcept
+ pid: "keyword:morb"
+ label: "MORB"
+ description: "Mid-Ocean Ridge Basalt"
+
+# === AGENT NODES ===
+agent_collector:
+ otype: Agent
+ pid: "https://orcid.org/0000-0003-5678-9012"
+ name: "Dr. Maria Rodriguez"
+ affiliation: "Scripps Institution of Oceanography"
+ role: "Chief Scientist"
+
+agent_curator:
+ otype: Agent
+ pid: "agent:lamont-core-manager"
+ name: "James Chen"
+ affiliation: "Lamont-Doherty Core Repository"
+ role: "Core Facility Manager"
+
+agent_registrant:
+ otype: Agent
+ pid: "agent:sesar-manager"
+ name: "SESAR Data Management Team"
+ affiliation: "Lamont-Doherty Earth Observatory"
+ role: "Sample Registry Manager"
+
+# === CURATION NODE ===
+curation_lamont:
+ otype: MaterialSampleCuration
+ pid: "curation:lamont-core-repo"
+ label: "Lamont-Doherty Core Repository"
+ description: >
+ World-class marine core repository.
+    Temperature-controlled storage, 4°C.
+ Catalog available online.
+ curation_location: "Lamont-Doherty Earth Observatory, Palisades, NY"
+ access_constraints: "Request access via SESAR portal. Sampling approval required."
+
+# === EDGES ===
+edge_produced_by:
+ s: sample_basalt_core
+ p: produced_by
+ o: [event_drilling]
+
+edge_sample_location:
+ s: event_drilling
+ p: sample_location
+ o: [coords_sample]
+
+edge_responsibility_event:
+ s: event_drilling
+ p: responsibility
+ o: [agent_collector]
+
+edge_material:
+ s: sample_basalt_core
+ p: has_material_category
+ o: [concept_basalt]
+
+edge_context:
+ s: sample_basalt_core
+ p: has_context_category
+ o: [concept_marine]
+
+edge_object_type:
+ s: sample_basalt_core
+ p: has_sample_object_type
+ o: [concept_core]
+
+edge_keyword:
+ s: sample_basalt_core
+ p: keywords
+ o: [keyword_morb]
+
+edge_curation:
+ s: sample_basalt_core
+ p: curation
+ o: [curation_lamont]
+
+edge_curation_responsibility:
+ s: curation_lamont
+ p: responsibility
+ o: [agent_curator]
+
+edge_registrant:
+ s: sample_basalt_core
+ p: registrant
+ o: [agent_registrant]
+```
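+
+The edge records above are enough to answer path questions without SQL. A minimal sketch in plain Python (standard library only), assuming a hand-copied subset of the records rather than a YAML loader; `build_index` and `follow` are hypothetical helper names, not part of any iSamples tooling:
+
+```python
+from collections import defaultdict
+
+# Hand-copied subset of the geology edges above: (subject, predicate, objects).
+EDGES = [
+    ("sample_basalt_core", "produced_by", ["event_drilling"]),
+    ("event_drilling", "sample_location", ["coords_sample"]),
+    ("event_drilling", "responsibility", ["agent_collector"]),
+    ("sample_basalt_core", "curation", ["curation_lamont"]),
+    ("curation_lamont", "responsibility", ["agent_curator"]),
+]
+
+# Only the node fields this sketch prints.
+NODES = {
+    "coords_sample": {"latitude": 45.5, "longitude": -130.2},
+    "agent_curator": {"name": "James Chen"},
+}
+
+
+def build_index(edges):
+    """Map (subject, predicate) to the list of object keys."""
+    index = defaultdict(list)
+    for subject, predicate, objects in edges:
+        index[(subject, predicate)].extend(objects)
+    return index
+
+
+def follow(index, start, path):
+    """Follow a sequence of predicates from `start`; return every node reached."""
+    frontier = [start]
+    for predicate in path:
+        frontier = [o for node in frontier for o in index.get((node, predicate), [])]
+    return frontier
+
+
+index = build_index(EDGES)
+for key in follow(index, "sample_basalt_core", ["produced_by", "sample_location"]):
+    print(NODES[key]["latitude"], NODES[key]["longitude"])  # 45.5 -130.2
+for key in follow(index, "sample_basalt_core", ["curation", "responsibility"]):
+    print(NODES[key]["name"])  # James Chen
+```
+
+Keying the index on (subject, predicate) is what keeps the two `responsibility` edges apart: the collector is reached through the event, the curator through the curation record.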
+
+### Geology-Specific Patterns
+
+**What's unique:**
+- Heavy use of **curation** (samples stored in repositories)
+- **Negative elevations** for marine samples ("-2500 m below sea level")
+- **Formal project identifiers** (NSF grant numbers)
+- **Repository access constraints** (destructive sampling approval)
+- **Less use of keywords** (more reliance on formal material classification)
+
+**Edge types used (projected):** 10 of 14
+- ✅ produced_by, has_material_category, has_context_category, has_sample_object_type
+- ✅ keywords, registrant, curation, responsibility (Event), responsibility (Curation), sample_location
+- ❌ related_resource, sampling_site, site_location, has_context_category (Event)
+
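+The "10 of 14" tally treats `responsibility` and `has_context_category` as separate edge types for each subject type, so an edge type is a (subject type, predicate) pair rather than a bare predicate name. A minimal sketch of that count in plain Python, with the geology edges and node types hand-copied from the YAML above (the variable and function names are illustrative):
+
+```python
+# The 14 edge types, keyed on (subject type, predicate).
+ALL_EDGE_TYPES = {
+    ("MaterialSampleRecord", "produced_by"),
+    ("MaterialSampleRecord", "has_material_category"),
+    ("MaterialSampleRecord", "has_context_category"),
+    ("MaterialSampleRecord", "has_sample_object_type"),
+    ("MaterialSampleRecord", "keywords"),
+    ("MaterialSampleRecord", "registrant"),
+    ("MaterialSampleRecord", "curation"),
+    ("MaterialSampleRecord", "related_resource"),
+    ("SamplingEvent", "sampling_site"),
+    ("SamplingEvent", "sample_location"),
+    ("SamplingEvent", "responsibility"),
+    ("SamplingEvent", "has_context_category"),
+    ("SamplingSite", "site_location"),
+    ("MaterialSampleCuration", "responsibility"),
+}
+
+# Subject type of each node that appears as an edge subject in the geology example.
+OTYPE_OF = {
+    "sample_basalt_core": "MaterialSampleRecord",
+    "event_drilling": "SamplingEvent",
+    "curation_lamont": "MaterialSampleCuration",
+}
+
+# (subject, predicate) for each edge in the geology YAML.
+GEOLOGY_EDGES = [
+    ("sample_basalt_core", "produced_by"),
+    ("event_drilling", "sample_location"),
+    ("event_drilling", "responsibility"),
+    ("sample_basalt_core", "has_material_category"),
+    ("sample_basalt_core", "has_context_category"),
+    ("sample_basalt_core", "has_sample_object_type"),
+    ("sample_basalt_core", "keywords"),
+    ("sample_basalt_core", "curation"),
+    ("curation_lamont", "responsibility"),
+    ("sample_basalt_core", "registrant"),
+]
+
+
+def edge_types_used(edges, otype_of):
+    """Return the distinct (subject type, predicate) pairs present in `edges`."""
+    return {(otype_of[subject], predicate) for subject, predicate in edges}
+
+
+used = edge_types_used(GEOLOGY_EDGES, OTYPE_OF)
+print(f"{len(used)} of {len(ALL_EDGE_TYPES)}")  # 10 of 14
+for subject_type, predicate in sorted(ALL_EDGE_TYPES - used):
+    print("unused:", predicate, f"({subject_type})")
+```
+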
+---
+
+## Biology (GEOME - Projected)
+
+**Data Source:** GEOME (Genomic Observatories Metadatabase)
+**Dataset Size:** ~100K+ tissue and DNA samples from marine organisms
+**Primary Domain:** Marine biology, genomics, biodiversity
+
+### Sample Profile: Coral Tissue Sample from Pacific Reef
+
+#### Complete Graph Structure
+
+```
+MaterialSampleRecord (Tissue Sample)
+  ├─ produced_by ───────→ SamplingEvent (2024 Field Collection)
+  │   ├─ sampling_site ───→ SamplingSite (Palmyra Atoll Reef)
+  │   │   └─ site_location ───→ GeospatialCoordLocation (5.87°N, 162.08°W)
+  │   ├─ sample_location ──→ GeospatialCoordLocation (5.8715°N, 162.0823°W)
+  │   └─ responsibility ───→ Agent (Dr. Carlos Alvarez)
+  │
+  ├─ has_material_category ─→ IdentifiedConcept (Organic material > Tissue)
+  ├─ has_context_category ──→ IdentifiedConcept (Marine > Marine biome)
+  ├─ has_sample_object_type ─→ IdentifiedConcept (Specimen)
+  ├─ keywords ──────────────→ IdentifiedConcept (Pocillopora damicornis)
+  ├─ keywords ──────────────→ IdentifiedConcept (Coral)
+  ├─ keywords ──────────────→ IdentifiedConcept (Scleractinia)
+  └─ registrant ────────────→ Agent (GEOME Data Manager)
+
+# DNA extract linked via SampleRelation
+MaterialSampleRecord (DNA Extract)
+  └─ related_resource ──────→ SampleRelation (Derived from tissue)
+```
+
+#### Full YAML Example
+
+```yaml
+# === PARENT SAMPLE (TISSUE) ===
+sample_tissue:
+ otype: MaterialSampleRecord
+ pid: "igsn:IEGEN0001"
+ label: "Pocillopora damicornis tissue, Palmyra Atoll"
+ description: >
+ Tissue sample from branching coral colony.
+    Approximately 1cm³ tissue preserved in 95% ethanol.
+ Colony health: excellent. No visible bleaching.
+ sample_identifier: "PALM-2024-CORAL-001-T"
+
+# === CHILD SAMPLE (DNA EXTRACT) ===
+sample_dna:
+ otype: MaterialSampleRecord
+ pid: "igsn:IEGEN0002"
+ label: "DNA extract from Pocillopora damicornis tissue PALM-2024-CORAL-001-T"
+ description: >
+ High molecular weight DNA extracted using Qiagen DNeasy kit.
+    Concentration: 45 ng/µL. 260/280 ratio: 1.82.
+ sample_identifier: "PALM-2024-CORAL-001-DNA"
+
+# === SAMPLING EVENT ===
+event_collection:
+ otype: SamplingEvent
+ pid: "event:palmyra-2024-dive-005"
+ label: "Palmyra Atoll 2024, Dive 005"
+ description: >
+ SCUBA collection at 12m depth.
+ Reef flat dominated by Pocillopora and Porites.
+    Minimal impact sampling protocol (1cm² fragments).
+ result_time: "2024-06-15T11:20:00Z"
+ has_feature_of_interest: "Coral reef ecosystem"
+ project: "NSF OCE-2024123: Pacific Coral Genomics"
+
+# === SAMPLING SITE ===
+site_palmyra:
+ otype: SamplingSite
+ pid: "site:palmyra-atoll-reef"
+ label: "Palmyra Atoll, Fore Reef Site A"
+ description: >
+ Pristine coral reef system. U.S. National Wildlife Refuge.
+ High coral cover (>50%). Minimal anthropogenic impact.
+ place_name:
+ - "Palmyra Atoll"
+ - "Palmyra Island"
+
+# === GEOSPATIAL COORDINATES ===
+coords_site:
+ otype: GeospatialCoordLocation
+ pid: "coords:palmyra-site-a"
+ latitude: 5.87
+ longitude: -162.08
+ elevation: "-12 m below sea level (dive depth)"
+ obfuscated: false
+
+coords_sample:
+ otype: GeospatialCoordLocation
+ pid: "coords:palmyra-dive-005-001"
+ latitude: 5.8715
+ longitude: -162.0823
+ elevation: "-12 m below sea level"
+ obfuscated: false
+
+# === IDENTIFIED CONCEPTS ===
+concept_tissue:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/material/0.9/organicmaterial"
+ label: "Organic material > Tissue"
+ scheme_name: "iSamples Material Type Vocabulary"
+
+concept_marine_biome:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/sampledfeature/0.9/marinebiome"
+ label: "Marine biome"
+ scheme_name: "iSamples Sampled Feature Vocabulary"
+
+concept_specimen:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/materialsampleobjecttype/0.9/specimen"
+ label: "Specimen"
+ scheme_name: "iSamples Material Sample Object Type Vocabulary"
+
+keyword_species:
+ otype: IdentifiedConcept
+ pid: "taxon:pocillopora-damicornis"
+ label: "Pocillopora damicornis"
+ description: "Cauliflower coral"
+
+keyword_coral:
+ otype: IdentifiedConcept
+ pid: "keyword:coral"
+ label: "Coral"
+
+keyword_scleractinia:
+ otype: IdentifiedConcept
+ pid: "taxon:scleractinia"
+ label: "Scleractinia"
+ description: "Stony corals"
+
+# === AGENTS ===
+agent_collector:
+ otype: Agent
+ pid: "https://orcid.org/0000-0001-9876-5432"
+ name: "Dr. Carlos Alvarez"
+  affiliation: "University of Hawaiʻi, Hawaiʻi Institute of Marine Biology"
+ role: "Principal Investigator"
+
+agent_registrant:
+ otype: Agent
+ pid: "agent:geome-manager"
+ name: "GEOME Data Team"
+ affiliation: "Smithsonian Institution"
+ role: "Genomic Data Manager"
+
+# === SAMPLE RELATION (PARENT-CHILD) ===
+relation_dna_extract:
+ otype: SampleRelation
+ pid: "relation:tissue-to-dna-001"
+ label: "DNA extracted from tissue"
+ description: "High molecular weight DNA extraction for whole genome sequencing"
+ relationship: "derivedFrom"
+ target: "igsn:IEGEN0001" # Points to parent tissue sample
+
+# === EDGES ===
+# Tissue sample edges
+edge_tissue_event:
+ s: sample_tissue
+ p: produced_by
+ o: [event_collection]
+
+edge_tissue_material:
+ s: sample_tissue
+ p: has_material_category
+ o: [concept_tissue]
+
+edge_tissue_context:
+ s: sample_tissue
+ p: has_context_category
+ o: [concept_marine_biome]
+
+edge_tissue_object:
+ s: sample_tissue
+ p: has_sample_object_type
+ o: [concept_specimen]
+
+edge_tissue_keyword1:
+ s: sample_tissue
+ p: keywords
+ o: [keyword_species]
+
+edge_tissue_keyword2:
+ s: sample_tissue
+ p: keywords
+ o: [keyword_coral]
+
+edge_tissue_keyword3:
+ s: sample_tissue
+ p: keywords
+ o: [keyword_scleractinia]
+
+edge_tissue_registrant:
+ s: sample_tissue
+ p: registrant
+ o: [agent_registrant]
+
+# DNA sample → parent tissue relationship
+edge_dna_relation:
+ s: sample_dna
+ p: related_resource
+ o: [relation_dna_extract]
+
+# Event edges
+edge_event_site:
+ s: event_collection
+ p: sampling_site
+ o: [site_palmyra]
+
+edge_event_location:
+ s: event_collection
+ p: sample_location
+ o: [coords_sample]
+
+edge_event_responsibility:
+ s: event_collection
+ p: responsibility
+ o: [agent_collector]
+
+# Site edges
+edge_site_location:
+ s: site_palmyra
+ p: site_location
+ o: [coords_site]
+```
+
+### Biology-Specific Patterns
+
+**What's unique:**
+- Heavy use of **related_resource** (tissue → DNA → sequence data)
+- **Taxonomic keywords** (species names, higher taxa)
+- **Preservation methods** in descriptions ("95% ethanol")
+- **Sample chains** (organism → tissue → extract → library)
+- **Precise dive/collection coordinates**
+
+**Edge types used (projected):** 11 of 14
+- ✅ produced_by, has_material_category, has_context_category, has_sample_object_type
+- ✅ keywords, registrant, related_resource, sampling_site, sample_location, responsibility (Event), site_location
+- ❌ curation, has_context_category (Event), responsibility (Curation)
+
+---
+
+## Cross-Domain Comparison
+
+### Entity Usage Comparison
+
+| Entity Type | Archaeology | Geology | Biology |
+|-------------|-------------|---------|---------|
+| MaterialSampleRecord | Pottery, bone, charcoal | Rocks, cores, minerals | Tissue, DNA, specimens |
+| SamplingEvent | Excavation, surface collection | Drilling, dredging | SCUBA, trap, net |
+| SamplingSite | Archaeological sites | Formations, localities | Reefs, stations, plots |
+| GeospatialCoordLocation | Depth below surface | Depth below sea level | Depth below sea level |
+| IdentifiedConcept | Cultural periods, pottery types | Rock types, minerals | Taxa, specimen types |
+| Agent | Archaeologists, curators | Geologists, repository staff | Marine biologists, geneticists |
+| MaterialSampleCuration | Rarely used | Core repositories | Biobanks, tissue collections |
+| SampleRelation | Rare | Rare | Common (parent-child chains) |
+
+### Predicate Usage Comparison
+
+| Predicate | Archaeology | Geology | Biology |
+|-----------|-------------|---------|---------|
+| produced_by | ✅ Every sample | ✅ Every sample | ✅ Every sample |
+| has_material_category | Pottery, bone, stone | Basalt, granite, sediment | Tissue, DNA, whole organism |
+| has_context_category | Terrestrial/Archaeological | Marine, Terrestrial, Subsurface | Marine biome, Terrestrial |
+| has_sample_object_type | Sherd, artifact | Core, hand specimen | Specimen, tissue |
+| keywords | Cultural terms, periods | Rock types, formation names | Taxonomic names |
+| registrant | ✅ Common | ✅ Common | ✅ Common |
+| curation | ❌ Rare | ✅ Very common | ⚪ Sometimes |
+| related_resource | ❌ Rare | ❌ Rare | ✅ Very common |
+| sampling_site | ✅ Common (site names) | ⚪ Sometimes | ✅ Common (stations, reefs) |
+| sample_location | ✅ Very common | ✅ Very common | ✅ Very common |
+| responsibility (Event) | ✅ Common | ✅ Common | ✅ Common |
+| has_context_category (Event) | ❌ Not used | ❌ Not used | ❌ Not used |
+| site_location | ✅ Common | ⚪ Sometimes | ✅ Common |
+| responsibility (Curation) | ❌ Not used | ✅ Common | ⚪ Sometimes |
+
+### Material Type Patterns
+
+**Archaeology:**
+- Anthropogenic material (pottery, glass, metal)
+- Organic material (bone, charcoal, wood)
+- Rock (stone tools, building materials)
+- Soil (sediment samples)
+
+**Geology:**
+- Rock (igneous, sedimentary, metamorphic)
+- Mineral (individual mineral specimens)
+- Sediment (unconsolidated material)
+- Fluid (water, hydrothermal fluids)
+
+**Biology:**
+- Organic material (tissue, DNA, whole organisms)
+- Liquid water (seawater, freshwater samples)
+- Biogenic non-organic material (shells, coral skeleton)
+
+---
+
+## Domain-Specific Patterns
+
+### Pattern 1: Archaeological Depth Notation
+
+**Challenge:** Archaeologists measure depth **below surface**, not elevation above sea level.
+
+**Solution:** Use elevation field with descriptive text:
+```yaml
+elevation: "3.2 m below surface"
+elevation: "Level 5, 4.8 m below datum"
+```
+
+### Pattern 2: Marine Sample Depths
+
+**Challenge:** Marine samples need **negative elevation** (below sea level).
+
+**Solution:**
+```yaml
+elevation: "-2500 m below sea level"
+elevation: "-12 m (dive depth)"
+```
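+
+Because `elevation` is free text, the two patterns use different sign conventions ("3.2 m below surface" versus "-2500 m below sea level"), and a consumer that needs numbers has to normalize them. A minimal sketch, assuming that a leading minus and the word "below" both mean "below the datum" and must not be applied twice; `parse_elevation` is a hypothetical helper that only covers the phrasings shown above:
+
+```python
+import re
+
+# Number followed by "m" (metres), optionally negative.
+_NUMBER_M = re.compile(r"(-?\d+(?:\.\d+)?)\s*m\b")
+# "below <datum>", where the datum ends at "(" or at the end of the text.
+_BELOW = re.compile(r"\bbelow\s+([a-z ]+?)\s*(?:\(|$)", re.IGNORECASE)
+
+
+def parse_elevation(text):
+    """Return (metres, datum), negative metres meaning below the datum, or None."""
+    match = _NUMBER_M.search(text)
+    if match is None:
+        return None
+    magnitude = abs(float(match.group(1)))
+    below = _BELOW.search(text, match.end())
+    negative = below is not None or match.group(1).startswith("-")
+    datum = below.group(1) if below else None
+    return (-magnitude if negative else magnitude, datum)
+
+
+for text in ["-2500 m below sea level", "3.2 m below surface",
+             "Level 5, 4.8 m below datum", "-12 m (dive depth)"]:
+    print(f"{text!r} -> {parse_elevation(text)}")
+```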
+
+### Pattern 3: Sample Chains (Biology)
+
+**Challenge:** Tissue → DNA → Sequencing Library are all samples.
+
+**Solution:** Use `related_resource` with `derivedFrom` relationship:
+```yaml
+# DNA sample points to tissue parent
+dna_sample:
+ related_resource:
+ - relationship: "derivedFrom"
+ target: "igsn:TISSUE001"
+
+# Sequencing library points to DNA parent
+library_sample:
+ related_resource:
+ - relationship: "derivedFrom"
+ target: "igsn:DNA001"
+```
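+
+Each `derivedFrom` link points only one step up, so finding the original specimen means walking the chain. A minimal sketch, assuming the relations have already been collected into a dictionary keyed by child pid and that each sample has at most one parent; `igsn:LIBRARY001` is a made-up pid for the `library_sample` above, and `lineage` is a hypothetical helper:
+
+```python
+# child pid -> list of (relationship, target pid), mirroring the YAML above.
+RELATIONS = {
+    "igsn:DNA001": [("derivedFrom", "igsn:TISSUE001")],
+    "igsn:LIBRARY001": [("derivedFrom", "igsn:DNA001")],  # hypothetical pid
+}
+
+
+def lineage(pid, relations):
+    """Follow derivedFrom links upward until a sample with no parent is reached."""
+    chain, seen = [pid], {pid}
+    while True:
+        parents = [t for rel, t in relations.get(chain[-1], []) if rel == "derivedFrom"]
+        if not parents:
+            return chain
+        parent = parents[0]  # assumes a single parent per sample
+        if parent in seen:
+            raise ValueError(f"derivedFrom cycle at {parent}")
+        chain.append(parent)
+        seen.add(parent)
+
+
+print(lineage("igsn:LIBRARY001", RELATIONS))
+# ['igsn:LIBRARY001', 'igsn:DNA001', 'igsn:TISSUE001']
+```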
+
+### Pattern 4: Repository Storage (Geology)
+
+**Challenge:** Core samples stored in repositories with access constraints.
+
+**Solution:** Use `curation` entity:
+```yaml
+sample:
+ curation:
+ label: "Lamont-Doherty Core Repository"
+ curation_location: "Palisades, NY"
+ access_constraints: "Destructive sampling requires approval"
+ responsibility:
+ - name: "Core Facility Manager"
+```
+
+### Pattern 5: Multi-lingual Site Names (Archaeology)
+
+**Challenge:** Archaeological sites have multiple name spellings.
+
+**Solution:** Use `place_name` array:
+```yaml
+site:
+ place_name:
+    - "Çatalhöyük"
+    - "Çatal Höyük"
+ - "Chatal Huyuk"
+ - "Ψ¬Ψ§ΨͺΨ§Ω ΩΩΩΩΩ" # Arabic
+```
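+
+A `place_name` list only helps search if the query side is normalized as well. A minimal sketch using Python's standard `unicodedata` module (`fold` and `matches_site` are hypothetical names): strip case, diacritics, and spaces, then compare against every alias. Folding alone cannot equate "Chatal Huyuk" with "Çatalhöyük"; that is the job of the explicit alias list.
+
+```python
+import unicodedata
+
+
+def fold(name):
+    """Case-, accent- and space-insensitive key for comparing place names."""
+    decomposed = unicodedata.normalize("NFKD", name)
+    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
+    return "".join(no_marks.casefold().split())
+
+
+def matches_site(query, place_names):
+    """True if `query` matches any alias after folding."""
+    key = fold(query)
+    return any(fold(name) == key for name in place_names)
+
+
+PLACE_NAMES = ["Çatalhöyük", "Çatal Höyük", "Chatal Huyuk"]
+print(fold("Çatal Höyük"))                      # catalhoyuk
+print(matches_site("CHATAL HUYUK", PLACE_NAMES))  # True
+```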
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+1. **Same schema, different values** - One iSamples model describes archaeological, geological, and biological samples
+2. **10-11 of 14 predicates used per domain** - Different domains use different subsets
+3. **Context category distinguishes domains** - Terrestrial/Archaeological vs Marine biome vs Subsurface
+4. **Material category is domain-specific** - Pottery vs Basalt vs Tissue
+5. **Curation patterns differ** - Geology stores cores, archaeology often doesn't track storage
+6. **`related_resource` is biology-heavy** - Sample chains are common in genomics and rare elsewhere
+
+**Design Wisdom:**
+
+✅ **Universal model:** 8 entity types work across all domains
+✅ **Flexible values:** Controlled vocabularies adapt to domain needs
+✅ **Optional predicates:** Each domain uses the relevant subset
+✅ **Extensible:** Can add domain-specific keywords without schema changes
+
+**Next steps:**
+- [QUERYING_THE_GRAPH.md](./QUERYING_THE_GRAPH.md) - SQL patterns for cross-domain queries
+- [EDGE_TYPES_VISUAL.md](./EDGE_TYPES_VISUAL.md) - Visual diagrams of patterns
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** 2025-11-14
+**Schema Version:** 20250207 (MaterialSampleRecord)
+**Author:** Claude Code (Sonnet 4.5)
diff --git a/src/docs/PREDICATES_REFERENCE.md b/src/docs/PREDICATES_REFERENCE.md
new file mode 100644
index 00000000..638c516e
--- /dev/null
+++ b/src/docs/PREDICATES_REFERENCE.md
@@ -0,0 +1,1009 @@
+# iSamples Predicates Reference
+
+**Purpose:** Detailed reference for each of the 14 relationship types (predicates) in the iSamples property graph.
+
+**Audience:** Developers querying iSamples data, data providers creating metadata, tool builders integrating with iSamples.
+
+---
+
+## Quick Reference Table
+
+| Predicate | Subject → Object | Cardinality | Required | Description |
+|-----------|------------------|-------------|----------|-------------|
+| [produced_by](#produced_by) | MaterialSampleRecord → SamplingEvent | One | ✅ Yes | Sample creation event |
+| [has_material_category](#has_material_category) | MaterialSampleRecord → IdentifiedConcept | Many | ✅ Yes | Material type |
+| [has_context_category](#has_context_category) | MaterialSampleRecord → IdentifiedConcept | Many | ✅ Yes | Domain context |
+| [has_sample_object_type](#has_sample_object_type) | MaterialSampleRecord → IdentifiedConcept | Many | ✅ Yes | Physical form |
+| [keywords](#keywords) | MaterialSampleRecord → IdentifiedConcept | Many | ⚪ No | Discovery keywords |
+| [registrant](#registrant) | MaterialSampleRecord → Agent | One | ⚪ No | Registering agent |
+| [curation](#curation) | MaterialSampleRecord → MaterialSampleCuration | One | ⚪ No | Storage info |
+| [related_resource](#related_resource) | MaterialSampleRecord → SampleRelation | Many | ⚪ No | Sample relationships |
+| [sampling_site](#sampling_site) | SamplingEvent → SamplingSite | One | ⚪ No | Named location |
+| [sample_location](#sample_location) | SamplingEvent → GeospatialCoordLocation | One | ⚪ No | Precise coords |
+| [responsibility](#responsibility-samplingevent) | SamplingEvent → Agent | Many | ⚪ No | Collectors |
+| [has_context_category](#has_context_category-samplingevent) | SamplingEvent → IdentifiedConcept | Many | ⚪ No | Event context |
+| [site_location](#site_location) | SamplingSite → GeospatialCoordLocation | One | ⚪ No | Site coords |
+| [responsibility](#responsibility-materialsamplecuration) | MaterialSampleCuration → Agent | Many | ⚪ No | Curators |
+
+---
+
+## MaterialSampleRecord Predicates
+
+### produced_by
+
+**Type:** MaterialSampleRecord → SamplingEvent
+**Cardinality:** One (required)
+**Required:** ✅ Yes
+
+#### Purpose
+Links a material sample to the event that created/collected it. This is the **most important relationship** in iSamples - every sample must have provenance.
+
+#### Controlled Vocabulary
+Not applicable - targets a SamplingEvent node.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+# Sample node
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+ label: "Pottery sherd from Trench 5"
+
+# Event node
+event_001:
+ otype: SamplingEvent
+ pid: "event:2023-catal-t5-001"
+ label: "2023 Excavation, Trench 5, Level 3"
+ result_time: "2023-07-15"
+
+# Edge
+edge_001:
+ otype: _edge_
+ s: sample_001 # Subject: the sample
+ p: produced_by # Predicate
+ o: [event_001] # Object: the event
+```
+
+#### SQL Query Pattern
+
+**Find all samples and their collection dates:**
+```sql
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_label,
+ event.pid AS event_id,
+ event.label AS event_label,
+ event.result_time AS collection_date
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'produced_by'
+ AND edge.otype = '_edge_'
+JOIN pqg AS event
+ ON event.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord';
+```
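+
+The `event.row_id = ANY(edge.o)` condition is there because an edge row's `o` column holds a list of target `row_id`s, so one edge row can point at several objects. The same self-join in plain Python makes that explicit. This is a sketch with made-up `row_id` values; only the column names (`row_id`, `otype`, `pid`, `s`, `p`, `o`) come from the queries in this document:
+
+```python
+# Nodes and edges share one table; an edge's `o` is a list of target row_ids.
+ROWS = [
+    {"row_id": 1, "otype": "MaterialSampleRecord", "pid": "igsn:SSH000001",
+     "label": "Pottery sherd from Trench 5"},
+    {"row_id": 2, "otype": "SamplingEvent", "pid": "event:2023-catal-t5-001",
+     "label": "2023 Excavation, Trench 5, Level 3", "result_time": "2023-07-15"},
+    {"row_id": 3, "otype": "_edge_", "s": 1, "p": "produced_by", "o": [2]},
+]
+
+
+def samples_with_events(rows):
+    """Yield (sample pid, event pid, result_time) for every produced_by edge."""
+    by_id = {row["row_id"]: row for row in rows}
+    for edge in rows:
+        if edge["otype"] != "_edge_" or edge["p"] != "produced_by":
+            continue
+        sample = by_id[edge["s"]]
+        if sample["otype"] != "MaterialSampleRecord":
+            continue
+        for target in edge["o"]:  # event.row_id = ANY(edge.o)
+            event = by_id[target]
+            yield sample["pid"], event["pid"], event.get("result_time")
+
+
+print(list(samples_with_events(ROWS)))
+# [('igsn:SSH000001', 'event:2023-catal-t5-001', '2023-07-15')]
+```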
+
+**Find samples collected in a specific time range:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ event.result_time
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+  -- Half-open range so that timestamps later on 31 December are included
+  AND event.result_time >= '2023-01-01'
+  AND event.result_time < '2024-01-01';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,352 relationships (one per sample)
+- **Unique subjects:** 1,096,352 samples
+- **Unique objects:** 1,096,352 events (1:1 ratio)
+
+#### Common Issues
+
+❌ **Missing produced_by:**
+```
+Error: MaterialSampleRecord must have produced_by relationship
+```
+**Solution:** Every sample requires a SamplingEvent.
+
+❌ **Multiple produced_by:**
+```
+Warning: Sample has multiple produced_by edges (should be one)
+```
+**Solution:** Cardinality is ONE - use `related_resource` to link derived samples.
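+
+Both issues can be caught before loading. A minimal sketch, assuming rows shaped like the `pqg` columns used in the SQL above (`row_id`, `otype`, `pid`, `s`, `p`, `o`); `check_produced_by` and its message wording are illustrative, not the output of an actual iSamples validator:
+
+```python
+from collections import Counter
+
+
+def check_produced_by(rows):
+    """Return (errors, warnings) for the one-produced_by-per-sample rule."""
+    targets = Counter()
+    for row in rows:
+        if row["otype"] == "_edge_" and row["p"] == "produced_by":
+            # One edge row can list several objects, so count targets, not rows.
+            targets[row["s"]] += len(row["o"])
+    errors, warnings = [], []
+    for row in rows:
+        if row["otype"] != "MaterialSampleRecord":
+            continue
+        n = targets[row["row_id"]]
+        if n == 0:
+            errors.append(f"{row['pid']}: missing produced_by")
+        elif n > 1:
+            warnings.append(f"{row['pid']}: {n} produced_by targets (expected 1)")
+    return errors, warnings
+
+
+# Made-up rows: igsn:A is valid, igsn:B has no event, igsn:C has two.
+ROWS = [
+    {"row_id": 1, "otype": "MaterialSampleRecord", "pid": "igsn:A"},
+    {"row_id": 2, "otype": "MaterialSampleRecord", "pid": "igsn:B"},
+    {"row_id": 3, "otype": "MaterialSampleRecord", "pid": "igsn:C"},
+    {"row_id": 10, "otype": "SamplingEvent", "pid": "event:1"},
+    {"row_id": 11, "otype": "SamplingEvent", "pid": "event:2"},
+    {"row_id": 20, "otype": "_edge_", "s": 1, "p": "produced_by", "o": [10]},
+    {"row_id": 21, "otype": "_edge_", "s": 3, "p": "produced_by", "o": [10, 11]},
+]
+errors, warnings = check_produced_by(ROWS)
+print(errors)    # ['igsn:B: missing produced_by']
+print(warnings)  # ['igsn:C: 2 produced_by targets (expected 1)']
+```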
+
+---
+
+### has_material_category
+
+**Type:** MaterialSampleRecord → IdentifiedConcept
+**Cardinality:** Many (required, minimum 1)
+**Required:** ✅ Yes
+
+#### Purpose
+Classifies the physical material composition of the sample. Uses controlled vocabulary from iSamples Material Type Vocabulary.
+
+#### Controlled Vocabulary
+[iSamples Material Type Vocabulary](https://w3id.org/isample/vocabulary/material/)
+
+**Top-level categories:**
+- Rock
+- Mineral
+- Organic material
+- Liquid water
+- Anthropogenic material (includes pottery, glass, metals)
+- Biogenic non-organic material
+- Natural solid material
+- Soil
+- Particulate
+- Fluid (non-water)
+
+**Example subcategories:**
+- Rock → Igneous rock → Basalt
+- Anthropogenic material → Pottery → Earthenware
+- Organic material → Tissue → Bone
+
+#### Usage Example
+
+**YAML:**
+```yaml
+# Sample node
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+ label: "Ceramic bowl"
+
+# Concept nodes (from controlled vocabulary)
+concept_earthenware:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/material/0.9/earthenware"
+ label: "Earthenware"
+ scheme_name: "iSamples Material Type Vocabulary"
+
+concept_anthropogenic:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/material/0.9/anthropogenicmaterial"
+ label: "Anthropogenic material"
+
+# Edges (multivalued - can have multiple material types)
+edge_001:
+ s: sample_001
+ p: has_material_category
+ o: [concept_earthenware]
+
+edge_002:
+ s: sample_001
+ p: has_material_category
+ o: [concept_anthropogenic]
+```
+
+#### SQL Query Pattern
+
+**Find all samples of a specific material type:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS material_type
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+JOIN pqg AS concept
+ ON concept.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND concept.label ILIKE '%earthenware%';
+```
+
+**Count samples by material type:**
+```sql
+SELECT
+ concept.label AS material_type,
+ COUNT(DISTINCT sample.pid) AS sample_count
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'has_material_category'
+JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+GROUP BY concept.label
+ORDER BY sample_count DESC;
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,352 relationships
+- **Unique subjects:** 1,096,352 samples (one per sample)
+- **Unique objects:** 10 material type concepts
+
+**Top material types in OpenContext:**
+1. Anthropogenic material (pottery, artifacts)
+2. Rock (stone tools, building materials)
+3. Organic material (bone, charcoal)
+4. Soil (sediment samples)
+
+#### Common Issues
+
+❌ **Using free text instead of controlled vocabulary:**
+```yaml
+# Wrong:
+has_material_category: "pottery"
+
+# Right:
+has_material_category:
+ - pid: "https://w3id.org/isample/vocabulary/material/0.9/earthenware"
+ label: "Earthenware"
+```
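+
+A quick way to catch the free-text mistake is to test each `pid` against the vocabulary namespace. A minimal sketch, assuming that any pid beginning with the vocabulary URL linked above counts as controlled (a simplification: it does not check that the term actually exists); `material_problems` is a hypothetical helper:
+
+```python
+# Namespace of the material vocabulary linked above.
+MATERIAL_VOCAB = "https://w3id.org/isample/vocabulary/material/"
+
+
+def material_problems(value):
+    """List problems with a has_material_category value (hypothetical check)."""
+    if isinstance(value, str):
+        return [f"free text {value!r}: expected concepts with a vocabulary pid"]
+    problems = []
+    for concept in value:
+        pid = concept.get("pid", "")
+        if not pid.startswith(MATERIAL_VOCAB):
+            problems.append(f"pid {pid!r} is outside the material vocabulary")
+    return problems
+
+
+print(material_problems("pottery"))
+print(material_problems([
+    {"pid": MATERIAL_VOCAB + "0.9/earthenware", "label": "Earthenware"},
+]))  # []
+```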
+
+---
+
+### has_context_category
+
+**Type:** MaterialSampleRecord → IdentifiedConcept
+**Cardinality:** Many (required, minimum 1)
+**Required:** ✅ Yes
+
+#### Purpose
+Classifies the broad context or sampled feature type. Indicates the domain (archaeology, marine biology, geology, etc.) and environment.
+
+#### Controlled Vocabulary
+[iSamples Sampled Feature Vocabulary](https://w3id.org/isample/vocabulary/sampledfeature/)
+
+**Top-level categories:**
+- Terrestrial environment
+ - Archaeological
+ - Subsurface
+ - Surface
+- Marine environment
+ - Marine biome
+ - Marine water body
+ - Submerged terrestrial
+- Atmosphere
+- Extraterrestrial
+- Laboratory or production environment
+
+#### Usage Example
+
+**YAML:**
+```yaml
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+
+concept_archaeological:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/sampledfeature/0.9/terrestrial_archaeological"
+ label: "Terrestrial environment > Archaeological site"
+ scheme_name: "iSamples Sampled Feature Vocabulary"
+
+edge:
+ s: sample_001
+ p: has_context_category
+ o: [concept_archaeological]
+```
+
+#### SQL Query Pattern
+
+**Find all archaeological samples:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS context
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'has_context_category'
+JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND concept.label ILIKE '%archaeological%';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,352 relationships
+- **Unique subjects:** 1,096,352 samples
+- **Unique objects:** 2 context concepts (OpenContext is archaeology-focused)
+
+---
+
+### has_sample_object_type
+
+**Type:** MaterialSampleRecord → IdentifiedConcept
+**Cardinality:** Many (required, minimum 1)
+**Required:** ✅ Yes
+
+#### Purpose
+Describes the physical form or object type of the sample. Answers "What kind of object is this?"
+
+#### Controlled Vocabulary
+[iSamples Material Sample Object Type Vocabulary](https://w3id.org/isample/vocabulary/materialsampleobjecttype/)
+
+**Common types:**
+- Core
+- Hand specimen
+- Thin section
+- Powder
+- Cube
+- Sherd (pottery fragment)
+- Specimen (biological)
+- Aggregate (multiple pieces)
+- Other solid object
+
+#### Usage Example
+
+**YAML:**
+```yaml
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+ label: "Pottery fragment"
+
+concept_sherd:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/materialsampleobjecttype/0.9/sherd"
+ label: "Sherd"
+ scheme_name: "iSamples Material Sample Object Type Vocabulary"
+
+edge:
+ s: sample_001
+ p: has_sample_object_type
+ o: [concept_sherd]
+```
+
+#### SQL Query Pattern
+
+**Find all core samples:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS object_type
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'has_sample_object_type'
+JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+WHERE concept.label = 'Core';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,352 relationships
+- **Unique subjects:** 1,096,352 samples
+- **Unique objects:** 5 object type concepts
+
+---
+
+### keywords
+
+**Type:** MaterialSampleRecord → IdentifiedConcept
+**Cardinality:** Many (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Free-text keywords for discovery and search. Can include taxonomic names, geographic terms, cultural periods, etc.
+
+#### Controlled Vocabulary
+Not strictly controlled - can use various vocabularies or free text wrapped in IdentifiedConcept.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+
+keyword_neolithic:
+ otype: IdentifiedConcept
+ pid: "keyword:neolithic"
+ label: "Neolithic"
+
+keyword_pottery:
+ otype: IdentifiedConcept
+ pid: "keyword:pottery"
+ label: "Pottery"
+
+keyword_catalhoyuk:
+ otype: IdentifiedConcept
+ pid: "keyword:catalhoyuk"
+  label: "Çatalhöyük"
+
+# Multiple edges for multiple keywords
+edge_001:
+ s: sample_001
+ p: keywords
+ o: [keyword_neolithic]
+
+edge_002:
+ s: sample_001
+ p: keywords
+ o: [keyword_pottery]
+
+edge_003:
+ s: sample_001
+ p: keywords
+ o: [keyword_catalhoyuk]
+```
+
+#### SQL Query Pattern
+
+**Find samples with specific keyword:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS keyword
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'keywords'
+JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+WHERE concept.label ILIKE '%neolithic%';
+```
+
+**Find samples with multiple keywords (AND logic):**
+```sql
+WITH sample_keywords AS (
+ SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS keyword
+ FROM pqg AS sample
+ JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'keywords'
+ JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+)
+SELECT pid, label
+FROM sample_keywords
+WHERE keyword IN ('Neolithic', 'Pottery')
+GROUP BY pid, label
+HAVING COUNT(DISTINCT keyword) = 2;
+```
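+
+`HAVING COUNT(DISTINCT keyword) = 2` is a subset test: a sample qualifies when every required keyword appears among its keywords, and the `2` must be kept equal to the length of the `IN` list. The same logic in plain Python, assuming the (pid, keyword) rows have already been fetched; the second sample and the function name are made up for illustration:
+
+```python
+from collections import defaultdict
+
+
+def samples_with_all_keywords(pairs, required):
+    """Return pids whose keyword set contains every keyword in `required`."""
+    keywords_by_sample = defaultdict(set)
+    for pid, keyword in pairs:
+        keywords_by_sample[pid].add(keyword)
+    required = set(required)
+    return sorted(pid for pid, kws in keywords_by_sample.items() if required <= kws)
+
+
+PAIRS = [
+    ("igsn:SSH000001", "Neolithic"),
+    ("igsn:SSH000001", "Pottery"),
+    ("igsn:SSH000001", "Çatalhöyük"),
+    ("igsn:SSH000002", "Pottery"),  # hypothetical second sample
+]
+print(samples_with_all_keywords(PAIRS, ["Neolithic", "Pottery"]))  # ['igsn:SSH000001']
+```
+
+Because the required set is built from the input, adding a third keyword needs no change to a hard-coded count.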
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,297 relationships (not all samples have keywords)
+- **Unique subjects:** 1,096,297 samples
+- **Unique objects:** 4,033 unique keyword concepts
+
+---
+
+### registrant
+
+**Type:** MaterialSampleRecord → Agent
+**Cardinality:** One (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Identifies the person or organization that registered the sample metadata.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+
+agent_curator:
+ otype: Agent
+ pid: "https://orcid.org/0000-0002-1234-5678"
+ name: "Jane Smith"
+ affiliation: "OpenContext"
+ role: "Data Curator"
+
+edge:
+ s: sample_001
+ p: registrant
+ o: [agent_curator]
+```
+
+#### SQL Query Pattern
+
+**Find all samples registered by a specific person:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ agent.name AS registrant_name
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'registrant'
+JOIN pqg AS agent ON agent.row_id = ANY(edge.o)
+WHERE agent.name ILIKE '%smith%';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 413,635 relationships (38% of samples)
+- **Unique subjects:** 413,635 samples
+- **Unique objects:** 340 agents
+
+---
+
+### curation
+
+**Type:** MaterialSampleRecord → MaterialSampleCuration
+**Cardinality:** One (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Links sample to its curation information (storage location, access constraints, curation history).
+
+#### Usage Example
+
+**YAML:**
+```yaml
+sample_001:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+
+curation_001:
+ otype: MaterialSampleCuration
+ pid: "curation:smithsonian-nmnh-001"
+ label: "Smithsonian NMNH Anthropology Collection"
+ curation_location: "National Museum of Natural History, Washington DC"
+ access_constraints: "Appointment required"
+
+edge:
+ s: sample_001
+ p: curation
+ o: [curation_001]
+```
+
+#### SQL Query Pattern
+
+**Find samples stored at specific location:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ curation.label AS collection_name,
+ curation.curation_location
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'curation'
+JOIN pqg AS curation ON curation.row_id = ANY(edge.o)
+WHERE curation.curation_location ILIKE '%washington dc%';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 0 (OpenContext does not track curation information)
+
+---
+
+### related_resource
+
+**Type:** MaterialSampleRecord → SampleRelation
+**Cardinality:** Many (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Links sample to other samples via defined relationships (parent-child, sibling, etc.).
+
+#### Usage Example
+
+**YAML:**
+```yaml
+# Parent sample
+parent_sample:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000001"
+ label: "Whole rock core"
+
+# Child sample
+child_sample:
+ otype: MaterialSampleRecord
+ pid: "igsn:SSH000002"
+ label: "Thin section from core"
+
+# Relation describing the connection
+relation_001:
+ otype: SampleRelation
+ pid: "relation:subsample-001"
+ label: "Thin section derived from core"
+ relationship: "derivedFrom"
+ target: "igsn:SSH000001" # Points to parent
+
+# Edge from child to relation
+edge:
+ s: child_sample
+ p: related_resource
+ o: [relation_001]
+```
+
+#### SQL Query Pattern
+
+**Find all child samples of a parent:**
+```sql
+SELECT
+ child.pid AS child_pid,
+ child.label AS child_label,
+ relation.relationship AS relation_type,
+ parent.pid AS parent_pid,
+ parent.label AS parent_label
+FROM pqg AS child
+JOIN pqg AS edge ON edge.s = child.row_id AND edge.p = 'related_resource'
+JOIN pqg AS relation ON relation.row_id = ANY(edge.o)
+JOIN pqg AS parent ON parent.pid = relation.target
+WHERE parent.pid = 'igsn:SSH000001';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 0 (OpenContext does not track sample relationships)
+
+---
+
+## SamplingEvent Predicates
+
+### sampling_site
+
+**Type:** SamplingEvent → SamplingSite
+**Cardinality:** One (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Links sampling event to a named sampling site.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+event_001:
+ otype: SamplingEvent
+ pid: "event:2023-catal-001"
+
+site_001:
+ otype: SamplingSite
+ pid: "site:catalhoyuk-south"
+ label: "Çatalhöyük South Area"
+ place_name: ["Çatalhöyük", "Çatal Höyük"]
+
+edge:
+ s: event_001
+ p: sampling_site
+ o: [site_001]
+```
+
+#### SQL Query Pattern
+
+**Find all samples from a specific site:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ site.label AS site_name
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sampling_site'
+JOIN pqg AS site ON site.row_id = ANY(edge2.o)
+WHERE site.label ILIKE '%çatalhöyük%';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,352 relationships
+- **Unique subjects:** 1,096,352 events
+- **Unique objects:** 18,213 sites
+
+---
+
+### sample_location
+
+**Type:** SamplingEvent → GeospatialCoordLocation
+**Cardinality:** One (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Precise geographic coordinates where the sample was collected.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+event_001:
+ otype: SamplingEvent
+ pid: "event:2023-catal-001"
+
+coords_001:
+ otype: GeospatialCoordLocation
+ pid: "coords:37.6665-32.8274"
+ latitude: 37.6665
+ longitude: 32.8274
+ elevation: "1015 m above mean sea level"
+
+edge:
+ s: event_001
+ p: sample_location
+ o: [coords_001]
+```
+
+#### SQL Query Pattern
+
+**Find all samples with coordinates (the most common query pattern):**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ coords.latitude,
+ coords.longitude,
+ coords.elevation
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude IS NOT NULL;
+```
+
+**Find samples within bounding box:**
+```sql
+SELECT
+ sample.pid,
+ coords.latitude,
+ coords.longitude
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE coords.latitude BETWEEN 37.0 AND 38.0
+ AND coords.longitude BETWEEN 32.0 AND 33.0;
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,096,274 relationships (99.99% of events)
+- **Unique subjects:** 1,096,274 events
+- **Unique objects:** 190,566 coordinate pairs
+
+---
+
+### responsibility (SamplingEvent)
+
+**Type:** SamplingEvent → Agent
+**Cardinality:** Many (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Identifies person(s) responsible for sample collection at the event.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+event_001:
+ otype: SamplingEvent
+ pid: "event:2023-catal-001"
+
+agent_001:
+ otype: Agent
+ pid: "https://orcid.org/0000-0002-1234-5678"
+ name: "Dr. Jane Smith"
+ role: "Principal Investigator"
+
+agent_002:
+ otype: Agent
+ pid: "https://orcid.org/0000-0002-5678-1234"
+ name: "John Doe"
+ role: "Field Technician"
+
+edge_001:
+ s: event_001
+ p: responsibility
+ o: [agent_001]
+
+edge_002:
+ s: event_001
+ p: responsibility
+ o: [agent_002]
+```
+
+#### SQL Query Pattern
+
+**Find all samples collected by specific person:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ agent.name AS collector
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'responsibility'
+JOIN pqg AS agent ON agent.row_id = ANY(edge2.o)
+WHERE agent.name ILIKE '%smith%';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 1,095,272 relationships
+- **Unique subjects:** 1,095,272 events
+- **Unique objects:** 197 agents
+
+---
+
+### has_context_category (SamplingEvent)
+
+**Type:** SamplingEvent → IdentifiedConcept
+**Cardinality:** Many (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Context classification at the event level (separate from sample-level context).
+
+#### Usage Example
+
+**YAML:**
+```yaml
+event_001:
+ otype: SamplingEvent
+ pid: "event:marine-expedition-001"
+
+concept_marine:
+ otype: IdentifiedConcept
+ pid: "https://w3id.org/isample/vocabulary/sampledfeature/0.9/marinebiome"
+ label: "Marine biome"
+
+edge:
+ s: event_001
+ p: has_context_category
+ o: [concept_marine]
+```
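+
+#### SQL Query Pattern
+
+This predicate has no query pattern in this section. A sketch that reuses the join shape of the other SamplingEvent predicates (untested against real data, since OpenContext does not populate event-level context; the `'%marine%'` filter simply matches the example above):
+
+**Find samples whose collection event has a given context category:**
+```sql
+SELECT
+    sample.pid,
+    event.pid AS event_pid,
+    concept.label AS event_context
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o) AND event.otype = 'SamplingEvent'
+-- Event-level context, distinct from the sample-level has_context_category edge
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'has_context_category'
+JOIN pqg AS concept ON concept.row_id = ANY(edge2.o) AND concept.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+  AND concept.label ILIKE '%marine%';
+```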
+
+#### OpenContext Data Stats
+- **Frequency:** 0 (OpenContext does not use event-level context)
+
+---
+
+## SamplingSite Predicates
+
+### site_location
+
+**Type:** SamplingSite → GeospatialCoordLocation
+**Cardinality:** One (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Geographic coordinates for the sampling site (typically less precise than sample_location).
+
+#### Usage Example
+
+**YAML:**
+```yaml
+site_001:
+ otype: SamplingSite
+ pid: "site:catalhoyuk"
+ label: "Çatalhöyük"
+
+coords_site:
+ otype: GeospatialCoordLocation
+ pid: "coords:site-catalhoyuk"
+ latitude: 37.666
+ longitude: 32.827
+ elevation: "1000 m above mean sea level"
+
+edge:
+ s: site_001
+ p: site_location
+ o: [coords_site]
+```
+
+#### SQL Query Pattern
+
+**Find all sites with coordinates:**
+```sql
+SELECT
+ site.pid,
+ site.label AS site_name,
+ coords.latitude,
+ coords.longitude
+FROM pqg AS site
+JOIN pqg AS edge ON edge.s = site.row_id AND edge.p = 'site_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge.o)
+WHERE site.otype = 'SamplingSite';
+```
+
+#### OpenContext Data Stats
+- **Frequency:** 18,213 relationships
+- **Unique subjects:** 18,213 sites
+- **Unique objects:** 18,213 coordinate pairs (1:1)
+
+---
+
+## MaterialSampleCuration Predicates
+
+### responsibility (MaterialSampleCuration)
+
+**Type:** MaterialSampleCuration → Agent
+**Cardinality:** Many (optional)
+**Required:** ⚪ No
+
+#### Purpose
+Identifies person(s) responsible for sample curation.
+
+#### Usage Example
+
+**YAML:**
+```yaml
+curation_001:
+ otype: MaterialSampleCuration
+ pid: "curation:smithsonian-001"
+
+agent_curator:
+ otype: Agent
+ pid: "curator:jsmith"
+ name: "Jane Smith"
+ role: "Collection Manager"
+
+edge:
+ s: curation_001
+ p: responsibility
+ o: [agent_curator]
+```
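+
+#### SQL Query Pattern
+
+A sketch that chains the sample's `curation` edge into this predicate, following the same join shape used elsewhere in this document (untested, since OpenContext has no curation records; the PID is the example value from the `curation` section):
+
+**Find the curators responsible for a sample:**
+```sql
+SELECT
+    sample.pid,
+    curation.label AS collection_name,
+    agent.name AS curator,
+    agent.role
+FROM pqg AS sample
+-- Hop 1: sample → curation record
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'curation'
+JOIN pqg AS curation ON curation.row_id = ANY(edge1.o) AND curation.otype = 'MaterialSampleCuration'
+-- Hop 2: curation record → responsible agent(s)
+JOIN pqg AS edge2 ON edge2.s = curation.row_id AND edge2.p = 'responsibility'
+JOIN pqg AS agent ON agent.row_id = ANY(edge2.o) AND agent.otype = 'Agent'
+WHERE sample.pid = 'igsn:SSH000001';
+```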
+
+#### OpenContext Data Stats
+- **Frequency:** 0 (OpenContext does not track curation)
+
+---
+
+## Cross-Reference: Predicate Usage by Domain
+
+### OpenContext (Archaeology) - Uses 10 of 14
+
+✅ **Used:**
+1. produced_by
+2. has_material_category
+3. has_context_category
+4. has_sample_object_type
+5. keywords
+6. registrant
+7. sampling_site
+8. sample_location
+9. responsibility (SamplingEvent)
+10. site_location
+
+❌ **Not used:**
+- curation
+- related_resource
+- has_context_category (SamplingEvent)
+- responsibility (MaterialSampleCuration)
+
+### Expected SESAR (Geology) - Projected 8-10 of 14
+
+✅ **Likely used:**
+1. produced_by
+2. has_material_category
+3. has_context_category
+4. has_sample_object_type
+5. sample_location
+6. curation
+7. responsibility (SamplingEvent)
+8. responsibility (MaterialSampleCuration)
+
+### Expected GEOME (Biology) - Projected 9-11 of 14
+
+✅ **Likely used:**
+1. produced_by
+2. has_material_category
+3. has_context_category
+4. has_sample_object_type
+5. keywords
+6. related_resource (parent-child samples)
+7. sampling_site
+8. sample_location
+9. responsibility (SamplingEvent)
+
+---
+
+## Summary
+
+**Key Takeaways:**
+
+1. **4 predicates are required** - produced_by, has_material_category, has_context_category, has_sample_object_type
+2. **2 predicates point to coordinates** - sample_location (SamplingEvent) and site_location (SamplingSite)
+3. **3 predicates involve agents** - registrant, responsibility (SamplingEvent), responsibility (MaterialSampleCuration)
+4. **2 predicates share names** - responsibility, has_context_category (different subjects)
+5. **Different domains use different subsets** - Same schema, different instantiations
+
+**Next steps:**
+- [EXAMPLES_BY_DOMAIN.md](./EXAMPLES_BY_DOMAIN.md) - See these predicates in real-world examples
+- [QUERYING_THE_GRAPH.md](./QUERYING_THE_GRAPH.md) - More complex query patterns
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** 2025-11-14
+**Schema Version:** 20250207 (MaterialSampleRecord)
+**Author:** Claude Code (Sonnet 4.5)
diff --git a/src/docs/QUERYING_THE_GRAPH.md b/src/docs/QUERYING_THE_GRAPH.md
new file mode 100644
index 00000000..ae3a2273
--- /dev/null
+++ b/src/docs/QUERYING_THE_GRAPH.md
@@ -0,0 +1,774 @@
+# Querying the Property Graph: Practical SQL Patterns
+
+This guide provides practical SQL query patterns for working with iSamples property graph data. All queries are written for DuckDB; the patterns carry over to other SQL databases with array support (e.g., PostgreSQL), although array and JSON function names can differ.
+
+## Table of Contents
+
+1. [Understanding the Storage Model](#understanding-the-storage-model)
+2. [Basic Entity Queries](#basic-entity-queries)
+3. [Single-Hop Traversals](#single-hop-traversals)
+4. [Multi-Hop Traversals](#multi-hop-traversals)
+5. [Aggregation and Statistics](#aggregation-and-statistics)
+6. [Filtering and Search](#filtering-and-search)
+7. [Complex Query Patterns](#complex-query-patterns)
+8. [Performance Optimization](#performance-optimization)
+9. [Common Query Recipes](#common-query-recipes)
+
+---
+
+## Understanding the Storage Model
+
+### The Unified Table Structure
+
+All nodes and edges are stored in a single table with these key columns:
+
+```sql
+CREATE TABLE pqg (
+ row_id INTEGER PRIMARY KEY, -- Internal identifier
+ pid TEXT, -- Persistent identifier (for entities)
+ otype TEXT, -- Object type: entity type OR '_edge_'
+ s INTEGER, -- Subject row_id (for edges)
+ p TEXT, -- Predicate (for edges)
+ o INTEGER[], -- Array of object row_ids (for edges)
+ n TEXT -- Node value (for simple nodes)
+ -- Plus entity-specific columns (label, description, etc.)
+);
+```
+
+### Key Concepts
+
+1. **Entities** have `otype` ∈ {MaterialSampleRecord, SamplingEvent, ...}
+2. **Edges** have `otype = '_edge_'`
+3. **Relationships** require joining through edge rows
+4. **Multi-valued predicates** store multiple objects in `o` array
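+
+To make the storage model concrete, here is a minimal, self-contained DuckDB sketch. The table name `pqg_demo`, the reduced column set, and the row values are illustrative assumptions, not real iSamples data: two entity rows, one `_edge_` row linking them, and the single-hop join that reads the relationship back.
+
+```sql
+-- Hypothetical miniature pqg table (key columns plus label only)
+CREATE TEMP TABLE pqg_demo (
+    row_id INTEGER PRIMARY KEY,
+    pid TEXT,
+    otype TEXT,
+    s INTEGER,
+    p TEXT,
+    o INTEGER[],
+    label TEXT
+);
+
+INSERT INTO pqg_demo VALUES
+    (1, 'sample:demo-1', 'MaterialSampleRecord', NULL, NULL, NULL, 'Pottery sherd'),
+    (2, 'event:demo-1', 'SamplingEvent', NULL, NULL, NULL, '2023 excavation'),
+    -- Edge row: sample (row 1) --produced_by--> event (row 2)
+    (3, NULL, '_edge_', 1, 'produced_by', [2], NULL);
+
+-- Follow the edge from its subject (s) to its object(s) (o)
+SELECT sample.label AS sample_label, event.label AS event_label
+FROM pqg_demo AS sample
+JOIN pqg_demo AS edge
+    ON edge.s = sample.row_id AND edge.p = 'produced_by' AND edge.otype = '_edge_'
+JOIN pqg_demo AS event
+    ON event.row_id = ANY(edge.o) AND event.otype = 'SamplingEvent'
+WHERE sample.otype = 'MaterialSampleRecord';
+```
+
+The relationship lives entirely in row 3: `s` holds the subject's `row_id`, `p` the predicate name, and `o` a list of object `row_id`s.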
+
+---
+
+## Basic Entity Queries
+
+### Count Entities by Type
+
+```sql
+-- Get counts of each entity type
+SELECT
+ otype,
+ COUNT(*) as count
+FROM pqg
+WHERE otype != '_edge_'
+GROUP BY otype
+ORDER BY count DESC;
+```
+
+**Example output (OpenContext data):**
+```
+otype                    count
+IdentifiedConcept        8,270,644
+MaterialSampleRecord     1,096,352
+GeospatialCoordLocation  1,095,912
+SamplingEvent            1,095,912
+SamplingSite             383,912
+Agent                    72,520
+```
+
+### Find Entity by PID
+
+```sql
+-- Look up a specific sample
+SELECT *
+FROM pqg
+WHERE pid = 'iSamples:OPENCONTEXT:1b22b93b-...'
+ AND otype = 'MaterialSampleRecord';
+```
+
+### List All Samples with Basic Info
+
+```sql
+-- Get first 1000 samples with labels
+SELECT
+ pid,
+ label,
+ description
+FROM pqg
+WHERE otype = 'MaterialSampleRecord'
+LIMIT 1000;
+```
+
+---
+
+## Single-Hop Traversals
+
+### Pattern: Entity → Edge → Entity
+
+For a single relationship, you need:
+1. Start entity (subject)
+2. Edge row connecting them
+3. Target entity (object)
+
+### Example: Find Sampling Events for Samples
+
+```sql
+-- Which sampling event produced this sample?
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_label,
+ event.pid AS event_id,
+ event.label AS event_label
+FROM pqg AS sample
+-- Join to edge
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'produced_by'
+ AND edge.otype = '_edge_'
+-- Join to target entity
+JOIN pqg AS event
+ ON event.row_id = ANY(edge.o)
+ AND event.otype = 'SamplingEvent'
+WHERE sample.otype = 'MaterialSampleRecord'
+LIMIT 100;
+```
+
+**Key pattern:**
+- `edge.s = sample.row_id` - Edge starts at sample
+- `edge.p = 'produced_by'` - Predicate identifies relationship type
+- `event.row_id = ANY(edge.o)` - Handle multi-valued predicates
+- Always filter by `otype` for performance
+
+### Example: Find Material Categories for Sample
+
+```sql
+-- What material types does this sample have?
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_label,
+ material.pid AS material_category_id,
+ material.label AS material_category
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+JOIN pqg AS material
+ ON material.row_id = ANY(edge.o)
+ AND material.otype = 'IdentifiedConcept'
+WHERE sample.pid = 'iSamples:OPENCONTEXT:...'
+ AND sample.otype = 'MaterialSampleRecord';
+```
+
+**Note:** `has_material_category` is **multivalued**, so a sample may have multiple material types. The `ANY(edge.o)` handles this array.
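+
+An alternative to `ANY(edge.o)` is to expand the `o` array into one row per object with `UNNEST`, which both DuckDB and PostgreSQL accept in the select list. A sketch (the CTE name `edge_objects` is illustrative):
+
+```sql
+-- One output row per (sample, material category) pair
+WITH edge_objects AS (
+    SELECT
+        edge.s AS sample_row_id,
+        UNNEST(edge.o) AS object_row_id
+    FROM pqg AS edge
+    WHERE edge.otype = '_edge_'
+      AND edge.p = 'has_material_category'
+)
+SELECT
+    sample.pid AS sample_id,
+    material.label AS material_category
+FROM edge_objects AS eo
+JOIN pqg AS sample ON sample.row_id = eo.sample_row_id
+JOIN pqg AS material ON material.row_id = eo.object_row_id
+WHERE sample.otype = 'MaterialSampleRecord'
+  AND material.otype = 'IdentifiedConcept'
+LIMIT 100;
+```
+
+This form makes multi-valued edges visible as separate rows, which can help when a join returns more rows than expected.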
+
+---
+
+## Multi-Hop Traversals
+
+### Pattern: Chaining Relationships
+
+Many useful queries require following multiple edges:
+
+```
+MaterialSampleRecord
+ --produced_by--> SamplingEvent
+ --sample_location--> GeospatialCoordLocation
+```
+
+### Example: Samples with Coordinates (2-hop)
+
+```sql
+-- Find all samples with geographic coordinates
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_label,
+ coords.latitude,
+ coords.longitude,
+ coords.elevation
+FROM pqg AS sample
+-- First hop: sample → event
+JOIN pqg AS edge1
+ ON edge1.s = sample.row_id
+ AND edge1.p = 'produced_by'
+JOIN pqg AS event
+ ON event.row_id = ANY(edge1.o)
+ AND event.otype = 'SamplingEvent'
+-- Second hop: event → coordinates
+JOIN pqg AS edge2
+ ON edge2.s = event.row_id
+ AND edge2.p = 'sample_location'
+JOIN pqg AS coords
+ ON coords.row_id = ANY(edge2.o)
+ AND coords.otype = 'GeospatialCoordLocation'
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude IS NOT NULL
+ AND coords.longitude IS NOT NULL
+LIMIT 1000;
+```
+
+**Performance note:** This is the most common query pattern in iSamples - optimize it with indexes on `row_id`, `s`, `p`, `otype`.
+
+### Example: Samples → Site Name (3-hop)
+
+```sql
+-- Get sampling site names for samples
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_label,
+ site.pid AS site_id,
+ site.label AS site_name,
+ site_coords.latitude AS site_lat,
+ site_coords.longitude AS site_lon
+FROM pqg AS sample
+-- Hop 1: sample → event
+JOIN pqg AS edge1
+ ON edge1.s = sample.row_id
+ AND edge1.p = 'produced_by'
+JOIN pqg AS event
+ ON event.row_id = ANY(edge1.o)
+ AND event.otype = 'SamplingEvent'
+-- Hop 2: event → site
+JOIN pqg AS edge2
+ ON edge2.s = event.row_id
+ AND edge2.p = 'sampling_site'
+JOIN pqg AS site
+ ON site.row_id = ANY(edge2.o)
+ AND site.otype = 'SamplingSite'
+-- Hop 3: site → coordinates
+JOIN pqg AS edge3
+ ON edge3.s = site.row_id
+ AND edge3.p = 'site_location'
+JOIN pqg AS site_coords
+ ON site_coords.row_id = ANY(edge3.o)
+ AND site_coords.otype = 'GeospatialCoordLocation'
+WHERE sample.otype = 'MaterialSampleRecord'
+LIMIT 1000;
+```
+
+**Design note:** `sampling_site` is optional in iSamples schema, so use `LEFT JOIN` if you want samples without sites.
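+
+A sketch of that `LEFT JOIN` variant of the 3-hop query, keeping samples whose event has no site or whose site has no coordinates. The `otype` checks stay in the `ON` clauses: moving them into `WHERE` would discard the NULL-extended rows and turn the outer joins back into inner joins.
+
+```sql
+SELECT
+    sample.pid AS sample_id,
+    site.label AS site_name,          -- NULL when the event has no sampling_site edge
+    site_coords.latitude AS site_lat, -- NULL when the site has no site_location edge
+    site_coords.longitude AS site_lon
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o) AND event.otype = 'SamplingEvent'
+LEFT JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sampling_site'
+LEFT JOIN pqg AS site ON site.row_id = ANY(edge2.o) AND site.otype = 'SamplingSite'
+LEFT JOIN pqg AS edge3 ON edge3.s = site.row_id AND edge3.p = 'site_location'
+LEFT JOIN pqg AS site_coords
+    ON site_coords.row_id = ANY(edge3.o) AND site_coords.otype = 'GeospatialCoordLocation'
+WHERE sample.otype = 'MaterialSampleRecord'
+LIMIT 1000;
+```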
+
+---
+
+## Aggregation and Statistics
+
+### Count Edge Types in Dataset
+
+```sql
+-- Which relationship types are actually used, and from which subject types?
+SELECT
+ subj.otype AS subject_type,
+ edge.p AS predicate,
+ COUNT(*) AS edge_count
+FROM pqg AS edge
+JOIN pqg AS subj ON subj.row_id = edge.s
+WHERE edge.otype = '_edge_'
+GROUP BY subj.otype, edge.p
+ORDER BY edge_count DESC;
+```
+
+**Example output (OpenContext):**
+```
+subject_type          predicate               edge_count
+MaterialSampleRecord  has_sample_object_type  1,124,480
+MaterialSampleRecord  produced_by             1,096,352
+MaterialSampleRecord  has_material_category   1,095,920
+MaterialSampleRecord  has_context_category    1,095,912
+MaterialSampleRecord  keywords                1,070,912
+```
+
+### Samples per Material Category
+
+```sql
+-- How many samples for each material type?
+SELECT
+ material.label AS material_category,
+ COUNT(DISTINCT sample.pid) AS sample_count
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+JOIN pqg AS material
+ ON material.row_id = ANY(edge.o)
+ AND material.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+GROUP BY material.label
+ORDER BY sample_count DESC
+LIMIT 20;
+```
+
+### Geographic Bounding Box
+
+```sql
+-- Find extent of all sample locations
+SELECT
+ MIN(coords.latitude) AS min_lat,
+ MAX(coords.latitude) AS max_lat,
+ MIN(coords.longitude) AS min_lon,
+ MAX(coords.longitude) AS max_lon,
+ COUNT(DISTINCT sample.pid) AS sample_count
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude IS NOT NULL
+ AND coords.longitude IS NOT NULL;
+```
+
+---
+
+## Filtering and Search
+
+### Filter by Material Category
+
+```sql
+-- Find all pottery samples
+SELECT
+ sample.pid,
+ sample.label,
+ material.label AS material_type
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+JOIN pqg AS material
+ ON material.row_id = ANY(edge.o)
+ AND material.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND material.label ILIKE '%pottery%'
+LIMIT 1000;
+```
+
+**Note:** Use `ILIKE` for case-insensitive matching, `LIKE` for case-sensitive.
+
+### Filter by Geographic Region
+
+```sql
+-- Find samples in Turkey (approximate bounding box)
+SELECT
+ sample.pid,
+ sample.label,
+ coords.latitude,
+ coords.longitude
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude BETWEEN 36.0 AND 42.0
+ AND coords.longitude BETWEEN 26.0 AND 45.0
+LIMIT 1000;
+```
+
+### Filter by Keyword
+
+```sql
+-- Find samples with specific keyword
+SELECT
+ sample.pid,
+ sample.label,
+ keyword.label AS keyword
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'keywords'
+JOIN pqg AS keyword
+ ON keyword.row_id = ANY(edge.o)
+ AND keyword.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND keyword.label ILIKE '%neolithic%'
+LIMIT 1000;
+```
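+
+The query above matches any single keyword. To require *several* keywords on the same sample, one approach (sketched here; the two patterns are placeholders) is to group keyword matches per sample and keep only samples where each pattern matched at least once, using `BOOL_OR`, an aggregate that DuckDB and PostgreSQL both provide:
+
+```sql
+-- Samples tagged with both a 'neolithic' and a 'pottery' keyword
+SELECT
+    sample.pid,
+    sample.label
+FROM pqg AS sample
+JOIN pqg AS edge
+    ON edge.s = sample.row_id
+    AND edge.p = 'keywords'
+JOIN pqg AS keyword
+    ON keyword.row_id = ANY(edge.o)
+    AND keyword.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+GROUP BY sample.pid, sample.label
+-- Each BOOL_OR is true if at least one of the sample's keywords matches
+HAVING BOOL_OR(keyword.label ILIKE '%neolithic%')
+   AND BOOL_OR(keyword.label ILIKE '%pottery%')
+LIMIT 1000;
+```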
+
+### Combine Multiple Filters
+
+```sql
+-- Find pottery samples from Turkey with coordinates
+SELECT
+ sample.pid,
+ sample.label,
+ material.label AS material,
+ coords.latitude,
+ coords.longitude
+FROM pqg AS sample
+-- Material category
+JOIN pqg AS mat_edge
+ ON mat_edge.s = sample.row_id
+ AND mat_edge.p = 'has_material_category'
+JOIN pqg AS material
+ ON material.row_id = ANY(mat_edge.o)
+ AND material.otype = 'IdentifiedConcept'
+-- Coordinates
+JOIN pqg AS event_edge
+ ON event_edge.s = sample.row_id
+ AND event_edge.p = 'produced_by'
+JOIN pqg AS event
+ ON event.row_id = ANY(event_edge.o)
+JOIN pqg AS coord_edge
+ ON coord_edge.s = event.row_id
+ AND coord_edge.p = 'sample_location'
+JOIN pqg AS coords
+ ON coords.row_id = ANY(coord_edge.o)
+ AND coords.otype = 'GeospatialCoordLocation'
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND material.label ILIKE '%pottery%'
+ AND coords.latitude BETWEEN 36.0 AND 42.0
+ AND coords.longitude BETWEEN 26.0 AND 45.0
+LIMIT 1000;
+```
+
+---
+
+## Complex Query Patterns
+
+### Find Samples Missing Specific Relationships
+
+```sql
+-- Samples without material category (quality check)
+SELECT
+ sample.pid,
+ sample.label
+FROM pqg AS sample
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND NOT EXISTS (
+ SELECT 1
+ FROM pqg AS edge
+ WHERE edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+ AND edge.otype = '_edge_'
+ )
+LIMIT 1000;
+```
+
+### Samples with Multiple Material Categories
+
+```sql
+-- Samples categorized as multiple material types
+SELECT
+ sample.pid,
+ sample.label,
+ ARRAY_AGG(material.label) AS material_categories,
+ COUNT(*) AS category_count
+FROM pqg AS sample
+JOIN pqg AS edge
+ ON edge.s = sample.row_id
+ AND edge.p = 'has_material_category'
+JOIN pqg AS material
+ ON material.row_id = ANY(edge.o)
+ AND material.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+GROUP BY sample.pid, sample.label
+HAVING COUNT(*) > 1
+ORDER BY category_count DESC
+LIMIT 100;
+```
+
+### Hierarchical Queries (Parent-Child Samples)
+
+```sql
+-- Find child samples and their parents
+-- The SampleRelation stores the related (parent) sample's PID in its `target` property
+SELECT
+ child.pid AS child_id,
+ child.label AS child_label,
+ relation.relationship AS relation_type,
+ parent.pid AS parent_id,
+ parent.label AS parent_label
+FROM pqg AS child
+-- Child → SampleRelation edge
+JOIN pqg AS edge1
+ ON edge1.s = child.row_id
+ AND edge1.p = 'related_resource'
+JOIN pqg AS relation
+ ON relation.row_id = ANY(edge1.o)
+ AND relation.otype = 'SampleRelation'
+-- SampleRelation → parent, matched by PID (no separate edge row)
+JOIN pqg AS parent
+ ON parent.pid = relation.target
+ AND parent.otype = 'MaterialSampleRecord'
+WHERE child.otype = 'MaterialSampleRecord'
+ AND relation.relationship = 'isPartOf'
+LIMIT 1000;
+```
+
+### Spatial Proximity Search
+
+```sql
+-- Find samples within ~10km of a point (approximate)
+-- 1 degree latitude β 111km, 1 degree longitude β 111km * cos(latitude)
+WITH target AS (
+ SELECT 37.5 AS target_lat, 32.8 AS target_lon -- ΓatalhΓΆyΓΌk
+)
+SELECT
+ sample.pid,
+ sample.label,
+ coords.latitude,
+ coords.longitude,
+ -- Approximate distance in km
+ 111.0 * SQRT(
+ POWER(coords.latitude - target.target_lat, 2) +
+ POWER((coords.longitude - target.target_lon) * COS(RADIANS(target.target_lat)), 2)
+ ) AS distance_km
+FROM pqg AS sample
+CROSS JOIN target
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude IS NOT NULL
+ AND coords.longitude IS NOT NULL
+ AND ABS(coords.latitude - target.target_lat) < 0.1 -- Pre-filter
+ AND ABS(coords.longitude - target.target_lon) < 0.1
+ORDER BY distance_km
+LIMIT 100;
+```
+
+**Note:** For precise geospatial calculations, use PostGIS or DuckDB spatial extension.
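+
+If the flat approximation in the query above is too coarse (its error grows with distance and toward the poles), the haversine formula on a spherical Earth is a common substitute that needs only standard math functions. A sketch of the drop-in expression for `distance_km`, assuming a mean Earth radius of 6371 km and the same `coords` and `target` aliases; it still treats the Earth as a sphere, so the note above applies for ellipsoidal accuracy:
+
+```sql
+-- Great-circle distance in km via the haversine formula:
+-- a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlon/2);  d = 2·R·asin(√a)
+2 * 6371.0 * ASIN(SQRT(
+    POWER(SIN(RADIANS(coords.latitude - target.target_lat) / 2), 2) +
+    COS(RADIANS(target.target_lat)) * COS(RADIANS(coords.latitude)) *
+    POWER(SIN(RADIANS(coords.longitude - target.target_lon) / 2), 2)
+)) AS distance_km
+```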
+
+---
+
+## Performance Optimization
+
+### Use Indexes
+
+```sql
+-- Indexes for common lookup patterns (row_id is already indexed as the PRIMARY KEY).
+-- DuckDB does not support partial indexes (CREATE INDEX ... WHERE); in PostgreSQL
+-- you can add WHERE otype = '_edge_' to the s and p indexes to keep them smaller.
+CREATE INDEX idx_edge_s ON pqg(s);
+CREATE INDEX idx_edge_p ON pqg(p);
+CREATE INDEX idx_otype ON pqg(otype);
+CREATE INDEX idx_pid ON pqg(pid);
+```
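+
+Whether an index helps depends on the engine and the query, so it is worth checking the plan before and after adding one. A sketch using `EXPLAIN ANALYZE`, which DuckDB and PostgreSQL both support; it executes the query and reports the plan with per-operator timings:
+
+```sql
+-- Run the single-hop produced_by traversal and profile each operator
+EXPLAIN ANALYZE
+SELECT sample.pid, event.pid AS event_pid
+FROM pqg AS sample
+JOIN pqg AS edge
+    ON edge.s = sample.row_id AND edge.p = 'produced_by' AND edge.otype = '_edge_'
+JOIN pqg AS event
+    ON event.row_id = ANY(edge.o) AND event.otype = 'SamplingEvent'
+WHERE sample.otype = 'MaterialSampleRecord';
+```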
+
+### Filter Early
+
+```sql
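+-- ❌ BAD: edges carry no predicate (p) or otype constraint, so every edge
+-- out of each row is followed. (For inner joins, optimizers usually push WHERE
+-- filters down on their own; the missing constraints are the real cost.)
+SELECT sample.pid, coords.latitude
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND coords.latitude > 40.0;
+
+-- ✅ GOOD: constrain every edge (p, otype) and node (otype) in its JOIN condition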
+SELECT sample.pid, coords.latitude
+FROM pqg AS sample
+JOIN pqg AS edge1
+ ON edge1.s = sample.row_id
+ AND edge1.p = 'produced_by'
+ AND edge1.otype = '_edge_'
+JOIN pqg AS event
+ ON event.row_id = ANY(edge1.o)
+ AND event.otype = 'SamplingEvent'
+JOIN pqg AS edge2
+ ON edge2.s = event.row_id
+ AND edge2.p = 'sample_location'
+ AND edge2.otype = '_edge_'
+JOIN pqg AS coords
+ ON coords.row_id = ANY(edge2.o)
+ AND coords.otype = 'GeospatialCoordLocation'
+ AND coords.latitude > 40.0 -- Filter here!
+WHERE sample.otype = 'MaterialSampleRecord';
+```
+
+### Use CTEs for Readability
+
+```sql
+-- Break complex queries into steps
+WITH samples_with_events AS (
+ SELECT
+ sample.row_id AS sample_row_id,
+ sample.pid AS sample_pid,
+ sample.label AS sample_label,
+ event.row_id AS event_row_id
+ FROM pqg AS sample
+ JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'produced_by'
+ JOIN pqg AS event ON event.row_id = ANY(edge.o)
+ WHERE sample.otype = 'MaterialSampleRecord'
+ AND event.otype = 'SamplingEvent'
+),
+events_with_coords AS (
+ SELECT
+ swe.sample_row_id,
+ swe.sample_pid,
+ swe.sample_label,
+ coords.latitude,
+ coords.longitude
+ FROM samples_with_events swe
+ JOIN pqg AS edge ON edge.s = swe.event_row_id AND edge.p = 'sample_location'
+ JOIN pqg AS coords ON coords.row_id = ANY(edge.o)
+ WHERE coords.otype = 'GeospatialCoordLocation'
+ AND coords.latitude IS NOT NULL
+)
+SELECT * FROM events_with_coords
+LIMIT 1000;
+```
+
+### Limit Result Sets
+
+```sql
+-- Always use LIMIT for exploratory queries
+SELECT * FROM pqg LIMIT 1000; -- ✅
+
+-- Dangerous without LIMIT on large datasets
+SELECT * FROM pqg; -- ❌ Could return millions of rows
+```
+
+---
+
+## Common Query Recipes
+
+### Recipe 1: Export Samples with Full Metadata
+
+```sql
+-- Complete sample export with all key attributes
+SELECT
+ sample.pid AS sample_id,
+ sample.label AS sample_name,
+ sample.description,
+ material.label AS material_type,
+ context.label AS context_category,
+ coords.latitude,
+ coords.longitude,
+ coords.elevation,
+ site.label AS site_name,
+ agent.name AS collector
+FROM pqg AS sample
+-- Material
+LEFT JOIN pqg AS mat_edge ON mat_edge.s = sample.row_id AND mat_edge.p = 'has_material_category'
+LEFT JOIN pqg AS material ON material.row_id = ANY(mat_edge.o) AND material.otype = 'IdentifiedConcept'
+-- Context
+LEFT JOIN pqg AS ctx_edge ON ctx_edge.s = sample.row_id AND ctx_edge.p = 'has_context_category'
+LEFT JOIN pqg AS context ON context.row_id = ANY(ctx_edge.o) AND context.otype = 'IdentifiedConcept'
+-- Event and location
+LEFT JOIN pqg AS event_edge ON event_edge.s = sample.row_id AND event_edge.p = 'produced_by'
+LEFT JOIN pqg AS event ON event.row_id = ANY(event_edge.o) AND event.otype = 'SamplingEvent'
+LEFT JOIN pqg AS coord_edge ON coord_edge.s = event.row_id AND coord_edge.p = 'sample_location'
+LEFT JOIN pqg AS coords ON coords.row_id = ANY(coord_edge.o) AND coords.otype = 'GeospatialCoordLocation'
+-- Site
+LEFT JOIN pqg AS site_edge ON site_edge.s = event.row_id AND site_edge.p = 'sampling_site'
+LEFT JOIN pqg AS site ON site.row_id = ANY(site_edge.o) AND site.otype = 'SamplingSite'
+-- Collector
+LEFT JOIN pqg AS agent_edge ON agent_edge.s = event.row_id AND agent_edge.p = 'responsibility'
+LEFT JOIN pqg AS agent ON agent.row_id = ANY(agent_edge.o) AND agent.otype = 'Agent'
+WHERE sample.otype = 'MaterialSampleRecord'
+LIMIT 10000;
+```
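+
+Because several of these predicates are multivalued, the export above emits one row per *combination* of material, context, and agent, so a single sample can appear many times. A sketch that instead collapses the material categories to one delimited string per sample, using `STRING_AGG`, which DuckDB and PostgreSQL both provide (the `'; '` separator is an arbitrary choice):
+
+```sql
+-- One row per sample; multivalued material categories joined into one string
+SELECT
+    sample.pid AS sample_id,
+    sample.label AS sample_name,
+    STRING_AGG(DISTINCT material.label, '; ') AS material_types  -- NULLs are skipped
+FROM pqg AS sample
+LEFT JOIN pqg AS mat_edge
+    ON mat_edge.s = sample.row_id AND mat_edge.p = 'has_material_category'
+LEFT JOIN pqg AS material
+    ON material.row_id = ANY(mat_edge.o) AND material.otype = 'IdentifiedConcept'
+WHERE sample.otype = 'MaterialSampleRecord'
+GROUP BY sample.pid, sample.label
+LIMIT 10000;
+```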
+
+### Recipe 2: Validate Data Quality
+
+```sql
+-- Find samples with potential data issues
+SELECT
+ 'No material category' AS issue,
+ COUNT(*) AS count
+FROM pqg AS sample
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND NOT EXISTS (
+ SELECT 1 FROM pqg AS edge
+ WHERE edge.s = sample.row_id AND edge.p = 'has_material_category'
+ )
+UNION ALL
+SELECT
+ 'No sampling event',
+ COUNT(*)
+FROM pqg AS sample
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND NOT EXISTS (
+ SELECT 1 FROM pqg AS edge
+ WHERE edge.s = sample.row_id AND edge.p = 'produced_by'
+ )
+UNION ALL
+SELECT
+ 'No coordinates',
+ COUNT(*)
+FROM pqg AS sample
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND NOT EXISTS (
+ SELECT 1 FROM pqg AS e1
+ JOIN pqg AS event ON event.row_id = ANY(e1.o)
+ JOIN pqg AS e2 ON e2.s = event.row_id AND e2.p = 'sample_location'
+ WHERE e1.s = sample.row_id AND e1.p = 'produced_by'
+ );
+```
+
+### Recipe 3: Generate GeoJSON
+
+```sql
+-- Create a GeoJSON FeatureCollection for web mapping
+-- LIMIT goes inside the CTE: on the outer aggregate it would only cap
+-- the single output row, not the number of features
+WITH features AS (
+    SELECT
+        sample.pid,
+        sample.label,
+        material.label AS material,
+        coords.latitude,
+        coords.longitude
+    FROM pqg AS sample
+    JOIN pqg AS mat_edge ON mat_edge.s = sample.row_id AND mat_edge.p = 'has_material_category'
+    JOIN pqg AS material ON material.row_id = ANY(mat_edge.o)
+    JOIN pqg AS event_edge ON event_edge.s = sample.row_id AND event_edge.p = 'produced_by'
+    JOIN pqg AS event ON event.row_id = ANY(event_edge.o)
+    JOIN pqg AS coord_edge ON coord_edge.s = event.row_id AND coord_edge.p = 'sample_location'
+    JOIN pqg AS coords ON coords.row_id = ANY(coord_edge.o)
+    WHERE sample.otype = 'MaterialSampleRecord'
+      AND coords.latitude IS NOT NULL
+      AND coords.longitude IS NOT NULL
+    LIMIT 1000
+)
+SELECT json_object(
+    'type', 'FeatureCollection',
+    'features', json_group_array(
+        json_object(
+            'type', 'Feature',
+            'geometry', json_object(
+                'type', 'Point',
+                'coordinates', json_array(longitude, latitude)  -- GeoJSON order is [lon, lat]
+            ),
+            'properties', json_object(
+                'id', pid,
+                'label', label,
+                'material', material
+            )
+        )
+    )
+) AS geojson
+FROM features;
+```
+
+### Recipe 4: Time Series Analysis
+
+```sql
+-- Samples by collection year, assuming the collection date is stored in
+-- SamplingEvent.result_time as an ISO 8601 value (e.g., '2023-06-14')
+SELECT
+ SUBSTR(CAST(event.result_time AS VARCHAR), 1, 4) AS collection_year,
+ COUNT(DISTINCT sample.pid) AS sample_count
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+ AND event.otype = 'SamplingEvent'
+ AND event.result_time IS NOT NULL
+GROUP BY collection_year
+ORDER BY collection_year;
+```
+
+---
+
+## Next Steps
+
+- **Visual diagrams**: See [EDGE_TYPES_VISUAL.md](EDGE_TYPES_VISUAL.md) for entity relationship diagrams
+- **Predicate reference**: See [PREDICATES_REFERENCE.md](PREDICATES_REFERENCE.md) for detailed predicate documentation
+- **Graph structure**: See [UNDERSTANDING_THE_GRAPH.md](UNDERSTANDING_THE_GRAPH.md) for conceptual overview
+- **Real examples**: See [EXAMPLES_BY_DOMAIN.md](EXAMPLES_BY_DOMAIN.md) for complete domain-specific examples
+
+---
+
+## Tips and Best Practices
+
+1. **Always filter by `otype`** in JOIN conditions for performance
+2. **Use `ANY(edge.o)`** to handle multi-valued predicates correctly
+3. **Start simple** - test single-hop queries before chaining multiple relationships
+4. **Use CTEs** to break down complex queries into understandable steps
+5. **Add LIMIT** to all exploratory queries
+6. **Check for NULL values** in coordinate and date fields
+7. **Use LEFT JOIN** when relationships are optional
+8. **Inspect query plans** with `EXPLAIN` (or `EXPLAIN ANALYZE` for runtime profiling) to optimize performance
+9. **Batch queries** when possible instead of running thousands individually
+10. **Document your patterns** - complex graph traversals are hard to remember!
+
+---
+
+**Last updated:** 2025-11-14
+**Part of:** iSamples Property Graph Documentation Suite
diff --git a/src/docs/UNDERSTANDING_THE_GRAPH.md b/src/docs/UNDERSTANDING_THE_GRAPH.md
new file mode 100644
index 00000000..075f5ca9
--- /dev/null
+++ b/src/docs/UNDERSTANDING_THE_GRAPH.md
@@ -0,0 +1,664 @@
+# Understanding the iSamples Property Graph
+
+**Purpose:** This guide helps you understand how iSamples metadata is structured as a property graph, making it easier to query, validate, and work with material sample data.
+
+**Key Insight:** The iSamples property graph has a well-defined **grammar** consisting of 8 entity types and 14 relationship types. Understanding this grammar is essential for working with PQG-formatted data.
+
+---
+
+## Table of Contents
+
+1. [What is a Property Graph?](#what-is-a-property-graph)
+2. [The 8 Entity Types (oTypes)](#the-8-entity-types-otypes)
+3. [The 14 Relationship Types (Predicates)](#the-14-relationship-types-predicates)
+4. [The 14 Sentence Types](#the-14-sentence-types)
+5. [Why This Structure?](#why-this-structure)
+6. [Graph Traversal Patterns](#graph-traversal-patterns)
+7. [Storage Format](#storage-format)
+
+---
+
+## What is a Property Graph?
+
+A **property graph** represents data as:
+- **Nodes** (entities with properties)
+- **Edges** (relationships between nodes)
+
+**Why use a graph?**
+- Captures complex relationships naturally
+- Enables flexible multi-hop queries
+- Supports domain-agnostic modeling
+- Facilitates data integration across sources
+
+**iSamples Graph Example:**
+```
+MaterialSampleRecord (pottery sherd)
+    |
+    |-- produced_by --> SamplingEvent (2023 excavation)
+    |       |
+    |       |-- sampling_site --> SamplingSite (Çatalhöyük)
+    |       |       |
+    |       |       |-- site_location --> GeospatialCoordLocation
+    |       |
+    |       |-- sample_location --> GeospatialCoordLocation
+    |       |
+    |       |-- responsibility --> Agent (Dr. Smith)
+    |
+    |-- has_material_category --> IdentifiedConcept (Earthenware)
+    |
+    |-- keywords --> IdentifiedConcept (Neolithic, Pottery)
+```
+
+---
+
+## The 8 Entity Types (oTypes)
+
+These are the **node types** in the iSamples graph. Each node has an `otype` field that identifies its type.
+
+### 1. MaterialSampleRecord
+
+**What it represents:** The physical sample itself - the core entity in iSamples.
+
+**Domain examples:**
+- Archaeology: Pottery sherd, bone fragment, textile
+- Geology: Rock core, mineral specimen, sediment sample
+- Biology: Tissue sample, DNA extract, whole organism
+
+**Key properties:**
+- `pid` - Unique identifier (typically IGSN)
+- `label` - Human-readable name
+- `description` - Detailed description
+- `sample_identifier` - Canonical sample ID
+
+**Required relationships:**
+- Must have `produced_by` → SamplingEvent
+- Must have `has_material_category` → IdentifiedConcept
+- Must have `has_context_category` → IdentifiedConcept
+- Must have `has_sample_object_type` → IdentifiedConcept
+
+**Example:**
+```yaml
+otype: MaterialSampleRecord
+pid: "igsn:SSH000001"
+label: "Ceramic bowl fragment from Trench 5"
+description: "Red-slipped pottery sherd with geometric decoration"
+```
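+
+The required relationships listed above can be checked mechanically. The sketch below is minimal and assumes that a sample's outgoing edges have already been collected as `(predicate, object_otype)` pairs. Both `missing_required` and that input shape are hypothetical and not part of PQG.
+
+```python
+# Required outgoing predicates for a MaterialSampleRecord, with the
+# node type each one must point to (from the list above).
+REQUIRED = {
+    "produced_by": "SamplingEvent",
+    "has_material_category": "IdentifiedConcept",
+    "has_context_category": "IdentifiedConcept",
+    "has_sample_object_type": "IdentifiedConcept",
+}
+
+
+def missing_required(edges: list[tuple[str, str]]) -> list[str]:
+    """Return required predicates that have no edge to the expected type.
+
+    `edges` holds (predicate, object_otype) pairs for one sample
+    (hypothetical input shape).
+    """
+    present = set(edges)
+    return [p for p, otype in REQUIRED.items() if (p, otype) not in present]
+
+
+edges = [
+    ("produced_by", "SamplingEvent"),
+    ("has_material_category", "IdentifiedConcept"),
+    ("keywords", "IdentifiedConcept"),
+]
+print(missing_required(edges))
+# ['has_context_category', 'has_sample_object_type']
+```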
+
+---
+
+### 2. SamplingEvent
+
+**What it represents:** The activity that collected/created the sample.
+
+**Domain examples:**
+- Archaeology: Excavation, surface collection, test pit
+- Geology: Core drilling, outcrop sampling, dredge
+- Biology: Field collection, trap deployment, specimen preparation
+
+**Key properties:**
+- `pid` - Event identifier
+- `label` - Event name/code
+- `description` - Sampling procedure details
+- `result_time` - When sample was collected (date/datetime)
+- `has_feature_of_interest` - What was sampled
+- `project` - Project identifier or name
+
+**Relationships:**
+- Links TO: SamplingSite, GeospatialCoordLocation, Agent
+- Links FROM: MaterialSampleRecord (via `produced_by`)
+
+**Example:**
+```yaml
+otype: SamplingEvent
+pid: "event:2023-catal-t5-001"
+label: "Trench 5, Level 3, 2023-07-15"
+result_time: "2023-07-15"
+has_feature_of_interest: "Neolithic architectural feature"
+```
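+
+Because `result_time` may hold either a date or a full datetime, code that groups samples by collection year has to accept both forms. The following is a minimal Python sketch, assuming ISO 8601 strings like the one in the example. `collection_year` is a hypothetical helper.
+
+```python
+from datetime import datetime
+from typing import Optional
+
+
+def collection_year(result_time: Optional[str]) -> Optional[int]:
+    """Return the year from an ISO 8601 date or datetime string.
+
+    Returns None for missing or unparseable values instead of raising,
+    since result_time is not guaranteed to be present.
+    """
+    if not result_time:
+        return None
+    try:
+        # datetime.fromisoformat accepts both "2023-07-15" and
+        # "2023-07-15T10:30:00" (date-only strings get midnight).
+        return datetime.fromisoformat(result_time.strip()).year
+    except ValueError:
+        return None
+
+
+print(collection_year("2023-07-15"))           # 2023
+print(collection_year("2023-07-15T10:30:00"))  # 2023
+print(collection_year("July 2023"))            # None
+```
+
+Before Python 3.11, `datetime.fromisoformat` rejects some valid ISO 8601 forms (for example a trailing `Z`), so real data may need extra normalization.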
+
+---
+
+### 3. SamplingSite
+
+**What it represents:** Named location where sampling occurred.
+
+**Domain examples:**
+- Archaeology: Site name (Çatalhöyük, Pompeii)
+- Geology: Formation/locality name (Yellowstone Core Site YC-01)
+- Biology: Research station, reef system, forest plot
+
+**Key properties:**
+- `pid` - Site identifier
+- `label` - Site name
+- `description` - Site description
+- `place_name` - One or more names for the site
+
+**Relationships:**
+- Links TO: GeospatialCoordLocation (via `site_location`)
+- Links TO: SamplingSite (via `is_part_of` for nested sites)
+- Links FROM: SamplingEvent (via `sampling_site`)
+
+**Example:**
+```yaml
+otype: SamplingSite
+pid: "site:catalhoyuk-south"
+label: "Çatalhöyük South Area"
+place_name: ["Çatalhöyük", "Çatal Höyük", "Chatal Huyuk"]
+```
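+
+Because `place_name` can list several spellings, matching a search term against a site works best when case and diacritics are ignored. The sketch below is minimal and uses Python's standard `unicodedata` module. Its folding rule is an illustrative assumption, not documented iSamples behavior: NFKD decomposition, then dropping combining marks and spaces. The rule does not equate genuinely different spellings such as "Chatal Huyuk" and "Çatalhöyük", which is why listing variants in `place_name` is useful.
+
+```python
+import unicodedata
+
+
+def fold(name: str) -> str:
+    """Lowercase, strip diacritics, and drop spaces: 'Çatal Höyük' -> 'catalhoyuk'."""
+    decomposed = unicodedata.normalize("NFKD", name)
+    return "".join(
+        ch for ch in decomposed
+        if not unicodedata.combining(ch) and not ch.isspace()
+    ).casefold()
+
+
+def site_matches(query: str, place_names: list[str]) -> bool:
+    """True if the query equals any listed place name after folding."""
+    target = fold(query)
+    return any(fold(n) == target for n in place_names)
+
+
+names = ["Çatalhöyük", "Çatal Höyük", "Chatal Huyuk"]
+print(site_matches("catalhoyuk", names))    # True
+print(site_matches("chatal huyuk", names))  # True
+print(site_matches("Pompeii", names))       # False
+```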
+
+---
+
+### 4. GeospatialCoordLocation
+
+**What it represents:** Precise geographic coordinates (WGS84).
+
+**Key properties:**
+- `pid` - Coordinate identifier
+- `latitude` - Decimal degrees (-90 to 90)
+- `longitude` - Decimal degrees (-180 to 180)
+- `elevation` - String with value, units, datum (e.g., "401 m above mean sea level")
+- `obfuscated` - Boolean flag if coordinates are intentionally imprecise
+
+**Relationships:**
+- Links FROM: SamplingEvent (via `sample_location`)
+- Links FROM: SamplingSite (via `site_location`)
+
+**Example:**
+```yaml
+otype: GeospatialCoordLocation
+pid: "coord:37.6665-32.8274"
+latitude: 37.6665
+longitude: 32.8274
+elevation: "1015 m above mean sea level"
+obfuscated: false
+```
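+
+The latitude and longitude ranges listed above translate directly into a validity check. The sketch below is minimal. Treating a missing value as invalid, and the name `valid_wgs84`, are assumptions for illustration.
+
+```python
+import math
+from typing import Optional
+
+
+def valid_wgs84(lat: Optional[float], lon: Optional[float]) -> bool:
+    """Check that a latitude/longitude pair is present and in range.
+
+    Latitude must lie in [-90, 90] and longitude in [-180, 180]
+    (decimal degrees). NaN and None are rejected.
+    """
+    if lat is None or lon is None:
+        return False
+    if math.isnan(lat) or math.isnan(lon):
+        return False
+    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
+
+
+print(valid_wgs84(37.6665, 32.8274))  # True  (the example above)
+print(valid_wgs84(32.8274, 197.0))    # False (longitude out of range)
+print(valid_wgs84(None, 32.8274))     # False (missing latitude)
+```
+
+A range check cannot catch swapped latitude and longitude when both values fall within ±90.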
+
+---
+
+### 5. IdentifiedConcept
+
+**What it represents:** Controlled vocabulary terms for classification and keywords.
+
+**Used for:**
+- Material types (rock, ceramic, DNA, etc.)
+- Sampled feature types (terrestrial, marine, archaeological)
+- Sample object types (core, hand specimen, thin section)
+- Free-text keywords for discovery
+
+**Key properties:**
+- `pid` - Concept URI (from controlled vocabulary)
+- `label` - Human-readable term
+- `scheme_name` - Vocabulary name
+- `scheme_uri` - Vocabulary identifier
+
+**Vocabularies:**
+- [Material Type Vocabulary](https://w3id.org/isample/vocabulary/material/)
+- [Sampled Feature Vocabulary](https://w3id.org/isample/vocabulary/sampledfeature/)
+- [Material Sample Object Type Vocabulary](https://w3id.org/isample/vocabulary/materialsampleobjecttype/)
+
+**Example:**
+```yaml
+otype: IdentifiedConcept
+pid: "https://w3id.org/isample/vocabulary/material/0.9/earthenware"
+label: "Earthenware"
+scheme_name: "iSamples Material Type Vocabulary"
+scheme_uri: "https://w3id.org/isample/vocabulary/material/"
+```
+
+---
+
+### 6. Agent
+
+**What it represents:** Person or organization with a role in the sample lifecycle.
+
+**Roles:**
+- Collector (sampling event)
+- Registrant (sample registration)
+- Curator (sample storage)
+
+**Key properties:**
+- `pid` - Agent identifier (ORCID preferred)
+- `name` - Person/organization name
+- `affiliation` - Institutional affiliation
+- `contact_information` - Email, phone, address
+- `role` - Role relative to sample
+
+**Example:**
+```yaml
+otype: Agent
+pid: "https://orcid.org/0000-0002-1234-5678"
+name: "Dr. Jane Smith"
+affiliation: "University of Example"
+contact_information: "jsmith@example.edu"
+role: "Principal Investigator"
+```
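+
+ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so an Agent `pid` can be sanity-checked before loading. The sketch below is minimal, and the function name is illustrative. The placeholder iD in the example above fails this check, as expected for a made-up value. `0000-0002-1825-0097` is the sample iD that ORCID uses in its own documentation.
+
+```python
+import re
+
+ORCID_RE = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")
+
+
+def orcid_is_valid(value: str) -> bool:
+    """Validate an ORCID iD (bare or as an https://orcid.org/ URL).
+
+    Checks the 16-character layout, then the ISO 7064 MOD 11-2 check
+    character (the last character, where 'X' stands for 10).
+    """
+    match = ORCID_RE.match(value.strip())
+    if not match:
+        return False
+    chars = match.group(1).replace("-", "")
+    total = 0
+    for digit in chars[:-1]:
+        total = (total + int(digit)) * 2
+    result = (12 - total % 11) % 11
+    expected = "X" if result == 10 else str(result)
+    return chars[-1] == expected
+
+
+print(orcid_is_valid("https://orcid.org/0000-0002-1825-0097"))  # True
+print(orcid_is_valid("https://orcid.org/0000-0002-1234-5678"))  # False
+```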
+
+---
+
+### 7. MaterialSampleCuration
+
+**What it represents:** Information about sample storage, access, and curation history.
+
+**Key properties:**
+- `pid` - Curation record identifier
+- `label` - Collection/storage name
+- `description` - Curation procedures
+- `curation_location` - Where sample is stored
+- `access_constraints` - Access restrictions
+
+**Relationships:**
+- Links TO: Agent (via `responsibility`)
+- Links FROM: MaterialSampleRecord (via `curation`)
+
+**Example:**
+```yaml
+otype: MaterialSampleCuration
+pid: "curation:smithsonian-nmnh-123"
+label: "Smithsonian NMNH Anthropology Collection"
+curation_location: "National Museum of Natural History, Washington DC"
+access_constraints: "Appointment required, no destructive sampling"
+```
+
+---
+
+### 8. SampleRelation
+
+**What it represents:** Relationship between samples (parent-child, sibling, etc.).
+
+**Use cases:**
+- Parent sample → subsample
+- Whole organism → tissue → DNA extract
+- Core sample → thin section → analysis aliquot
+
+**Key properties:**
+- `pid` - Relation identifier
+- `label` - Relation description
+- `description` - Details of relationship
+- `relationship` - Relation type (e.g., "derivedFrom")
+- `target` - PID of related sample
+
+**Example:**
+```yaml
+otype: SampleRelation
+pid: "relation:subsample-001"
+label: "Subsample for radiocarbon dating"
+relationship: "derivedFrom"
+target: "igsn:SSH000001"
+```
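+
+Chains such as whole organism → tissue → DNA extract can be followed by repeatedly looking up each sample's `derivedFrom` target. The sketch below is minimal and assumes the relations have already been flattened into a `{sample_pid: parent_pid}` dictionary. That shape and the PIDs shown are hypothetical, and real data may hold several relations per sample. The loop raises an error on a cycle rather than running forever.
+
+```python
+def lineage(pid: str, parent_of: dict[str, str]) -> list[str]:
+    """Return [pid, parent, grandparent, ...] by following derivedFrom links.
+
+    `parent_of` maps a sample PID to the PID it was derived from.
+    Raises ValueError if the links form a cycle.
+    """
+    chain = [pid]
+    seen = {pid}
+    while chain[-1] in parent_of:
+        parent = parent_of[chain[-1]]
+        if parent in seen:
+            raise ValueError(f"cycle detected at {parent}")
+        chain.append(parent)
+        seen.add(parent)
+    return chain
+
+
+# Hypothetical PIDs: DNA extract <- tissue <- whole organism
+parent_of = {
+    "igsn:DNA001": "igsn:TIS001",
+    "igsn:TIS001": "igsn:ORG001",
+}
+print(lineage("igsn:DNA001", parent_of))
+# ['igsn:DNA001', 'igsn:TIS001', 'igsn:ORG001']
+```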
+
+---
+
+## The 14 Relationship Types (Predicates)
+
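+These are the **edge types** (predicates) that connect nodes in the iSamples graph. They use 12 distinct predicate names. `has_context_category` and `responsibility` each appear with two different subject types, which yields 14 edge types. Each edge row has: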
+- `s` (subject) - Source node row_id
+- `p` (predicate) - Relationship type
+- `o` (object) - Target node row_id(s)
+
+### From MaterialSampleRecord (8 predicates)
+
+| Predicate | Target Type | Cardinality | Description |
+|-----------|-------------|-------------|-------------|
+| `produced_by` | SamplingEvent | One | Links sample to collection event |
+| `has_material_category` | IdentifiedConcept | Many | What is it made of? |
+| `has_context_category` | IdentifiedConcept | Many | What domain/environment? |
+| `has_sample_object_type` | IdentifiedConcept | Many | What form does it take? |
+| `keywords` | IdentifiedConcept | Many | Discovery keywords |
+| `registrant` | Agent | One | Who registered this sample? |
+| `curation` | MaterialSampleCuration | One | Where is it stored? |
+| `related_resource` | SampleRelation | Many | Links to related samples |
+
+### From SamplingEvent (4 predicates)
+
+| Predicate | Target Type | Cardinality | Description |
+|-----------|-------------|-------------|-------------|
+| `sampling_site` | SamplingSite | One | Where was it collected? |
+| `sample_location` | GeospatialCoordLocation | One | Precise coordinates |
+| `responsibility` | Agent | Many | Who collected it? |
+| `has_context_category` | IdentifiedConcept | Many | Sampling context |
+
+### From SamplingSite (1 predicate)
+
+| Predicate | Target Type | Cardinality | Description |
+|-----------|-------------|-------------|-------------|
+| `site_location` | GeospatialCoordLocation | One | Site coordinates |
+
+### From MaterialSampleCuration (1 predicate)
+
+| Predicate | Target Type | Cardinality | Description |
+|-----------|-------------|-------------|-------------|
+| `responsibility` | Agent | Many | Who curates it? |
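+
+Together, the four tables above form a closed list of (subject type, predicate, object type) triples, so edge validation reduces to a set lookup. The sketch below is minimal and transcribes the tables directly. `is_valid_edge` is a hypothetical helper name.
+
+```python
+# (subject otype, predicate, object otype), transcribed from the tables above.
+VALID_EDGES = {
+    ("MaterialSampleRecord", "produced_by", "SamplingEvent"),
+    ("MaterialSampleRecord", "has_material_category", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "has_context_category", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "has_sample_object_type", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "keywords", "IdentifiedConcept"),
+    ("MaterialSampleRecord", "registrant", "Agent"),
+    ("MaterialSampleRecord", "curation", "MaterialSampleCuration"),
+    ("MaterialSampleRecord", "related_resource", "SampleRelation"),
+    ("SamplingEvent", "sampling_site", "SamplingSite"),
+    ("SamplingEvent", "sample_location", "GeospatialCoordLocation"),
+    ("SamplingEvent", "responsibility", "Agent"),
+    ("SamplingEvent", "has_context_category", "IdentifiedConcept"),
+    ("SamplingSite", "site_location", "GeospatialCoordLocation"),
+    ("MaterialSampleCuration", "responsibility", "Agent"),
+}
+
+
+def is_valid_edge(subject_otype: str, predicate: str, object_otype: str) -> bool:
+    """True if the triple is one of the 14 permitted edge types."""
+    return (subject_otype, predicate, object_otype) in VALID_EDGES
+
+
+print(len(VALID_EDGES))                     # 14
+print(len({p for _, p, _ in VALID_EDGES}))  # 12 distinct predicate names
+print(is_valid_edge("SamplingSite", "produced_by", "SamplingEvent"))  # False
+```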
+
+---
+
+## The 14 Sentence Types
+
+Think of these as the **grammar** of iSamples metadata. Each represents a valid statement you can make about samples:
+
+### Core Sample Provenance (3 sentence types)
+
+1. **"This sample was produced by this sampling event"**
+ - `MaterialSampleRecord --produced_by--> SamplingEvent`
+ - Every sample MUST have this relationship
+ - Links sample to its collection context
+
+2. **"This sample is made of this material type"**
+ - `MaterialSampleRecord --has_material_category--> IdentifiedConcept`
+ - Required: At least one material classification
+ - Example: Earthenware, Basalt, DNA
+
+3. **"This sample represents this context"**
+ - `MaterialSampleRecord --has_context_category--> IdentifiedConcept`
+ - Required: Domain classification
+ - Example: Terrestrial/Archaeological, Marine Biome
+
+### Sample Classification (2 sentence types)
+
+4. **"This sample takes this physical form"**
+ - `MaterialSampleRecord --has_sample_object_type--> IdentifiedConcept`
+ - Required: Object type
+ - Example: Sherd, Core, Specimen
+
+5. **"This sample is described by these keywords"**
+ - `MaterialSampleRecord --keywords--> IdentifiedConcept`
+ - Optional: Discovery keywords
+ - Example: Neolithic, Pottery, Red-slipped
+
+### Sample Stewardship (2 sentence types)
+
+6. **"This sample was registered by this person"**
+ - `MaterialSampleRecord --registrant--> Agent`
+ - Optional: Who created the metadata record
+ - Example: Data curator, Collection manager
+
+7. **"This sample is stored here"**
+ - `MaterialSampleRecord --curation--> MaterialSampleCuration`
+ - Optional: Storage and access information
+
+### Sample Relationships (1 sentence type)
+
+8. **"This sample relates to that sample"**
+ - `MaterialSampleRecord --related_resource--> SampleRelation`
+ - Optional: Parent-child, sibling relationships
+ - Example: Subsample, Derived from
+
+### Event Location (2 sentence types)
+
+9. **"This event occurred at this site"**
+ - `SamplingEvent --sampling_site--> SamplingSite`
+ - Optional: Named sampling location
+   - Example: Çatalhöyük, Yellowstone Core Site
+
+10. **"This event occurred at these coordinates"**
+ - `SamplingEvent --sample_location--> GeospatialCoordLocation`
+ - Optional but common: Precise sample coordinates
+    - Example: 37.6665°N, 32.8274°E
+
+### Event Responsibility (2 sentence types)
+
+11. **"This person collected at this event"**
+ - `SamplingEvent --responsibility--> Agent`
+ - Optional: Field collectors, project team
+
+12. **"This event belongs to this context"**
+ - `SamplingEvent --has_context_category--> IdentifiedConcept`
+ - Optional: Event-level context classification
+
+### Site Location (1 sentence type)
+
+13. **"This site is located at these coordinates"**
+ - `SamplingSite --site_location--> GeospatialCoordLocation`
+ - Optional: Site-level coordinates (less precise than sample)
+
+### Curation Responsibility (1 sentence type)
+
+14. **"This person curates this collection"**
+ - `MaterialSampleCuration --responsibility--> Agent`
+ - Optional: Curators, collection managers
+
+---
+
+## Why This Structure?
+
+### Multi-Hop Traversal by Design
+
+**Finding a sample's coordinates requires multiple hops:**
+
+```
+MaterialSampleRecord
+  → produced_by → SamplingEvent
+    → sample_location → GeospatialCoordLocation
+```
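+
+The same two-hop path can also be written as a generic walk over edges. The sketch below is minimal and runs in memory. It assumes edges are stored as `(subject, predicate, [objects])` tuples keyed by `row_id`, mirroring the `s`, `p`, and `o` columns described under Storage Format. `build_index` and `follow` are hypothetical helpers, and the row_ids are made up.
+
+```python
+from collections import defaultdict
+
+
+def build_index(edges):
+    """Map (subject row_id, predicate) -> list of object row_ids."""
+    index = defaultdict(list)
+    for s, p, objects in edges:
+        index[(s, p)].extend(objects)
+    return index
+
+
+def follow(index, start: int, path: list[str]) -> list[int]:
+    """Follow a sequence of predicates from one start node.
+
+    Each hop fans out over every object row_id, much like `= ANY(edge.o)`.
+    """
+    frontier = [start]
+    for predicate in path:
+        frontier = [o for node in frontier for o in index.get((node, predicate), [])]
+    return frontier
+
+
+# row_ids: 1 = sample, 2 = event, 3 = coordinates, 4 = site
+edges = [
+    (1, "produced_by", [2]),
+    (2, "sample_location", [3]),
+    (2, "sampling_site", [4]),
+]
+index = build_index(edges)
+print(follow(index, 1, ["produced_by", "sample_location"]))  # [3]
+```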
+
+**Why not store coordinates directly on the sample?**
+
+✅ **Benefits of separation:**
+1. **Shared locations** - Multiple samples from the same event share one coordinate record
+2. **Different precision** - Site coordinates vs exact sample coordinates
+3. **Reusable events** - One event can produce many samples
+4. **Flexible modeling** - Some samples have a named site but no precise coordinates
+
+❌ **Drawbacks of a flat structure:**
+- Duplicate coordinates across samples
+- Can't distinguish site-level vs sample-level precision
+- Harder to maintain data consistency
+
+### Domain-Agnostic Design
+
+The 8 entity types work across **all scientific domains**:
+
+- **Archaeology:** Pottery, bones, charcoal → Terrestrial/Archaeological context
+- **Geology:** Cores, outcrops, minerals → Terrestrial/Subsurface context
+- **Biology:** Tissue, DNA, specimens → Marine Biome or Terrestrial context
+
+**Same schema, different values** - This is true domain-agnostic modeling.
+
+### Graph Query Flexibility
+
+**Example queries enabled by graph structure:**
+
+1. "Find all samples collected by Agent X"
+   - `MaterialSampleRecord → produced_by → SamplingEvent → responsibility → Agent`
+
+2. "Find all samples within 10 km of a location"
+   - `MaterialSampleRecord → produced_by → SamplingEvent → sample_location → GeospatialCoordLocation`
+
+3. "Find all earthenware samples with keywords 'Neolithic' AND 'pottery'"
+   - `MaterialSampleRecord → has_material_category → IdentifiedConcept` (Earthenware)
+   - `MaterialSampleRecord → keywords → IdentifiedConcept` (Neolithic, Pottery)
+
+4. "Find the parent sample for a given subsample"
+   - `MaterialSampleRecord → related_resource → SampleRelation` (where relationship="derivedFrom")
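+
+Query 2 needs a distance function once coordinates have been retrieved. The sketch below is minimal and uses the haversine great-circle formula with a mean Earth radius of 6,371 km. This spherical approximation differs from the WGS84 ellipsoid by up to roughly 0.5%, which is adequate for a 10 km radius filter. The sample PIDs and coordinates are made up for illustration.
+
+```python
+import math
+
+EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
+
+
+def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
+    """Great-circle distance in kilometres between two WGS84 points."""
+    phi1, phi2 = math.radians(lat1), math.radians(lat2)
+    dphi = math.radians(lat2 - lat1)
+    dlmb = math.radians(lon2 - lon1)
+    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
+    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
+
+
+def within_km(samples, center, radius_km=10.0):
+    """Keep PIDs of (pid, lat, lon) tuples located within radius_km of center."""
+    clat, clon = center
+    return [pid for pid, lat, lon in samples
+            if haversine_km(clat, clon, lat, lon) <= radius_km]
+
+
+samples = [
+    ("igsn:A", 37.6700, 32.8300),  # under 1 km from the center
+    ("igsn:B", 38.0000, 32.8274),  # about 37 km north
+]
+print(within_km(samples, (37.6665, 32.8274)))  # ['igsn:A']
+```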
+
+---
+
+## Graph Traversal Patterns
+
+### Pattern 1: Sample → Coordinates (2 hops)
+
+**Path:**
+```
+Sample → produced_by → Event → sample_location → Coords
+```
+
+**SQL example:**
+```sql
+SELECT
+ sample.pid AS sample_id,
+ coords.latitude,
+ coords.longitude
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sample_location'
+JOIN pqg AS coords ON coords.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+```
+
+### Pattern 2: Sample → Site Name (2 hops)
+
+**Path:**
+```
+Sample → produced_by → Event → sampling_site → Site
+```
+
+**SQL example:**
+```sql
+SELECT
+ sample.label AS sample_label,
+ site.label AS site_name,
+ site.place_name
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'sampling_site'
+JOIN pqg AS site ON site.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+```
+
+### Pattern 3: Sample → Collector (2 hops)
+
+**Path:**
+```
+Sample → produced_by → Event → responsibility → Agent
+```
+
+**SQL example:**
+```sql
+SELECT
+ sample.label AS sample_label,
+ agent.name AS collector_name,
+ event.result_time
+FROM pqg AS sample
+JOIN pqg AS edge1 ON edge1.s = sample.row_id AND edge1.p = 'produced_by'
+JOIN pqg AS event ON event.row_id = ANY(edge1.o)
+JOIN pqg AS edge2 ON edge2.s = event.row_id AND edge2.p = 'responsibility'
+JOIN pqg AS agent ON agent.row_id = ANY(edge2.o)
+WHERE sample.otype = 'MaterialSampleRecord'
+```
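+
+Because `responsibility` is multi-valued, one edge row can point at several agents, and the join on `ANY(edge2.o)` then yields one result row per agent. The fan-out can be sketched in Python. The `EdgeRow` type, the `expand` helper, and the agent names are illustrative only.
+
+```python
+from typing import NamedTuple
+
+
+class EdgeRow(NamedTuple):
+    s: int        # subject row_id
+    p: str        # predicate
+    o: list[int]  # object row_id(s)
+
+
+def expand(edges: list[EdgeRow], predicate: str) -> list[tuple[int, int]]:
+    """Turn each multi-valued edge into one (subject, object) pair per object."""
+    return [(e.s, obj) for e in edges if e.p == predicate for obj in e.o]
+
+
+names = {10: "Dr. Jane Smith", 11: "A. Field Assistant"}  # hypothetical agents
+edges = [EdgeRow(2, "responsibility", [10, 11]), EdgeRow(2, "sampling_site", [4])]
+
+for event_id, agent_id in expand(edges, "responsibility"):
+    print(event_id, names[agent_id])
+# 2 Dr. Jane Smith
+# 2 A. Field Assistant
+```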
+
+### Pattern 4: Material Type Filter (1 hop)
+
+**Path:**
+```
+Sample → has_material_category → Concept
+```
+
+**SQL example:**
+```sql
+SELECT
+ sample.pid,
+ sample.label,
+ concept.label AS material_type
+FROM pqg AS sample
+JOIN pqg AS edge ON edge.s = sample.row_id AND edge.p = 'has_material_category'
+JOIN pqg AS concept ON concept.row_id = ANY(edge.o)
+WHERE concept.label = 'Earthenware'
+```
+
+---
+
+## Storage Format
+
+### Unified Table Structure
+
+PQG stores **both nodes and edges in a single table**:
+
+```sql
+CREATE TABLE pqg (
+ row_id INTEGER PRIMARY KEY,
+ pid VARCHAR UNIQUE NOT NULL,
+ otype VARCHAR, -- Node type or '_edge_'
+
+ -- Edge fields (NULL for non-edge nodes)
+ s INTEGER, -- Subject row_id
+ p VARCHAR, -- Predicate
+ o INTEGER[], -- Object row_id(s)
+ n VARCHAR, -- Named graph
+
+ -- Entity properties (NULL for edges)
+ label VARCHAR,
+ description TEXT,
+ latitude DECIMAL,
+ longitude DECIMAL,
+  elevation VARCHAR
+  -- ... further entity properties omitted
+);
+```
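+
+Because nodes and edges share one table, client code typically separates them by `otype` before use. The sketch below is minimal and represents rows as Python dicts holding a subset of the columns above. That representation and the `split_rows` name are assumptions for illustration, not the PQG API. The row values come from the examples in this guide.
+
+```python
+def split_rows(rows: list[dict]) -> tuple[dict[int, dict], list[dict]]:
+    """Partition unified-table rows into a node lookup and an edge list.
+
+    Rows whose otype is '_edge_' are edges; every other row is a node,
+    keyed by row_id so edge endpoints (s and o) can be resolved.
+    """
+    nodes: dict[int, dict] = {}
+    edges: list[dict] = []
+    for row in rows:
+        if row["otype"] == "_edge_":
+            edges.append(row)
+        else:
+            nodes[row["row_id"]] = row
+    return nodes, edges
+
+
+rows = [
+    {"row_id": 1, "pid": "igsn:SSH000001", "otype": "MaterialSampleRecord",
+     "s": None, "p": None, "o": None, "label": "Ceramic bowl fragment"},
+    {"row_id": 2, "pid": "event:2023-catal-t5-001", "otype": "SamplingEvent",
+     "s": None, "p": None, "o": None, "label": "Trench 5, Level 3, 2023-07-15"},
+    {"row_id": 1001, "pid": "edge_12345", "otype": "_edge_",
+     "s": 1, "p": "produced_by", "o": [2], "label": None},
+]
+nodes, edges = split_rows(rows)
+for e in edges:
+    for target in e["o"]:
+        print(nodes[e["s"]]["pid"], e["p"], nodes[target]["pid"])
+# igsn:SSH000001 produced_by event:2023-catal-t5-001
+```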
+
+### Node Rows
+
+**Example: MaterialSampleRecord node**
+```
+row_id: 1
+pid: "igsn:SSH000001"
+otype: "MaterialSampleRecord"
+s: NULL
+p: NULL
+o: NULL
+n: NULL
+label: "Ceramic bowl fragment"
+description: "Red-slipped pottery..."
+```
+
+### Edge Rows
+
+**Example: produced_by edge**
+```
+row_id: 1001
+pid: "edge_12345"
+otype: "_edge_"
+s: 1 (sample row_id)
+p: "produced_by"
+o: [2] (event row_id)
+n: NULL
+label: NULL
+description: NULL
+```
+
+### Query Pattern
+
+**Find all edges of a specific type:**
+```sql
+SELECT
+ subject.pid AS subject_pid,
+ edge.p AS predicate,
+ object.pid AS object_pid
+FROM pqg AS edge
+JOIN pqg AS subject ON edge.s = subject.row_id
+JOIN pqg AS object ON object.row_id = ANY(edge.o)
+WHERE edge.otype = '_edge_'
+ AND subject.otype = 'MaterialSampleRecord'
+ AND edge.p = 'produced_by'
+ AND object.otype = 'SamplingEvent'
+```
+
+---
+
+## Summary
+
+**The iSamples property graph is defined by:**
+
+✅ **8 entity types (nodes)** - MaterialSampleRecord, SamplingEvent, SamplingSite, GeospatialCoordLocation, IdentifiedConcept, Agent, MaterialSampleCuration, SampleRelation
+
+✅ **14 relationship types (edges)** - The complete grammar of valid connections
+
+✅ **14 sentence types** - All possible statements you can make about samples
+
+**Key takeaway:** Understanding these building blocks enables you to:
+- Query iSamples data effectively
+- Validate metadata completeness
+- Integrate new data sources
+- Build tools that work across domains
+
+**Next steps:**
+- [PREDICATES_REFERENCE.md](./PREDICATES_REFERENCE.md) - Detailed reference for each predicate
+- [EXAMPLES_BY_DOMAIN.md](./EXAMPLES_BY_DOMAIN.md) - Real-world examples
+- [QUERYING_THE_GRAPH.md](./QUERYING_THE_GRAPH.md) - Query patterns and SQL
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** 2025-11-14
+**Schema Version:** 20250207 (MaterialSampleRecord)
+**Author:** Claude Code (Sonnet 4.5) based on iSamples LinkML schema analysis