
Alternative approaches to validating rules not expressible in static json schema #122

@testower


I want to explore the idea presented in MobilityData/gbfs-validator#153 as an alternative to patching the existing schemas at runtime, while keeping the benefit of schema-based validation for these rules. We also consider a programmatic (non-interoperable) alternative.

A detailed analysis written with help from Claude:

Summary

The GBFS Validator currently implements custom validation rules (that cannot be expressed in static JSON schemas) by dynamically patching schemas at runtime. This approach has significant problems:

  1. High implementation complexity - Rules require deep knowledge of JSON Schema structure and JsonPath API
  2. Risk of rule conflicts - No enforcement mechanism prevents rules from interfering with each other
  3. Confusing validation reports - Errors reference constraints that don't exist in static schemas
  4. Implementation-specific logic - Other GBFS validators must reimplement the entire patching system

Two viable alternatives have been identified:

Option A: Programmatic Validation

Run custom validation checks after schema validation, generating errors directly in code.

  • Easiest to maintain - Clear validation logic
  • ❌ No interoperability (each validator reimplements)

Option B: Schema Templates with Placeholders

Replace placeholders in schema templates with actual values at runtime.

  • Interoperable - All validators use same templates from upstream GBFS spec
  • Transparent - Templates visible in schema files
  • Standards-based - Single source of truth
  • ❌ Requires upstream coordination
  • ❌ Only valuable for multi-implementation ecosystem

Current Implementation Analysis

How Schema Patching Works Today

The validation flow:

  1. Load static JSON schemas from src/main/resources/schema/v{version}/{feedName}.json
  2. For each feed being validated, retrieve applicable custom rules
  3. Apply rules sequentially, each receiving:
    • A JsonPath DocumentContext wrapping the schema JSON
    • A map of all loaded GBFS feeds
  4. Rules extract data from feeds (e.g., valid pricing plan IDs) and inject into schemas:
    • Add enum constraints for reference validation
    • Add required field constraints based on feed presence
    • Build if/then/else conditional schemas
  5. Convert patched JSONObject to Everit Schema and validate
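Steps 2 to 4 can be modelled in a few lines of plain Java. This is a simplified sketch, not the validator's code: schemas are nested Map/List values instead of org.json objects, and `SchemaRule`, `RulePipeline`, and `applyRules` are hypothetical stand-ins for CustomRuleSchemaPatcher and the stream reduce in AbstractVersion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CustomRuleSchemaPatcher: receives a schema copy plus all feeds
// and returns the patched schema.
interface SchemaRule {
  Map<String, Object> addRule(Map<String, Object> schema, Map<String, Map<String, Object>> feeds);
}

final class RulePipeline {

  // Deep-copies nested maps and lists so a rule can never mutate the cached raw schema.
  static Object deepCopy(Object value) {
    if (value instanceof Map<?, ?> map) {
      Map<String, Object> copy = new LinkedHashMap<>();
      map.forEach((k, v) -> copy.put((String) k, deepCopy(v)));
      return copy;
    }
    if (value instanceof List<?> list) {
      List<Object> copy = new ArrayList<>();
      list.forEach(v -> copy.add(deepCopy(v)));
      return copy;
    }
    return value; // strings, numbers and booleans are immutable
  }

  // Applies rules sequentially: each rule gets a fresh copy of the previous rule's output.
  @SuppressWarnings("unchecked")
  static Map<String, Object> applyRules(
      Map<String, Object> rawSchema,
      List<SchemaRule> rules,
      Map<String, Map<String, Object>> feeds) {
    return rules.stream().reduce(
        rawSchema,
        (schema, rule) -> rule.addRule((Map<String, Object>) deepCopy(schema), feeds),
        (a, b) -> b); // combiner is never used by a sequential stream
  }
}
```

Because every rule sees the previous rule's output, the order of the rule list matters, which is what makes the conflicts described under Problem 2 possible.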

Key Files:

  • CustomRuleSchemaPatcher.java:31-42 - Interface all rules implement
  • AbstractVersion.java:118-162 - Orchestrates rule application via stream reduce
  • FileValidator.java:59-84 - Entry point for validation

Current Custom Rules (8 total)

All rules in gbfs-validator-java/src/main/java/org/entur/gbfs/validation/validator/rules/:

Reference Validation Rules (enum constraints):

  1. NoInvalidReferenceToPricingPlansInVehicleStatus - pricing_plan_id must exist in system_pricing_plans
  2. NoInvalidReferenceToPricingPlansInVehicleTypes - pricing plan IDs in vehicle_types must be valid
  3. NoInvalidReferenceToRegionInStationInformation - region_id must exist in system_regions
  4. NoInvalidReferenceToVehicleTypesInStationStatus - vehicle_type_id must exist in vehicle_types

Conditional Required Field Rules:

  5. NoMissingVehicleTypesAvailableWhenVehicleTypesExists - vehicle_types_available required when vehicle_types feed exists
  6. NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist - vehicle_type_id required and valid when vehicle_types exists
  7. NoMissingStoreUriInSystemInformation - rental_apps required when rental_uris exist in stations/vehicles
  8. NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles - current_range_meters required for motorized vehicles (if/then/else schema)

Implementation Complexity Examples

Simple Rule: Reference Validation (24 logic lines)

NoInvalidReferenceToRegionInStationInformation.java:41-58:

@Override
public DocumentContext addRule(
  DocumentContext rawSchemaDocumentContext,
  Map<String, JSONObject> feeds
) {
  String regionIdPath =
    "$.properties.data.properties.stations.items.properties.region_id";

  // Extract valid region IDs from system_regions feed
  JSONObject systemRegionsFeed = feeds.get("system_regions");
  JSONArray regionIds = systemRegionsFeed != null
    ? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
    : new JSONArray();

  // Navigate to region_id property in schema (6 levels deep)
  JSONObject regionIdSchema = rawSchemaDocumentContext.read(regionIdPath);

  // Add enum constraint
  regionIdSchema.put("enum", regionIds);

  // Write back to schema
  return rawSchemaDocumentContext.set(regionIdPath, regionIdSchema);
}

Complexity factors:

  • Must construct correct JsonPath expression (error-prone strings)
  • Requires understanding JSON Schema structure (where to inject constraint)
  • Mix of read/modify/write operations on JSONObjects
  • Defensive null handling for missing feeds
  • No compile-time safety for schema paths

Complex Rule: Multi-Feed Conditional (82 logic lines)

NoMissingStoreUriInSystemInformation.java:47-129:

This rule checks two different feeds (vehicle_status AND station_information) to determine if rental_apps should be required in system_information:

boolean hasIosRentalUris = false;
boolean hasAndroidRentalUris = false;  // set by analogous rental_uris.android checks (omitted here)

// Check vehicle_status for rental URIs
JSONObject vehicleStatusFeed = feeds.get(vehicleStatusFileName);
String vehiclesKey = vehicleStatusFileName.equals("vehicle_status")
  ? "vehicles" : "bikes";  // Backward compatibility (free_bike_status uses "bikes")

if (vehicleStatusFeed != null && !((JSONArray) JsonPath.parse(vehicleStatusFeed)
    .read("$.data." + vehiclesKey + "[:1].rental_uris.ios")).isEmpty()) {
  hasIosRentalUris = true;
}

// Check station_information for rental URIs
JSONObject stationInformationFeed = feeds.get("station_information");
if (stationInformationFeed != null && !((JSONArray) JsonPath.parse(stationInformationFeed)
    .read("$.data.stations[:1].rental_uris.ios")).isEmpty()) {
  hasIosRentalUris = true;
}

// Conditionally modify system_information schema
if (hasIosRentalUris || hasAndroidRentalUris) {
  JSONArray dataRequired = rawSchemaDocumentContext.read("$.properties.data.required");
  dataRequired.put("rental_apps");

  JSONObject rentalAppsSchema = rawSchemaDocumentContext.read(
    "$.properties.data.properties.rental_apps"
  );
  JSONArray rentalAppRequired = new JSONArray();
  if (hasIosRentalUris) rentalAppRequired.put("ios");
  if (hasAndroidRentalUris) rentalAppRequired.put("android");
  rentalAppsSchema.put("required", rentalAppRequired);
}

Additional complexity:

  • Dynamic JsonPath construction based on feed type
  • Checks multiple feeds with different structures
  • Accumulates boolean flags across feeds
  • Modifies multiple schema locations
  • Handles backward compatibility with legacy feeds

Most Complex: Conditional Schema with Filters (64 logic lines)

NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles.java:56-119:

Uses JsonPath filters to extract only motorized vehicle types, then builds an if/then/else schema:

// Filter for motorized vehicles
// (requires: import static com.jayway.jsonpath.Criteria.where;)
private static final Filter motorizedVehicleTypesFilter = Filter.filter(
  where("propulsion_type").in(List.of("electric_assist", "electric", "combustion"))
);

// Extract motorized vehicle type IDs
JSONArray motorizedVehicleTypeIds = JsonPath.parse(vehicleTypesFeed)
  .read("$.data.vehicle_types[?].vehicle_type_id", motorizedVehicleTypesFilter);

// Build complex if/then schema
bikeItemsSchema
  .put("if", new JSONObject()
    .put("properties", new JSONObject()
      .put("vehicle_type_id", new JSONObject().put("enum", motorizedVehicleTypeIds))
    )
    .put("required", new JSONArray().put("vehicle_type_id"))
  )
  .put("then", new JSONObject()
    .put("required", new JSONArray().put("current_range_meters"))
  );

Builds JSON Schema conditional: "If vehicle_type_id is a motorized type, then current_range_meters is required"

Problem 1: Implementation Complexity

What Makes Rules Hard to Write

  1. JsonPath Expertise Required:

    • Schema paths are 4-6 levels deep: $.properties.data.properties.stations.items.properties.region_id
    • Data extraction uses wildcards and filters: $.data.vehicle_types[?].vehicle_type_id
    • Array slicing for optimization: [:1] to get first element
    • No compile-time validation - wrong paths fail at runtime
  2. JSON Schema Structure Knowledge:

    • Must know where to inject constraints (properties vs items vs required array)
    • Different patterns for different constraint types (enum vs required vs if/then)
    • Schema structure varies by GBFS version (bikes vs vehicles)
  3. JSONObject Manipulation:

    • Mix of DocumentContext.read(), JSONObject.put(), JSONObject.append(), DocumentContext.set()
    • In-place mutations vs functional returns
    • Manual schema copying to prevent cache mutation
  4. Cross-Feed Data Dependencies:

    • Rules must extract data from multiple feeds
    • Different feeds have different structures
    • Defensive null handling required throughout

Maintenance Burden

  • Adding a new rule requires 20-80 lines of complex code
  • Rules are tightly coupled to schema structure - schema changes break rules
  • Backward compatibility adds conditional logic (bikes vs vehicles)
  • No abstraction layer - each rule duplicates navigation patterns
  • Testing requires understanding entire validation flow

Problem 2: Risk of Rule Conflicts

Current Conflict Mitigation

AbstractVersion.java:158-161:

// Must make a copy of the schema, otherwise it will be mutated by json-path
return patcher.addRule(
  JsonPath.parse(new JSONObject(schema.toMap())),
  feedMap
).json();

  • Each rule gets a fresh copy of the schema from the previous rule's output
  • Prevents mutation of cached raw schemas
  • Rules applied sequentially via stream reduce

What's NOT Protected

Scenario 1: Multiple rules modifying same required array

If two rules both append to $.properties.data.properties.stations.items.required:

  • First rule adds vehicle_types_available
  • Second rule adds vehicle_docks_available
  • Works correctly - both fields end up in required array

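BUT if the second rule replaces the array instead of appending to it:

rawSchemaDocumentContext.set(
  "$.properties.data.properties.stations.items.required",
  new JSONArray().put("vehicle_docks_available")
);  // Oops: the first rule's addition (and the static required fields) are lost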

Scenario 2: Rules modifying same property

If two rules both target vehicle_type_id:

  • First rule adds: { "enum": [...] }
  • Second rule adds: { "pattern": "..." }
  • If the second rule also calls vehicleTypeIdSchema.put("enum", ...), it silently overwrites the first rule's enum

No Enforcement Mechanism

  • Rules are carefully designed by humans to avoid conflicts
  • No validation that combined rules produce valid JSON Schema
  • No declaration of which schema paths a rule modifies
  • Adding new rules requires manual review for conflicts
  • Refactoring risks introducing subtle conflicts
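To make the two scenarios concrete, here is a minimal plain-Java illustration. Nested maps stand in for the org.json schema objects, and RULE_A and RULE_B are hypothetical rules, not ones from the codebase. Applying them in different orders produces different schemas, and nothing reports a problem in either case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class RuleConflictDemo {

  // Tiny schema fragment: { "required": ["station_id"], "vehicle_type_id": { "type": "string" } }
  static Map<String, Object> baseSchema() {
    Map<String, Object> vehicleTypeId = new LinkedHashMap<>();
    vehicleTypeId.put("type", "string");
    Map<String, Object> schema = new LinkedHashMap<>();
    schema.put("required", new ArrayList<>(List.of("station_id")));
    schema.put("vehicle_type_id", vehicleTypeId);
    return schema;
  }

  @SuppressWarnings("unchecked")
  static List<Object> required(Map<String, Object> schema) {
    return (List<Object>) schema.get("required");
  }

  @SuppressWarnings("unchecked")
  static Map<String, Object> vehicleTypeId(Map<String, Object> schema) {
    return (Map<String, Object>) schema.get("vehicle_type_id");
  }

  // Hypothetical rule A: appends a required field and sets an enum.
  static final UnaryOperator<Map<String, Object>> RULE_A = schema -> {
    required(schema).add("vehicle_types_available");
    vehicleTypeId(schema).put("enum", List.of("scooter", "bike"));
    return schema;
  };

  // Hypothetical rule B: replaces the required array and re-puts "enum", clobbering rule A.
  static final UnaryOperator<Map<String, Object>> RULE_B = schema -> {
    schema.put("required", new ArrayList<>(List.of("vehicle_docks_available")));
    vehicleTypeId(schema).put("enum", List.of("bike"));
    return schema;
  };

  // Applies the rules one after another, like the sequential reduce in the validator.
  static Map<String, Object> applyInOrder(List<UnaryOperator<Map<String, Object>>> rules) {
    Map<String, Object> schema = baseSchema();
    for (UnaryOperator<Map<String, Object>> rule : rules) {
      schema = rule.apply(schema);
    }
    return schema;
  }
}
```

With A then B, the required array ends up as ["vehicle_docks_available"] and the enum as ["bike"]; with B then A, both of A's changes survive. A conflict check would need each rule to declare the paths it touches, which the current interface does not ask for.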

Problem 3: Confusing Validation Reports

Error Structure

FileValidationError.java:29-34:

public record FileValidationError(
  String schemaPath,      // From ValidationException.getSchemaLocation()
  String violationPath,   // From ValidationException.getPointerToViolation()
  String message,
  String keyword
) {}

These values come directly from Everit's ValidationException which validates against the patched schema.

The Confusion

Example Error:

{
  "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
  "violationPath": "#/data/vehicles/0/pricing_plan_id",
  "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
  "keyword": "enum"
}

User investigates: Opens vehicle_status.json schema and navigates to properties.data.properties.vehicles.items.properties.pricing_plan_id:

{
  "pricing_plan_id": {
    "type": "string",
    "description": "The plan_id of the pricing plan this vehicle is eligible for"
  }
}

No enum constraint! 😕

Why This Happens

  1. Static schema doesn't have the enum constraint
  2. NoInvalidReferenceToPricingPlansInVehicleStatus dynamically added it:
    pricingPlanIdSchema.put("enum", pricingPlanIds);  // Added at runtime
  3. Validation error references the patched schema location
  4. User has no way to know this came from a custom rule
  5. Error is technically correct but misleading

User Experience Impact

  • Schema inspection is useless - Errors reference constraints that aren't in schema files
  • Can't trace error source - No indication which custom rule caused the error
  • Documentation doesn't help - Static schema docs don't explain dynamic constraints
  • Debugging is hard - Must understand entire custom rules system to interpret errors
  • Other validator implementations will have different errors - No standard way to report these dynamic constraints

Problem 4: Implementation-Specific Logic

Current Architecture is Java-Specific

The patching system is tightly coupled to:

  1. Jayway JsonPath library (Java/JVM)

    • DocumentContext API for schema manipulation
    • Filter API for complex queries
    • Configuration with JsonOrgJsonProvider
  2. org.json JSONObject (Java)

    • JSONObject/JSONArray manipulation
    • Conversion to/from Maps
  3. Everit JSON Schema Validator (Java)

    • Schema loading from JSONObject
    • ValidationException structure
  4. Java Streams and Collections

    • Stream reduce for rule application
    • Map/List for rule registration

Other Validators Must Reimplement

For a Python validator to implement the same rules:

  • Reimplement all 8 custom rules in Python
  • Use different JSON manipulation library (likely different API)
  • Use different JsonPath library (or write path logic manually)
  • Use different schema validator (likely different error structure)
  • Results in different behavior - No guarantee of identical validation

For a JavaScript/Go/Rust validator:

  • Same story - complete reimplementation
  • Different libraries, different patterns
  • Risk of divergence in rule logic

No Interoperability

  • Each validator implementation has its own custom rules
  • No shared definition of what the rules should do
  • GBFS spec can't standardize the dynamic constraints
  • Validation results differ across implementations
  • Users get different errors depending on which validator they use

Proposed Solution: Schema Templates with Placeholders

High-Level Concept

Instead of patching schemas at runtime with code, use schema templates that declare placeholders for dynamic values:

Current approach (code injects enum):

// Code in NoInvalidReferenceToRegionInStationInformation
JSONArray regionIds = JsonPath.parse(systemRegionsFeed)
  .read("$.data.regions[*].region_id");
regionIdSchema.put("enum", regionIds);

Proposed approach (template with placeholder):

{
  "region_id": {
    "type": "string",
    "description": "ID of the region where station is located",
    "enum": "${VALID_REGION_IDS}"
  }
}

Validator performs simple string replacement:

String schema = loadSchemaAsString("station_information.json");
String regionIds = extractRegionIds(feeds.get("system_regions"));  // ["R1","R2","R3"]
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);  // Simple string replace

Result after replacement:

{
  "region_id": {
    "type": "string",
    "description": "ID of the region where station is located",
    "enum": ["R1", "R2", "R3"]
  }
}
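One detail the string-replacement step glosses over is that the replacement text must itself be valid JSON, so IDs containing quotes or backslashes have to be escaped. A minimal JDK-only sketch; `PlaceholderReplacer`, `toJsonArray`, and `fillEnumPlaceholder` are hypothetical names, and a real implementation would more likely reuse a JSON library's serializer (for example org.json's JSONArray.toString()).

```java
import java.util.List;

final class PlaceholderReplacer {

  // Serializes strings as a JSON array, escaping characters that JSON requires to be escaped.
  static String toJsonArray(List<String> values) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append('"');
      for (char c : values.get(i).toCharArray()) {
        switch (c) {
          case '"' -> sb.append("\\\"");
          case '\\' -> sb.append("\\\\");
          case '\n' -> sb.append("\\n");
          case '\r' -> sb.append("\\r");
          case '\t' -> sb.append("\\t");
          default -> {
            // Other control characters must be written as \\uXXXX escapes.
            if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
          }
        }
      }
      sb.append('"');
    }
    return sb.append(']').toString();
  }

  // Replaces every occurrence of the quoted placeholder, e.g. "${VALID_REGION_IDS}".
  static String fillEnumPlaceholder(String schema, String placeholderName, List<String> ids) {
    String quoted = "\"${" + placeholderName + "}\"";
    return schema.replace(quoted, toJsonArray(ids));
  }
}
```

Java's String.replace substitutes every occurrence, which matters when one placeholder appears in several places, as ${VALID_PRICING_PLAN_IDS} would in vehicle_types.json.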

Benefits

1. Dramatically Simpler Implementation

Before (24 lines of complex Java):

public DocumentContext addRule(DocumentContext rawSchemaDocumentContext, Map<String, JSONObject> feeds) {
  JSONObject systemRegionsFeed = feeds.get("system_regions");
  JSONObject regionIdSchema = rawSchemaDocumentContext.read(
    "$.properties.data.properties.stations.items.properties.region_id"
  );
  JSONArray regionIds = systemRegionsFeed != null
    ? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
    : new JSONArray();
  regionIdSchema.put("enum", regionIds);
  return rawSchemaDocumentContext.set(
    "$.properties.data.properties.stations.items.properties.region_id",
    regionIdSchema
  );
}

After (3 lines of simple text processing):

String regionIds = extractIds(feeds.get("system_regions"), "$.data.regions[*].region_id");
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);

  • No JsonPath navigation of schemas
  • No JSONObject manipulation
  • No schema structure knowledge required
  • Just text find/replace operations

2. Eliminates Rule Conflicts

Templates define exactly where values go:

{
  "required": ["station_id", "num_bikes_available", "${CONDITIONAL_REQUIRED_FIELDS}"],
  "properties": {
    "vehicle_type_id": {
      "type": "string",
      "enum": "${VALID_VEHICLE_TYPE_IDS}"
    }
  }
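}

  • Placeholders are pre-positioned by schema authors
  • No runtime conflicts between rules: each placeholder is filled by exactly one replacement
  • Multiple rules can't modify the same location, because there is only one placeholder per location
  • The template processor can check that every placeholder was resolved and that the result is well-formed JSON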

3. Transparent Validation Reports

Error example with templates:

{
  "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
  "violationPath": "#/data/vehicles/0/pricing_plan_id",
  "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
  "keyword": "enum"
}

User opens vehicle_status.json template schema:

{
  "pricing_plan_id": {
    "type": "string",
    "description": "The plan_id of the pricing plan this vehicle is eligible for",
    "enum": "${VALID_PRICING_PLAN_IDS}"
  }
}

Aha! 💡 The enum constraint exists in the template with a placeholder. User now understands:

  • The enum is populated from system_pricing_plans feed
  • The error means their pricing_plan_id doesn't match system_pricing_plans
  • The template serves as documentation of the dynamic behavior

4. Interoperability Across Validators

Schema templates would live in the upstream GBFS spec repository (e.g., MobilityData/gbfs).

All validators (Java, Python, JavaScript, Go, Rust, etc.) implement the same simple logic:

Python validator:

schema = load_schema_text("station_information.json")
region_ids = extract_ids(feeds["system_regions"], "$.data.regions[*].region_id")
schema = schema.replace('"${VALID_REGION_IDS}"', region_ids)

JavaScript validator:

let schema = loadSchemaText("station_information.json");
const regionIds = extractIds(feeds["system_regions"], "$.data.regions[*].region_id");
schema = schema.replaceAll('"${VALID_REGION_IDS}"', regionIds);  // replace() with a string pattern only replaces the first occurrence

Go validator:

schema := loadSchemaText("station_information.json")
regionIds := extractIds(feeds["system_regions"], "$.data.regions[*].region_id")
schema = strings.ReplaceAll(schema, `"${VALID_REGION_IDS}"`, regionIds)  // replace every occurrence, like Java and Python

All produce identical results because:

  • Same template schemas from upstream
  • Same placeholder names
  • Same replacement logic
  • Same validation behavior

Implementation Strategy

Step 1: Define Placeholder Convention

Propose to GBFS spec maintainers:

Placeholder Syntax: ${VARIABLE_NAME}

  • Consistent with many templating systems
  • Easy to identify in JSON
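  • Unlikely to collide with legitimate schema values; inside JSON it is an ordinary string, although some host languages (e.g., JavaScript template literals, Kotlin strings) treat ${ specially and require escaping it in source code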

Example Placeholders:

  • ${VALID_PRICING_PLAN_IDS} - Array of valid pricing plan IDs
  • ${VALID_VEHICLE_TYPE_IDS} - Array of valid vehicle type IDs
  • ${VALID_REGION_IDS} - Array of valid region IDs
  • ${CONDITIONAL_REQUIRED_FIELDS} - Array of conditionally required field names

Placement Rules:

  • Placeholders for arrays: "enum": "${VALID_IDS}" (replace entire value)
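  • Placeholders for items inside arrays: "required": ["station_id", "${CONDITIONAL_FIELDS}"] (the placeholder item is replaced by zero or more comma-separated items; when there are none, the preceding comma must be removed as well, or the result is not valid JSON)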
  • Placeholders for objects: "if": "${CONDITIONAL_SCHEMA}" (replace entire object)
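The item-in-array rule needs more care than the other two, because the replacement has to splice items into an existing array. A sketch of one way to do it with a regular expression that consumes the placeholder together with its leading comma. `ArrayItemPlaceholder.fill` is a hypothetical helper; it assumes the names need no JSON escaping and that the placeholder is never the only item in its array (as in the proposed station_status.json template).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ArrayItemPlaceholder {

  // Replaces an item placeholder such as "${CONDITIONAL_REQUIRED_FIELDS}" with zero or more
  // quoted names, keeping the surrounding array valid JSON.
  static String fill(String schema, String placeholderName, List<String> names) {
    // Match the preceding comma and whitespace together with the quoted placeholder.
    Pattern p = Pattern.compile(
        ",\\s*\"" + Pattern.quote("${" + placeholderName + "}") + "\"");
    String replacement = names.isEmpty()
        ? "" // no conditional fields: drop the placeholder and its comma
        : names.stream()
            .map(n -> "\"" + n + "\"")
            .collect(Collectors.joining(",", ",", ""));
    // quoteReplacement stops '$' or '\' in the replacement from being interpreted.
    return p.matcher(schema).replaceAll(Matcher.quoteReplacement(replacement));
  }
}
```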

Step 2: Create Template Schemas

Update existing GBFS JSON schemas with placeholders:

Example: station_information.json

Before (static schema):

{
  "properties": {
    "region_id": {
      "type": "string",
      "description": "ID of the region where station is located"
    }
  }
}

After (template schema):

{
  "properties": {
    "region_id": {
      "type": "string",
      "description": "ID of the region where station is located",
      "enum": "${VALID_REGION_IDS}"
    }
  }
}

Example: station_status.json

Before:

{
  "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported"]
}

After:

{
  "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported", "${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}"]
}

Step 3: Implement Template Processing

New class: SchemaTemplateProcessor

public class SchemaTemplateProcessor {

  public String processTemplate(String templateSchema, Map<String, JSONObject> feeds) {
    String processed = templateSchema;

    // Replace each placeholder
    processed = replaceValidPricingPlanIds(processed, feeds);
    processed = replaceValidVehicleTypeIds(processed, feeds);
    processed = replaceValidRegionIds(processed, feeds);
    processed = replaceConditionalRequiredFields(processed, feeds);
    // ... etc

    return processed;
  }

  private String replaceValidRegionIds(String schema, Map<String, JSONObject> feeds) {
    JSONObject systemRegions = feeds.get("system_regions");
    if (systemRegions == null) {
      return schema.replace("\"${VALID_REGION_IDS}\"", "[]");
    }

    JSONArray regionIds = JsonPath.parse(systemRegions)
      .read("$.data.regions[*].region_id");

    return schema.replace("\"${VALID_REGION_IDS}\"", regionIds.toString());
  }

  // Similar methods for other placeholders...
}
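The hardcoded replaceX methods could also be expressed as a registry of placeholder resolvers, which makes the well-formedness check easy to add: after processing, fail if any ${...} placeholder is left over (a typo or a missing resolver). A hypothetical JDK-only sketch; `TemplateProcessor`, `register`, and the generic feed type `F` are illustrative names, and only whole-value placeholders (the quoted "${NAME}" form) are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical table-driven variant of SchemaTemplateProcessor: each placeholder name maps to
// a resolver that turns the loaded feeds (of type F) into the JSON text to substitute.
final class TemplateProcessor<F> {

  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z0-9_]+)\\}");

  private final Map<String, Function<F, String>> resolvers = new LinkedHashMap<>();

  TemplateProcessor<F> register(String placeholderName, Function<F, String> resolver) {
    resolvers.put(placeholderName, resolver);
    return this;
  }

  String process(String template, F feeds) {
    String processed = template;
    for (Map.Entry<String, Function<F, String>> e : resolvers.entrySet()) {
      processed = processed.replace("\"${" + e.getKey() + "}\"", e.getValue().apply(feeds));
    }
    // Fail fast if any placeholder was left unresolved.
    Matcher leftover = PLACEHOLDER.matcher(processed);
    if (leftover.find()) {
      throw new IllegalStateException("Unresolved placeholder: " + leftover.group(1));
    }
    return processed;
  }
}
```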

Integration in AbstractVersion.java:

public Schema getSchema(String feedName, Map<String, JSONObject> feedMap) {
  String templateSchema = loadSchemaAsString(feedName);  // Load as text, not JSONObject
  String processedSchema = templateProcessor.processTemplate(templateSchema, feedMap);
  return loadSchema(new JSONObject(processedSchema));  // Parse and build validator
}

Step 4: Maintain Backward Compatibility

During transition, support both approaches:

  1. Flag in configuration: useSchemaTemplates (default: false)
  2. When false: Use existing CustomRuleSchemaPatcher system
  3. When true: Use new SchemaTemplateProcessor
  4. Template schemas: Stored alongside static schemas (e.g., schema/v2.3/templates/)
  5. Gradual migration: One rule at a time, validate results match

Eventually deprecate and remove custom rule patching system.
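The "validate results match" step could be automated by running both paths on the same feeds and diffing their errors. A hypothetical sketch: `ErrorKey` is an illustrative simplification of FileValidationError that compares only the violation path and keyword, on the assumption that those two fields are what must agree between the old and new paths.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative key: two errors match if they flag the same location for the same reason.
record ErrorKey(String violationPath, String keyword) {}

record MigrationDiff(Set<ErrorKey> onlyInOld, Set<ErrorKey> onlyInNew) {

  boolean matches() {
    return onlyInOld.isEmpty() && onlyInNew.isEmpty();
  }

  // Computes the errors reported by only one of the two validation paths (order-insensitive).
  static MigrationDiff of(List<ErrorKey> oldErrors, List<ErrorKey> newErrors) {
    Set<ErrorKey> onlyOld = new HashSet<>(oldErrors);
    onlyOld.removeAll(new HashSet<>(newErrors));
    Set<ErrorKey> onlyNew = new HashSet<>(newErrors);
    onlyNew.removeAll(new HashSet<>(oldErrors));
    return new MigrationDiff(onlyOld, onlyNew);
  }
}
```

Running this per rule while useSchemaTemplates is toggled would show exactly which migrated rule diverges, if any.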

Step 5: Upstream Contribution

Work with MobilityData/GBFS maintainers:

  1. Propose placeholder specification - Document placeholder syntax and semantics
  2. Create template schemas - For all versions (2.1, 2.2, 2.3, 3.0)
  3. Add template documentation - Explain dynamic constraints in spec
  4. Publish template schemas - In official GBFS schema repository
  5. Reference in spec - GBFS specification references template schemas

Mapping Current Rules to Templates

Reference Validation Rules → Enum Placeholders

| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoInvalidReferenceToPricingPlansInVehicleStatus | ${VALID_PRICING_PLAN_IDS} | vehicle_status.json, free_bike_status.json → pricing_plan_id/enum |
| NoInvalidReferenceToPricingPlansInVehicleTypes | ${VALID_PRICING_PLAN_IDS} | vehicle_types.json → default_pricing_plan_id/enum, pricing_plan_ids/items/enum |
| NoInvalidReferenceToRegionInStationInformation | ${VALID_REGION_IDS} | station_information.json → region_id/enum |
| NoInvalidReferenceToVehicleTypesInStationStatus | ${VALID_VEHICLE_TYPE_IDS} | station_status.json → vehicle_type_id/enum, vehicle_type_ids/items/enum |

Conditional Required Fields → Array Placeholders

| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoMissingVehicleTypesAvailableWhenVehicleTypesExists | ${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS} | station_status.json → required (append) |
| NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist | ${CONDITIONAL_REQUIRED_VEHICLE_STATUS_FIELDS} | vehicle_status.json → required (append) |
| NoMissingStoreUriInSystemInformation | ${CONDITIONAL_REQUIRED_SYSTEM_INFO_FIELDS} | system_information.json → required (append) |

Complex Conditional → Object Placeholder

| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles | ${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA} | vehicle_status.json → vehicles/items (merge if/then) |

Template for motorized vehicles (in vehicle_status.json):

{
  "items": {
    "allOf": [
      { "$ref": "#/definitions/vehicle" },
      "${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}"
    ]
  }
}

Placeholder value (computed):

{
  "if": {
    "properties": {
      "vehicle_type_id": { "enum": ["type_1", "type_3"] }
    },
    "required": ["vehicle_type_id"]
  },
  "then": {
    "required": ["current_range_meters"]
  }
}
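Computing that placeholder value is a filter plus some string building. A JDK-only sketch: the `VehicleType` record is a hypothetical stand-in for entries of the vehicle_types feed, the propulsion types are the ones used by the current rule's filter, and returning `{}` when there are no motorized types is an assumption. An empty schema makes the allOf entry a no-op, whereas an empty enum is not allowed by older JSON Schema drafts (e.g. draft-04 requires at least one element).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for one entry of the vehicle_types feed.
record VehicleType(String vehicleTypeId, String propulsionType) {}

final class MotorizedConditional {

  // Same propulsion types as the filter in the current patching rule.
  static final Set<String> MOTORIZED = Set.of("electric_assist", "electric", "combustion");

  // Builds the value for ${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}.
  // Assumes vehicle_type_id values need no JSON escaping.
  static String build(List<VehicleType> vehicleTypes) {
    List<String> motorizedIds = vehicleTypes.stream()
        .filter(vt -> MOTORIZED.contains(vt.propulsionType()))
        .map(VehicleType::vehicleTypeId)
        .collect(Collectors.toList());
    if (motorizedIds.isEmpty()) {
      return "{}"; // empty schema: the allOf branch imposes no constraint
    }
    String enumValues = motorizedIds.stream()
        .map(id -> "\"" + id + "\"")
        .collect(Collectors.joining(","));
    return "{\"if\":{\"properties\":{\"vehicle_type_id\":{\"enum\":[" + enumValues + "]}},"
        + "\"required\":[\"vehicle_type_id\"]},"
        + "\"then\":{\"required\":[\"current_range_meters\"]}}";
  }
}
```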

Alternative Approach: Programmatic Validation

There's a third option that was not initially considered: programmatic validation - checking data directly in code rather than modifying schemas or using templates.

How It Works

Instead of patching schemas or using templates, run additional validation after schema validation:

Load static schemas → Validate with Everit → Run custom validators → Combine errors → Report

New interface:

public interface CustomValidator {
  List<FileValidationError> validate(Map<String, JSONObject> feeds);
  String getTargetFeed();
  String getDescription();
}

Example implementation of the same region_id check (about 30 lines of logic, similar in length to the 24-line patching rule, but with no schema navigation):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.json.JSONArray;
import org.json.JSONObject;

// CustomValidator and FileValidationError as defined above
public class ValidateRegionReferences implements CustomValidator {

  @Override
  public List<FileValidationError> validate(Map<String, JSONObject> feeds) {
    List<FileValidationError> errors = new ArrayList<>();

    JSONObject systemRegions = feeds.get("system_regions");
    JSONObject stationInfo = feeds.get("station_information");
    if (systemRegions == null || stationInfo == null) return errors;

    // Extract valid region IDs (plain org.json iteration, no JsonPath needed)
    Set<String> validRegionIds = new HashSet<>();
    JSONArray regions = systemRegions.getJSONObject("data").getJSONArray("regions");
    for (int i = 0; i < regions.length(); i++) {
      validRegionIds.add(regions.getJSONObject(i).getString("region_id"));
    }

    // Check each station (region_id is optional, so skip stations without it)
    JSONArray stations = stationInfo.getJSONObject("data").getJSONArray("stations");
    for (int i = 0; i < stations.length(); i++) {
      JSONObject station = stations.getJSONObject(i);

      if (station.has("region_id")) {
        String regionId = station.getString("region_id");

        if (!validRegionIds.contains(regionId)) {
          errors.add(new FileValidationError(
            null,  // no schema location: this check is not part of any schema
            "#/data/stations/" + i + "/region_id",
            "region_id '" + regionId + "' does not exist in system_regions",
            "invalid_reference"
          ));
        }
      }
    }

    return errors;
  }

  @Override
  public String getTargetFeed() { return "station_information"; }

  @Override
  public String getDescription() {
    return "Validates region_id values exist in system_regions";
  }
}

Advantages

  1. Simpler - no JsonPath navigation of schemas, and the savings grow with rule complexity (roughly 20-40 lines per rule vs. 24-82 for patching)
  2. Clearer logic - Direct iteration and checks, obvious what's being validated
  3. Better error messages - Custom messages like "region_id 'R999' does not exist in system_regions"
  4. Type safety - Working with Set<String>, not stringly-typed JSONObjects
  5. Easier testing - Direct unit tests, no schema knowledge required
  6. Faster - No per-validation schema copying and patching; static schemas can be loaded once and reused
  7. Quick implementation - 3-4 weeks total (no upstream coordination)
  8. No conflicts - Validators are independent, can't interfere

Disadvantages

  1. No interoperability - Each validator implementation must reimplement in their language
  2. Validation logic separate from schema - Can't see full validation picture in schema files
  3. Error format differences - schemaPath is null/N/A for programmatic checks
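The pipeline for this option (static schema validation, then custom validators, then combined errors) could be wired up roughly as follows. This is a hypothetical sketch: `ValidationError`, `Check`, and `ValidationRunner.validateFeed` are simplified stand-ins for FileValidationError, CustomValidator, and the validator's entry point, and the schema-validation step is passed in as a function so the example needs no JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-ins for FileValidationError and CustomValidator.
record ValidationError(String schemaPath, String violationPath, String message, String keyword) {}

interface Check {
  List<ValidationError> validate(Map<String, Object> feeds);
  String getTargetFeed();
}

final class ValidationRunner {

  // Runs static-schema validation for one feed, then every custom check targeting that feed,
  // and returns the combined errors (schema errors first).
  static List<ValidationError> validateFeed(
      String feedName,
      Map<String, Object> feeds,
      Function<Object, List<ValidationError>> staticSchemaValidator,
      List<Check> checks) {
    List<ValidationError> errors = new ArrayList<>(staticSchemaValidator.apply(feeds.get(feedName)));
    for (Check check : checks) {
      if (check.getTargetFeed().equals(feedName)) {
        errors.addAll(check.validate(feeds));
      }
    }
    return errors;
  }
}
```

Because custom checks only read the feeds and append to a list, adding or reordering them cannot change what another check reports, which is the "no conflicts" property listed above.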

Comparison Summary

| Aspect | Schema Patching | Templates | Programmatic |
|---|---|---|---|
| Code per rule | 24-82 lines | ~3 (template) + 20-40 (replacement) | 20-40 lines |
| Complexity | High | Low | Low |
| Readability | Poor | Good | Excellent |
| Error messages | Confusing | Clear | Excellent |
| Performance | Slow | Medium | Fast |
| Interoperability | None | ⭐⭐⭐⭐⭐ | None |
| Maintainability | Poor | Good | Excellent |
| Type safety | None | None | Good |
| Testing | Hard | Medium | Easy |
| Implementation time | N/A (current) | 2-3 months | 3-4 weeks |
| Ecosystem benefit | None | High | Low |

Decision Framework

The right choice depends on the project's goals:

Choose Option A (Programmatic Validation) if:

  • ✅ This is the primary/only GBFS validator implementation
  • ✅ Simplicity and maintainability are top priorities
  • ❌ Ecosystem-wide standardization is not a primary goal

Implementation effort: 3-4 weeks

Choose Option B (Schema Templates) if:

  • ✅ Multiple GBFS validator implementations need to stay synchronized
  • ✅ Ecosystem-wide interoperability is a primary goal
  • ✅ Schemas should document all validation rules (transparency)

Implementation effort: 2-3 months (including upstream contribution)

Never Choose: Current Schema Patching ❌

The current approach has no advantages over either alternative:

  • ❌ Most complex (JsonPath + schema structure knowledge)
  • ❌ Worst error messages (phantom schema references)
  • ❌ No interoperability anyway
  • ❌ Hard to maintain and test
  • ❌ Slowest performance

Conclusion

Current assessment: The schema patching approach should be replaced with one of the two alternatives.

Both alternatives are significantly better than the current approach:

  • Option A (Programmatic): Best for developer experience, maintainability, and quick wins
  • Option B (Templates): Best for ecosystem standardization and interoperability

Neither option requires long-term backward-compatibility support: a transition flag like the one in Step 4 is only needed during migration, and both allow a clean switch-over.
