Description
I want to explore the idea presented in MobilityData/gbfs-validator#153 as an alternative to patching existing schemas at runtime, while keeping the benefit of schema validation for these rules. We also consider a programmatic (non-interoperable) approach as a second alternative.
A detailed analysis written with help from Claude:
Summary
The GBFS Validator currently implements custom validation rules (that cannot be expressed in static JSON schemas) by dynamically patching schemas at runtime. This approach has significant problems:
- High implementation complexity - Rules require deep knowledge of JSON Schema structure and JsonPath API
- Risk of rule conflicts - No enforcement mechanism prevents rules from interfering with each other
- Confusing validation reports - Errors reference constraints that don't exist in static schemas
- Implementation-specific logic - Other GBFS validators must reimplement the entire patching system
Two viable alternatives have been identified:
Option A: Programmatic Validation
Run custom validation checks after schema validation, generating errors directly in code.
- Easiest to maintain - Clear validation logic
- ❌ No interoperability (each validator reimplements)
Option B: Schema Templates with Placeholders
Replace placeholders in schema templates with actual values at runtime.
- Interoperable - All validators use same templates from upstream GBFS spec
- Transparent - Templates visible in schema files
- Standards-based - Single source of truth
- ❌ Requires upstream coordination
- ❌ Only valuable for multi-implementation ecosystem
Current Implementation Analysis
How Schema Patching Works Today
The validation flow:
- Load static JSON schemas from src/main/resources/schema/v{version}/{feedName}.json
- For each feed being validated, retrieve applicable custom rules
- Apply rules sequentially, each receiving:
  - A JsonPath DocumentContext wrapping the schema JSON
  - A map of all loaded GBFS feeds
- Rules extract data from feeds (e.g., valid pricing plan IDs) and inject it into schemas:
  - Add enum constraints for reference validation
  - Add required field constraints based on feed presence
  - Build if/then/else conditional schemas
- Convert the patched JSONObject to an Everit Schema and validate
Key Files:
- CustomRuleSchemaPatcher.java:31-42 - Interface all rules implement
- AbstractVersion.java:118-162 - Orchestrates rule application via stream reduce
- FileValidator.java:59-84 - Entry point for validation
Current Custom Rules (8 total)
All rules in gbfs-validator-java/src/main/java/org/entur/gbfs/validation/validator/rules/:
Reference Validation Rules (enum constraints):
1. NoInvalidReferenceToPricingPlansInVehicleStatus - pricing_plan_id must exist in system_pricing_plans
2. NoInvalidReferenceToPricingPlansInVehicleTypes - pricing plan IDs in vehicle_types must be valid
3. NoInvalidReferenceToRegionInStationInformation - region_id must exist in system_regions
4. NoInvalidReferenceToVehicleTypesInStationStatus - vehicle_type_id must exist in vehicle_types
Conditional Required Field Rules:
5. NoMissingVehicleTypesAvailableWhenVehicleTypesExists - vehicle_types_available required when vehicle_types feed exists
6. NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist - vehicle_type_id required and valid when vehicle_types exists
7. NoMissingStoreUriInSystemInformation - rental_apps required when rental_uris exist in stations/vehicles
8. NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles - current_range_meters required for motorized vehicles (if/then/else schema)
Implementation Complexity Examples
Simple Rule: Reference Validation (24 logic lines)
NoInvalidReferenceToRegionInStationInformation.java:41-58:
@Override
public DocumentContext addRule(
    DocumentContext rawSchemaDocumentContext,
    Map<String, JSONObject> feeds
) {
    // Extract valid region IDs from system_regions feed
    JSONObject systemRegionsFeed = feeds.get("system_regions");
    JSONArray regionIds = systemRegionsFeed != null
        ? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
        : new JSONArray();
    // Navigate to region_id property in schema (6 levels deep)
    JSONObject regionIdSchema = rawSchemaDocumentContext.read(
        "$.properties.data.properties.stations.items.properties.region_id"
    );
    // Add enum constraint
    regionIdSchema.put("enum", regionIds);
    // Write back to schema at the same path that was read above
    return rawSchemaDocumentContext.set(
        "$.properties.data.properties.stations.items.properties.region_id",
        regionIdSchema
    );
}
Complexity factors:
- Must construct correct JsonPath expression (error-prone strings)
- Requires understanding JSON Schema structure (where to inject constraint)
- Mix of read/modify/write operations on JSONObjects
- Defensive null handling for missing feeds
- No compile-time safety for schema paths
Complex Rule: Multi-Feed Conditional (82 logic lines)
NoMissingStoreUriInSystemInformation.java:47-129:
This rule checks two different feeds (vehicle_status AND station_information) to determine if rental_apps should be required in system_information:
// Check vehicle_status for rental URIs
JSONObject vehicleStatusFeed = feeds.get(vehicleStatusFileName);
String vehiclesKey = vehicleStatusFileName.equals("vehicle_status")
    ? "vehicles" : "bikes"; // Backward compatibility
if (!((JSONArray) JsonPath.parse(vehicleStatusFeed)
    .read("$.data." + vehiclesKey + "[:1].rental_uris.ios")).isEmpty()) {
    hasIosRentalUris = true;
}
// Check station_information for rental URIs
JSONObject stationInformationFeed = feeds.get("station_information");
if (!((JSONArray) JsonPath.parse(stationInformationFeed)
    .read("$.data.stations[:1].rental_uris.ios")).isEmpty()) {
    hasIosRentalUris = true;
}
// Conditionally modify system_information schema
if (hasIosRentalUris || hasAndroidRentalUris) {
    JSONArray dataRequired = rawSchemaDocumentContext.read("$.properties.data.required");
    dataRequired.put("rental_apps");
    JSONObject rentalAppsSchema = rawSchemaDocumentContext.read(
        "$.properties.data.properties.rental_apps"
    );
    JSONArray rentalAppRequired = new JSONArray();
    if (hasIosRentalUris) rentalAppRequired.put("ios");
    if (hasAndroidRentalUris) rentalAppRequired.put("android");
    rentalAppsSchema.put("required", rentalAppRequired);
}
Additional complexity:
- Dynamic JsonPath construction based on feed type
- Checks multiple feeds with different structures
- Accumulates boolean flags across feeds
- Modifies multiple schema locations
- Handles backward compatibility with legacy feeds
Most Complex: Conditional Schema with Filters (64 logic lines)
NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles.java:56-119:
Uses JsonPath filters to extract only motorized vehicle types, then builds an if/then/else schema:
// Filter for motorized vehicles
private static final Filter motorizedVehicleTypesFilter = Filter.filter(
    where("propulsion_type").in(List.of("electric_assist", "electric", "combustion"))
);
// Extract motorized vehicle type IDs
JSONArray motorizedVehicleTypeIds = JsonPath.parse(vehicleTypesFeed)
    .read("$.data.vehicle_types[?].vehicle_type_id", motorizedVehicleTypesFilter);
// Build complex if/then schema
bikeItemsSchema
    .put("if", new JSONObject()
        .put("properties", new JSONObject()
            .put("vehicle_type_id", new JSONObject().put("enum", motorizedVehicleTypeIds))
        )
        .put("required", new JSONArray().put("vehicle_type_id"))
    )
    .put("then", new JSONObject()
        .put("required", new JSONArray().put("current_range_meters"))
    );
Builds JSON Schema conditional: "If vehicle_type_id is a motorized type, then current_range_meters is required"
Problem 1: Implementation Complexity
What Makes Rules Hard to Write
- JsonPath Expertise Required:
  - Schema paths are 4-6 levels deep: $.properties.data.properties.stations.items.properties.region_id
  - Data extraction uses wildcards and filters: $.data.vehicle_types[?].vehicle_type_id
  - Array slicing for optimization: [:1] to get first element
  - No compile-time validation - wrong paths fail at runtime
- JSON Schema Structure Knowledge:
  - Must know where to inject constraints (properties vs items vs required array)
  - Different patterns for different constraint types (enum vs required vs if/then)
  - Schema structure varies by GBFS version (bikes vs vehicles)
- JSONObject Manipulation:
  - Mix of DocumentContext.read(), JSONObject.put(), JSONObject.append(), DocumentContext.set()
  - In-place mutations vs functional returns
  - Manual schema copying to prevent cache mutation
- Cross-Feed Data Dependencies:
  - Rules must extract data from multiple feeds
  - Different feeds have different structures
  - Defensive null handling required throughout
Maintenance Burden
- Adding a new rule requires 20-80 lines of complex code
- Rules are tightly coupled to schema structure - schema changes break rules
- Backward compatibility adds conditional logic (bikes vs vehicles)
- No abstraction layer - each rule duplicates navigation patterns
- Testing requires understanding entire validation flow
Problem 2: Risk of Rule Conflicts
Current Conflict Mitigation
AbstractVersion.java:158-161:
// Must make a copy of the schema, otherwise it will be mutated by json-path
return patcher.addRule(
    JsonPath.parse(new JSONObject(schema.toMap())),
    feedMap
).json();
- Each rule gets a fresh copy of the schema from the previous rule's output
- Prevents mutation of cached raw schemas
- Rules applied sequentially via stream reduce
What's NOT Protected
Scenario 1: Multiple rules modifying same required array
If two rules both append to $.properties.data.properties.stations.items.required:
- First rule adds vehicle_types_available
- Second rule adds vehicle_docks_available
- Works correctly - both fields end up in required array
BUT if second rule replaces instead of appending:
requiredArray = new JSONArray().put("vehicle_docks_available"); // Oops, lost first rule's addition
Scenario 2: Rules modifying same property
If two rules both target vehicle_type_id:
- First rule adds: { "enum": [...] }
- Second rule adds: { "pattern": "..." }
- Second rule could overwrite if it does vehicleTypeIdSchema.put("enum", ...) again
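The append-versus-replace hazard from Scenario 1 can be reproduced in a few lines. The sketch below is not the validator's code: it models a schema as a plain map holding a required list, and RuleConflictDemo, applyAll, appending, and replacing are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the rule pipeline. A "schema" here is just a map
// with a "required" list, not a real JSON Schema object.
final class RuleConflictDemo {

    // Copies the schema before handing it to a rule, mirroring the copy made in AbstractVersion.
    static Map<String, List<String>> copy(Map<String, List<String>> schema) {
        Map<String, List<String>> out = new HashMap<>();
        schema.forEach((key, value) -> out.put(key, new ArrayList<>(value)));
        return out;
    }

    // Applies rules one after another; each rule sees the previous rule's output.
    static Map<String, List<String>> applyAll(
        Map<String, List<String>> schema,
        List<UnaryOperator<Map<String, List<String>>>> rules
    ) {
        Map<String, List<String>> current = schema;
        for (UnaryOperator<Map<String, List<String>>> rule : rules) {
            current = rule.apply(copy(current));
        }
        return current;
    }

    // Well-behaved rule: appends to the existing required list.
    static UnaryOperator<Map<String, List<String>>> appending(String field) {
        return schema -> {
            schema.computeIfAbsent("required", key -> new ArrayList<>()).add(field);
            return schema;
        };
    }

    // Conflicting rule: replaces the required list, discarding earlier additions.
    static UnaryOperator<Map<String, List<String>>> replacing(String field) {
        return schema -> {
            schema.put("required", new ArrayList<>(List.of(field)));
            return schema;
        };
    }
}
```

Running two appending rules keeps both fields, while swapping the second for a replacing rule silently drops the first rule's field. Copying the schema between rules protects the cached original but does nothing to prevent this kind of loss.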
No Enforcement Mechanism
- Rules are carefully designed by humans to avoid conflicts
- No validation that combined rules produce valid JSON Schema
- No declaration of which schema paths a rule modifies
- Adding new rules requires manual review for conflicts
- Refactoring risks introducing subtle conflicts
Problem 3: Confusing Validation Reports
Error Structure
FileValidationError.java:29-34:
public record FileValidationError(
    String schemaPath, // From ValidationException.getSchemaLocation()
    String violationPath, // From ValidationException.getPointerToViolation()
    String message,
    String keyword
) {}
These values come directly from Everit's ValidationException, which validates against the patched schema.
The Confusion
Example Error:
{
"schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
"violationPath": "#/data/vehicles/0/pricing_plan_id",
"message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
"keyword": "enum"
}
User investigates: Opens vehicle_status.json schema and navigates to properties.data.properties.vehicles.items.properties.pricing_plan_id:
{
"pricing_plan_id": {
"type": "string",
"description": "The plan_id of the pricing plan this vehicle is eligible for"
}
}
No enum constraint! 😕
Why This Happens
- Static schema doesn't have the enum constraint
- NoInvalidReferenceToPricingPlansInVehicleStatus dynamically added it:
pricingPlanIdSchema.put("enum", pricingPlanIds); // Added at runtime
- Validation error references the patched schema location
- User has no way to know this came from a custom rule
- Error is technically correct but misleading
User Experience Impact
- Schema inspection is useless - Errors reference constraints that aren't in schema files
- Can't trace error source - No indication which custom rule caused the error
- Documentation doesn't help - Static schema docs don't explain dynamic constraints
- Debugging is hard - Must understand entire custom rules system to interpret errors
- Other validator implementations will have different errors - No standard way to report these dynamic constraints
Problem 4: Implementation-Specific Logic
Current Architecture is Java-Specific
The patching system is tightly coupled to:
- Jayway JsonPath library (Java/JVM)
  - DocumentContext API for schema manipulation
  - Filter API for complex queries
  - Configuration with JsonOrgJsonProvider
- org.json JSONObject (Java)
  - JSONObject/JSONArray manipulation
  - Conversion to/from Maps
- Everit JSON Schema Validator (Java)
  - Schema loading from JSONObject
  - ValidationException structure
- Java Streams and Collections
  - Stream reduce for rule application
  - Map/List for rule registration
Other Validators Must Reimplement
For a Python validator to implement the same rules:
- Reimplement all 8 custom rules in Python
- Use different JSON manipulation library (likely different API)
- Use different JsonPath library (or write path logic manually)
- Use different schema validator (likely different error structure)
- Results in different behavior - No guarantee of identical validation
For a JavaScript/Go/Rust validator:
- Same story - complete reimplementation
- Different libraries, different patterns
- Risk of divergence in rule logic
No Interoperability
- Each validator implementation has its own custom rules
- No shared definition of what the rules should do
- GBFS spec can't standardize the dynamic constraints
- Validation results differ across implementations
- Users get different errors depending on which validator they use
Proposed Solution: Schema Templates with Placeholders
High-Level Concept
Instead of patching schemas at runtime with code, use schema templates that declare placeholders for dynamic values:
Current approach (code injects enum):
// Code in NoInvalidReferenceToRegionInStationInformation
JSONArray regionIds = JsonPath.parse(systemRegionsFeed)
.read("$.data.regions[*].region_id");
regionIdSchema.put("enum", regionIds);
Proposed approach (template with placeholder):
{
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": "${VALID_REGION_IDS}"
}
}
Validator performs simple string replacement:
String schema = loadSchemaAsString("station_information.json");
String regionIds = extractRegionIds(feeds.get("system_regions")); // ["R1","R2","R3"]
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds); // Simple string replace
Result after replacement:
{
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": ["R1", "R2", "R3"]
}
}
Benefits
1. Dramatically Simpler Implementation
Before (24 lines of complex Java):
public DocumentContext addRule(DocumentContext rawSchemaDocumentContext, Map<String, JSONObject> feeds) {
JSONObject systemRegionsFeed = feeds.get("system_regions");
JSONObject regionIdSchema = rawSchemaDocumentContext.read(
"$.properties.data.properties.stations.items.properties.region_id"
);
JSONArray regionIds = systemRegionsFeed != null
? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
: new JSONArray();
regionIdSchema.put("enum", regionIds);
return rawSchemaDocumentContext.set(
"$.properties.data.properties.stations.items.properties.region_id",
regionIdSchema
);
}
After (2 lines of simple text processing):
String regionIds = extractIds(feeds.get("system_regions"), "$.data.regions[*].region_id");
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);
- No JsonPath navigation of schemas
- No JSONObject manipulation
- No schema structure knowledge required
- Just text find/replace operations
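A minimal sketch of the text replacement, using only the JDK. PlaceholderReplacer and its methods are hypothetical names, and the sketch assumes the IDs arrive as a plain list of strings; a real implementation would likely still extract them with JsonPath as in the snippets above. It also JSON-escapes each ID, which naive string concatenation would miss.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the validator: fills an array-valued placeholder.
final class PlaceholderReplacer {

    // Escapes the characters that must be escaped inside a JSON string.
    static String escapeJson(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders the values as JSON array text, e.g. ["R1","R2"].
    static String toJsonArray(List<String> values) {
        return values.stream()
            .map(v -> "\"" + escapeJson(v) + "\"")
            .collect(Collectors.joining(",", "[", "]"));
    }

    // Replaces every quoted occurrence of ${name} in the template with the array text.
    static String replaceArrayPlaceholder(String template, String name, List<String> values) {
        return template.replace("\"${" + name + "}\"", toJsonArray(values));
    }
}
```

String.replace substitutes every occurrence, which matters when the same placeholder appears at more than one schema location.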
2. Eliminates Rule Conflicts
Templates define exactly where values go:
{
"required": ["station_id", "num_bikes_available", "${CONDITIONAL_REQUIRED_FIELDS}"],
"properties": {
"vehicle_type_id": {
"type": "string",
"enum": "${VALID_VEHICLE_TYPE_IDS}"
}
}
}
- Placeholders are pre-positioned by schema authors
- No runtime conflict possible
- Multiple rules can't modify same location - only one placeholder per location
- Schema templating validates placeholders are well-formed
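One way such a well-formedness check could look, assuming placeholders always appear as quoted ${UPPER_SNAKE_CASE} strings as in the examples above. PlaceholderCheck is a hypothetical name; the sketch lists the placeholders present in a schema text and fails if any remain after processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the validator.
final class PlaceholderCheck {

    // Matches a quoted placeholder such as "${VALID_REGION_IDS}" and captures its name.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\"\\$\\{([A-Z][A-Z0-9_]*)\\}\"");

    // Returns the distinct placeholder names in order of first appearance.
    static List<String> findPlaceholders(String schemaText) {
        List<String> names = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(schemaText);
        while (matcher.find()) {
            if (!names.contains(matcher.group(1))) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }

    // Throws if the processed schema still contains any placeholder.
    static void requireFullyResolved(String processedSchema) {
        List<String> remaining = findPlaceholders(processedSchema);
        if (!remaining.isEmpty()) {
            throw new IllegalStateException("Unresolved placeholders: " + remaining);
        }
    }
}
```

Running requireFullyResolved before handing the schema to the validator turns a forgotten placeholder into an explicit failure instead of an enum that is accidentally a string.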
3. Transparent Validation Reports
Error example with templates:
{
"schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
"violationPath": "#/data/vehicles/0/pricing_plan_id",
"message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
"keyword": "enum"
}
User opens vehicle_status.json template schema:
{
"pricing_plan_id": {
"type": "string",
"description": "The plan_id of the pricing plan this vehicle is eligible for",
"enum": "${VALID_PRICING_PLAN_IDS}"
}
}
Aha! 💡 The enum constraint exists in the template with a placeholder. User now understands:
- The enum is populated from system_pricing_plans feed
- The error means their pricing_plan_id doesn't match system_pricing_plans
- The template serves as documentation of the dynamic behavior
4. Interoperability Across Validators
Schema templates live in upstream GBFS spec repository (e.g., MobilityData/gbfs)
All validators (Java, Python, JavaScript, Go, Rust, etc.) implement the same simple logic:
Python validator:
schema = load_schema_text("station_information.json")
region_ids = extract_ids(feeds["system_regions"], "$.data.regions[*].region_id")
schema = schema.replace('"${VALID_REGION_IDS}"', region_ids)
JavaScript validator:
let schema = loadSchemaText("station_information.json");
const regionIds = extractIds(feeds["system_regions"], "$.data.regions[*].region_id");
// replaceAll substitutes every occurrence; the function form avoids "$" patterns in the replacement
schema = schema.replaceAll('"${VALID_REGION_IDS}"', () => regionIds);
Go validator:
schema := loadSchemaText("station_information.json")
regionIds := extractIds(feeds["system_regions"], "$.data.regions[*].region_id")
// ReplaceAll substitutes every occurrence, like the Python and Java versions
schema = strings.ReplaceAll(schema, `"${VALID_REGION_IDS}"`, regionIds)
All produce identical results because:
- Same template schemas from upstream
- Same placeholder names
- Same replacement logic
- Same validation behavior
Implementation Strategy
Step 1: Define Placeholder Convention
Propose to GBFS spec maintainers:
Placeholder Syntax: ${VARIABLE_NAME}
- Consistent with many templating systems
- Easy to identify in JSON
- Unlikely to conflict with real JSON values (a literal ${...} string value would need escaping)
Example Placeholders:
- ${VALID_PRICING_PLAN_IDS} - Array of valid pricing plan IDs
- ${VALID_VEHICLE_TYPE_IDS} - Array of valid vehicle type IDs
- ${VALID_REGION_IDS} - Array of valid region IDs
- ${CONDITIONAL_REQUIRED_FIELDS} - Array of conditionally required field names
Placement Rules:
- Placeholders for arrays: "enum": "${VALID_IDS}" (replace entire value)
- Placeholders for arrays in arrays: "required": ["station_id", "${CONDITIONAL_FIELDS}"] (replace item in array)
- Placeholders for objects: "if": "${CONDITIONAL_SCHEMA}" (replace entire object)
Step 2: Create Template Schemas
Update existing GBFS JSON schemas with placeholders:
Example: station_information.json
Before (static schema):
{
"properties": {
"region_id": {
"type": "string",
"description": "ID of the region where station is located"
}
}
}
After (template schema):
{
"properties": {
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": "${VALID_REGION_IDS}"
}
}
}
Example: station_status.json
Before:
{
"required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported"]
}
After:
{
"required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported", "${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}"]
}
Step 3: Implement Template Processing
New class: SchemaTemplateProcessor
public class SchemaTemplateProcessor {
public String processTemplate(String templateSchema, Map<String, JSONObject> feeds) {
String processed = templateSchema;
// Replace each placeholder
processed = replaceValidPricingPlanIds(processed, feeds);
processed = replaceValidVehicleTypeIds(processed, feeds);
processed = replaceValidRegionIds(processed, feeds);
processed = replaceConditionalRequiredFields(processed, feeds);
// ... etc
return processed;
}
private String replaceValidRegionIds(String schema, Map<String, JSONObject> feeds) {
JSONObject systemRegions = feeds.get("system_regions");
if (systemRegions == null) {
return schema.replace("\"${VALID_REGION_IDS}\"", "[]");
}
JSONArray regionIds = JsonPath.parse(systemRegions)
.read("$.data.regions[*].region_id");
return schema.replace("\"${VALID_REGION_IDS}\"", regionIds.toString());
}
// Similar methods for other placeholders...
}
Integration in AbstractVersion.java:
public Schema getSchema(String feedName, Map<String, JSONObject> feedMap) {
String templateSchema = loadSchemaAsString(feedName); // Load as text, not JSONObject
String processedSchema = templateProcessor.processTemplate(templateSchema, feedMap);
return loadSchema(new JSONObject(processedSchema)); // Parse and build validator
}
Step 4: Maintain Backward Compatibility
During transition, support both approaches:
- Flag in configuration: useSchemaTemplates (default: false)
  - When false: Use existing CustomRuleSchemaPatcher system
  - When true: Use new SchemaTemplateProcessor
- Template schemas: Stored alongside static schemas (e.g., schema/v2.3/templates/)
- Gradual migration: One rule at a time, validate results match
Eventually deprecate and remove custom rule patching system.
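During the transition, "validate results match" could be checked by running both code paths on the same feeds and comparing what they report. The sketch below assumes each error has been reduced to a string key such as violation path plus keyword; MigrationCheck and that key format are hypothetical, not part of the validator.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing the old and new code paths during migration.
final class MigrationCheck {

    // Returns the error keys reported by exactly one of the two code paths.
    // An empty result means both approaches agree on this input.
    static Set<String> difference(List<String> patcherErrors, List<String> templateErrors) {
        Set<String> onlyOne = new HashSet<>(patcherErrors);
        onlyOne.addAll(templateErrors);
        Set<String> both = new HashSet<>(patcherErrors);
        both.retainAll(templateErrors);
        onlyOne.removeAll(both);
        return onlyOne;
    }
}
```

A non-empty difference for any test feed would indicate that a migrated rule does not yet behave like the patcher it replaces.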
Step 5: Upstream Contribution
Work with MobilityData/GBFS maintainers:
- Propose placeholder specification - Document placeholder syntax and semantics
- Create template schemas - For all versions (2.1, 2.2, 2.3, 3.0)
- Add template documentation - Explain dynamic constraints in spec
- Publish template schemas - In official GBFS schema repository
- Reference in spec - GBFS specification references template schemas
Mapping Current Rules to Templates
Reference Validation Rules → Enum Placeholders
| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoInvalidReferenceToPricingPlansInVehicleStatus | ${VALID_PRICING_PLAN_IDS} | vehicle_status.json, free_bike_status.json → pricing_plan_id/enum |
| NoInvalidReferenceToPricingPlansInVehicleTypes | ${VALID_PRICING_PLAN_IDS} | vehicle_types.json → default_pricing_plan_id/enum, pricing_plan_ids/items/enum |
| NoInvalidReferenceToRegionInStationInformation | ${VALID_REGION_IDS} | station_information.json → region_id/enum |
| NoInvalidReferenceToVehicleTypesInStationStatus | ${VALID_VEHICLE_TYPE_IDS} | station_status.json → vehicle_type_id/enum, vehicle_type_ids/items/enum |
Conditional Required Fields → Array Placeholders
| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoMissingVehicleTypesAvailableWhenVehicleTypesExists | ${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS} | station_status.json → required (append) |
| NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist | ${CONDITIONAL_REQUIRED_VEHICLE_STATUS_FIELDS} | vehicle_status.json → required (append) |
| NoMissingStoreUriInSystemInformation | ${CONDITIONAL_REQUIRED_SYSTEM_INFO_FIELDS} | system_information.json → required (append) |
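Array placeholders that sit inside an existing required array need slightly more care than whole-value placeholders: the computed names must be spliced in as sibling items rather than as a nested array, and an empty result must not leave a dangling comma. A minimal sketch of one way to handle this with JDK regular expressions follows; ArrayItemPlaceholder and splice are hypothetical names, and the sketch assumes placeholders are written as quoted strings.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the validator.
final class ArrayItemPlaceholder {

    // Replaces "${name}" inside a JSON array with zero or more quoted items.
    static String splice(String template, String name, List<String> items) {
        String quoted = Pattern.quote("\"${" + name + "}\"");
        if (items.isEmpty()) {
            // Remove the placeholder together with one adjacent comma, if there is one.
            return template
                .replaceAll(",\\s*" + quoted, "")
                .replaceAll(quoted + "\\s*,\\s*", "")
                .replaceAll(quoted, "");
        }
        String joined = items.stream()
            .map(item -> "\"" + item.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(", "));
        // quoteReplacement keeps "$" and "\" in field names from being read as regex groups.
        return template.replaceAll(quoted, Matcher.quoteReplacement(joined));
    }
}
```

With an empty list, a template such as {"required": ["station_id", "${F}"]} becomes {"required": ["station_id"]}, so the processed text stays valid JSON.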
Complex Conditional → Object Placeholder
| Current Rule | Template Placeholder | Schema Location |
|---|---|---|
| NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles | ${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA} | vehicle_status.json → vehicles/items (merge if/then) |
Template for motorized vehicles (in vehicle_status.json):
{
"items": {
"allOf": [
{ "$ref": "#/definitions/vehicle" },
"${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}"
]
}
}
Placeholder value (computed):
{
"if": {
"properties": {
"vehicle_type_id": { "enum": ["type_1", "type_3"] }
},
"required": ["vehicle_type_id"]
},
"then": {
"required": ["current_range_meters"]
}
}
Alternative Approach: Programmatic Validation
Alongside schema patching and schema templates, there is a third approach: programmatic validation, which checks data directly in code rather than modifying schemas or using templates.
How It Works
Instead of patching schemas or using templates, run additional validation after schema validation:
Load static schemas → Validate with Everit → Run custom validators → Combine errors → Report
New interface:
public interface CustomValidator {
List<FileValidationError> validate(Map<String, JSONObject> feeds);
String getTargetFeed();
String getDescription();
}
Example implementation (~30 lines, versus 24-82 lines for the schema patching rules above):
public class ValidateRegionReferences implements CustomValidator {
@Override
public List<FileValidationError> validate(Map<String, JSONObject> feeds) {
List<FileValidationError> errors = new ArrayList<>();
JSONObject systemRegions = feeds.get("system_regions");
JSONObject stationInfo = feeds.get("station_information");
if (systemRegions == null || stationInfo == null) return errors;
// Extract valid region IDs
List<String> regionIdList = JsonPath.parse(systemRegions)
.read("$.data.regions[*].region_id");
Set<String> validRegionIds = new HashSet<>(regionIdList);
// Check each station
JSONArray stations = stationInfo.getJSONObject("data").getJSONArray("stations");
for (int i = 0; i < stations.length(); i++) {
JSONObject station = stations.getJSONObject(i);
if (station.has("region_id")) {
String regionId = station.getString("region_id");
if (!validRegionIds.contains(regionId)) {
errors.add(new FileValidationError(
null,
"#/data/stations/" + i + "/region_id",
"region_id '" + regionId + "' does not exist in system_regions",
"invalid_reference"
));
}
}
}
return errors;
}
@Override
public String getTargetFeed() { return "station_information"; }
@Override
public String getDescription() {
return "Validates region_id values exist in system_regions";
}
}
Advantages
- Simpler - Typically 20-40 lines per rule versus 24-82 for schema patching, with no JsonPath schema navigation
- Clearer logic - Direct iteration and checks, obvious what's being validated
- Better error messages - Custom messages like "region_id 'R999' does not exist in system_regions"
- Type safety - Working with Set<String>, not stringly-typed JSONObjects
- Easier testing - Direct unit tests, no schema knowledge required
- Faster - No schema parsing/modification overhead
- Quick implementation - 3-4 weeks total (no upstream coordination)
- No conflicts - Validators are independent, can't interfere
Disadvantages
- No interoperability - Each validator implementation must reimplement in their language
- Validation logic separate from schema - Can't see full validation picture in schema files
- Error format differences - schemaPath is null/N/A for programmatic checks
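To show how the programmatic approach might extend to the most complex rule, here is a sketch of the motorized-vehicle check written against plain maps instead of JSONObject, so that it runs with only the JDK. MotorizedRangeCheck and the error string format are hypothetical; the propulsion types are the ones used by the existing filter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical programmatic version of the current_range_meters rule.
final class MotorizedRangeCheck {

    private static final Set<String> MOTORIZED =
        new HashSet<>(List.of("electric_assist", "electric", "combustion"));

    // vehicleTypes and vehicles stand in for $.data.vehicle_types and $.data.vehicles.
    static List<String> validate(
        List<Map<String, Object>> vehicleTypes,
        List<Map<String, Object>> vehicles
    ) {
        // Collect the IDs of all motorized vehicle types.
        Set<String> motorizedIds = new HashSet<>();
        for (Map<String, Object> type : vehicleTypes) {
            Object propulsion = type.get("propulsion_type");
            Object id = type.get("vehicle_type_id");
            if (propulsion != null && id != null && MOTORIZED.contains(propulsion.toString())) {
                motorizedIds.add(id.toString());
            }
        }
        // Report each motorized vehicle that lacks current_range_meters.
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < vehicles.size(); i++) {
            Map<String, Object> vehicle = vehicles.get(i);
            Object typeId = vehicle.get("vehicle_type_id");
            if (typeId != null
                && motorizedIds.contains(typeId.toString())
                && !vehicle.containsKey("current_range_meters")) {
                errors.add("#/data/vehicles/" + i + "/current_range_meters: required for motorized vehicle type '" + typeId + "'");
            }
        }
        return errors;
    }
}
```

The if/then condition that the patcher builds as nested JSON becomes a single boolean expression here, at the cost of living only in this validator's code.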
Comparison Summary
| Aspect | Schema Patching | Templates | Programmatic |
|---|---|---|---|
| Code per rule | 24-82 lines | ~3 (template) + 20-40 (replacement) | 20-40 lines |
| Complexity | High | Low | Low |
| Readability | Poor | Good | Excellent |
| Error messages | Confusing | Clear | Excellent |
| Performance | Slow | Medium | Fast |
| Interoperability | None | High | None |
| Maintainability | Poor | Good | Excellent |
| Type safety | None | None | Good |
| Testing | Hard | Medium | Easy |
| Implementation time | N/A (current) | 2-3 months | 3-4 weeks |
| Ecosystem benefit | None | High | Low |
Decision Framework
The right choice depends on the project's goals:
Choose Option A (Programmatic Validation) if:
- ✅ This is the primary/only GBFS validator implementation
- ✅ Simplicity and maintainability are top priorities
- ❌ Ecosystem-wide standardization is not a primary goal
Implementation effort: 3-4 weeks
Choose Option B (Schema Templates) if:
- ✅ Multiple GBFS validator implementations need to stay synchronized
- ✅ Ecosystem-wide interoperability is a primary goal
- ✅ Schemas should document all validation rules (transparency)
Implementation effort: 2-3 months (including upstream contribution)
Never Choose: Current Schema Patching ❌
The current approach has no advantages over either alternative:
- ❌ Most complex (JsonPath + schema structure knowledge)
- ❌ Worst error messages (phantom schema references)
- ❌ No interoperability anyway
- ❌ Hard to maintain and test
- ❌ Slowest performance
Conclusion
Current assessment: The schema patching approach should be replaced with one of the two alternatives.
Both alternatives are significantly better than the current approach:
- Option A (Programmatic): Best for developer experience, maintainability, and quick wins
- Option B (Templates): Best for ecosystem standardization and interoperability
Neither option requires long-term backward compatibility support: both allow a clean migration, with at most a transitional configuration flag as described in Step 4.