Skip to content

Oracle Sink Plugin: Lack of MERGE/UPSERT Leads to Duplicate Issues on Retries #635

@cjac

Description

@cjac

Oracle Sink Plugin: Lack of MERGE/UPSERT Leads to Duplicate Issues on Retries

Plugin: Oracle Batch Sink (oracle-plugin)
Component: Sink

Describe the bug
The current Oracle Sink plugin appears to use standard JDBC batch INSERT statements. When used in a Spark environment like Cloud Data Fusion, if a Spark task fails mid-batch (e.g., due to network issues, or database errors like ORA-08177 transaction serialization failures), Spark will retry the task.

Because the sink operation isn't idempotent, this retry mechanism can lead to ORA-00001: unique constraint violated errors if the target table has unique constraints. If no such constraints exist, it can lead to silent duplicate record insertion.

This issue has been observed by customers, causing data integrity problems and pipeline failures.

To Reproduce

  1. Configure a pipeline with an Oracle Sink.
  2. Ensure the target Oracle table has a unique constraint.
  3. Simulate a task failure during the sink write operation (e.g., by temporarily making the database unavailable or causing a serialization error).
  4. Observe that on retry, ORA-00001 errors are likely to occur.

Expected behavior
Ideally, the sink plugin should handle retries more gracefully. An idempotent sink would prevent duplicate inserts. Using Oracle's MERGE statement (or similar upsert logic) would allow the sink to either insert new rows or update existing ones if they are encountered again during a retry, based on the table's primary or unique keys.

Workarounds

  • Using TRANSACTION_READ_COMMITTED can sometimes mitigate the ORA-08177 errors that trigger retries, but doesn't solve the idempotency issue itself.
  • Ensuring the database environment is stable to prevent the initial task failures.

Feature Request
Enhance the Oracle Sink plugin to support an optional MERGE or UPSERT mode. This would significantly improve resilience and data integrity in environments where task retries are possible.

Affected Versions (if known): Observed with oracle-plugin version 1.10.7, but likely affects other versions too.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions