
Deadlock/Stuck Session in SL_MODE during Extended Protocol (PBE) with PostgreSQL #136

@wang-haihua


Hello Pgpool Team,

I am experiencing an issue with Pgpool-II version 4.4.4.

Environment:

  • Pgpool-II Version: 4.4.4
  • PostgreSQL Nodes: 3 nodes
  • Mode: Streaming Replication
  • Configuration: load_balance_mode is on.
  • backend_weight is set to 0:2:2 (primary : standby-1 : standby-2).
  • Client Driver: PostgreSQL JDBC Driver (Uses Extended Protocol)

Problem Description:
An intermittent session deadlock (stuck session) is observed when using pgpool-II 4.4 STABLE in Streaming Replication Mode (SL_MODE) to proxy connections to PostgreSQL 14. This occurs specifically when clients utilize the Extended Protocol (e.g., Prepared Statements via JDBC).

The root cause appears to be a race condition between pgpool-II's internal state machine (the query_in_progress flag) and the asynchronous arrival of the final backend response message, likely exacerbated by transient network delays.

Observed Symptoms:

  1. Client Application: Connects via JDBC (Extended Protocol).

  2. pgpool-II Subprocess:

    • State: idle in transaction (pg_stat_activity on the client side, if monitored).
    • pstack: Stuck in a read(2) system call within the ProcessBackendResponse function.
  3. PostgreSQL Backend Process:

    • State: active
    • Wait Event: ClientRead (waiting for the client/pgpool-II to send the next command).
    • OS Process State: Observed to be in the BIND phase (e.g., postgres: postgres ... BIND), which confirms the connection was recently processing a Prepared Statement.

This combination is critical: The backend has finished execution (indicated by ClientRead) and is waiting for pgpool-II, while pgpool-II is stuck reading the backend's final response, creating a self-deadlock.
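
The wait pattern described above can be reproduced in isolation. The following is a minimal sketch, not pgpool-II code: a socketpair stands in for the pgpool-to-backend connection, and the helpers `wait_readable` and `simulate_wait` are hypothetical names introduced only for illustration. It shows that when the peer has nothing more to send, a reader that expects further data never sees the socket become readable, so an unguarded read(2) at that point would block indefinitely.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a pgpool-II function).
 * Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	assert(timeout_ms >= 0);	/* a negative timeout would block forever */
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;		/* nothing arrived: a plain read() would hang */
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical simulation: sv[0] plays the proxy side, sv[1] the backend.
 * If peer_sends_reply is nonzero, the "backend" writes one byte before
 * the "proxy" starts waiting; otherwise it stays silent, as a backend
 * sitting in ClientRead would. Returns the result of wait_readable(). */
static int simulate_wait(int peer_sends_reply)
{
	int sv[2];
	int result;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	if (peer_sends_reply && write(sv[1], "Z", 1) != 1)
	{
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	result = wait_readable(sv[0], 100);	/* wait at most 100 ms */
	close(sv[0]);
	close(sv[1]);
	return result;
}
```

Calling `simulate_wait(0)` models the stuck case: the wait times out because the silent peer is itself waiting for input. Whether pgpool-II reaches its read(2) through a comparable path is exactly what this report is asking.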

Code Context:
The issue is triggered when the flow executes the ProcessBackendResponse call in the final else block:

// src/protocol/pool_process_query.c/pool_process_query(POOL_CONNECTION * frontend, POOL_CONNECTION_POOL * backend, int reset_request)
// [Occurs when pool_is_query_in_progress() is FALSE]
// ...
					else
					{
						for (i = 0; i < NUM_BACKENDS; i++)
						{
							// ... checks for pending data ...
							if (pool_ssl_pending(CONNECTION(backend, i)) ||
								!pool_read_buffer_is_empty(CONNECTION(backend, i)))
							{
								if (IS_MAIN_NODE_ID(i))
								{
									status = ProcessBackendResponse(frontend, backend, &state, &num_fields); // <-- Stuck here (read(2))
// ...
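
One thing worth checking, stated here as an assumption rather than a confirmed cause: a non-empty read buffer does not by itself mean that a complete protocol message is buffered. In the PostgreSQL v3 protocol, a backend message is one type byte followed by a 4-byte big-endian length that counts itself but not the type byte. The sketch below uses a hypothetical helper, `has_complete_message`, which is not part of pgpool-II, to show the difference between "buffer is not empty" and "buffer holds a whole message". If only part of a message were buffered, processing it would require a further socket read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a pgpool-II function).
 * Returns 1 if buf[0..len) starts with at least one complete PostgreSQL
 * v3 backend message, 0 otherwise (including malformed lengths). */
static int has_complete_message(const unsigned char *buf, size_t len)
{
	uint32_t msg_len;

	assert(buf != NULL || len == 0);

	/* Need the type byte plus the 4-byte length field. */
	if (len < 5)
		return 0;

	/* Length is big-endian and includes its own 4 bytes. */
	msg_len = ((uint32_t) buf[1] << 24) |
		((uint32_t) buf[2] << 16) |
		((uint32_t) buf[3] << 8) |
		(uint32_t) buf[4];
	if (msg_len < 4)
		return 0;

	/* Bytes available after the type byte must cover the whole message. */
	return (len - 1 >= msg_len) ? 1 : 0;
}
```

For example, a ReadyForQuery message is 6 bytes (`'Z'`, length 5, one status byte). With only the first 5 bytes buffered the buffer is non-empty, yet the helper reports that the message is incomplete.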

My Question:
I don't understand why this happens. Is this issue related to the possible deadlock mentioned in the "XXX fix me" comment in the Parse function?

//src/protocol/pool_proto_modules.c/Parse(POOL_CONNECTION * frontend, POOL_CONNECTION_POOL * backend, int len, char *contents)
//...
	else if (SL_MODE)
	{
		POOL_PENDING_MESSAGE *pmsg;

		/*
		 * XXX fix me:even with streaming replication mode, couldn't we have a
		 * deadlock
		 */
		pool_set_query_in_progress();
//...

Please let me know if you need any more information. Any help would be greatly appreciated.

Thank you.
