Hello Pgpool Team,
I am experiencing an issue with Pgpool-II version 4.4.4.
Environment:
- Pgpool-II Version: 4.4.4
- PostgreSQL Nodes: 3 nodes
- Mode: Streaming Replication
- Configuration: load_balance_mode is on.
- backend_weight is set to 0:2:2 (primary : standby-1 : standby-2).
- Client Driver: PostgreSQL JDBC Driver (Uses Extended Protocol)
Problem Description:
An intermittent session deadlock (stuck session) is observed when using pgpool-II 4.4 STABLE in Streaming Replication Mode (SL_MODE) to proxy connections to PostgreSQL 14. This occurs specifically when clients utilize the Extended Protocol (e.g., Prepared Statements via JDBC).
The root cause appears to be a race condition between the pgpool-II internal state machine (the query_in_progress flag) and the asynchronous arrival of the final backend response message, likely exacerbated by transient network delays.
Observed Symptoms:
- Client Application: Connects via JDBC (Extended Protocol).
- pgpool-II Subprocess:
  - State: idle in transaction (pg_stat_activity on the client side, if monitored).
  - pstack: Stuck in a read(2) system call within the ProcessBackendResponse function.
- PostgreSQL Backend Process:
  - State: active
  - Wait Event: ClientRead (waiting for the client/pgpool-II to send the next command).
  - OS Process State: Observed to be in the BIND phase (e.g., postgres: postgres ... BIND), which confirms the connection was recently processing a Prepared Statement.
This combination is critical: The backend has finished execution (indicated by ClientRead) and is waiting for pgpool-II, while pgpool-II is stuck reading the backend's final response, creating a self-deadlock.
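To make the state above concrete, here is a minimal sketch of the mutual wait in plain C. It is not pgpool-II code and does not reproduce the bug: socketpair(2) stands in for the pgpool-to-backend connection, and the names readable_within and simulate_mutual_wait are hypothetical, chosen only for illustration. It shows that once the last response byte has been consumed, neither end has anything to read, so a blocking read(2) on either side would wait indefinitely.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd becomes readable within timeout_ms,
 * 0 if a read would block for that whole time, -1 on poll() error. */
static int readable_within(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical simulation: sv[0] plays the proxy, sv[1] plays the backend.
 * Returns 1 if both ends end up waiting on each other, 0 otherwise,
 * -1 on setup error. */
int simulate_mutual_wait(void)
{
    int sv[2];
    char buf[1];
    int proxy_can_read, backend_can_read;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The "backend" sends its final response byte and the "proxy"
     * consumes it. After this, nothing is in flight in either direction. */
    if (write(sv[1], "Z", 1) != 1 || read(sv[0], buf, 1) != 1)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The backend now waits for the next command (ClientRead), while the
     * proxy still expects more backend data. Use poll() with a timeout
     * instead of a blocking read() so the sketch itself does not hang. */
    proxy_can_read = readable_within(sv[0], 100);
    backend_can_read = readable_within(sv[1], 100);

    close(sv[0]);
    close(sv[1]);
    return (proxy_can_read == 0 && backend_can_read == 0) ? 1 : 0;
}
```

In this sketch the only way out is for one side to send something or to time out, which matches what I see: the stuck session never recovers on its own.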
Code Context:
The issue is triggered when the flow executes the ProcessBackendResponse call in the final else block:
// src/protocol/pool_process_query.c/pool_process_query(POOL_CONNECTION * frontend, POOL_CONNECTION_POOL * backend, int reset_request)
// [Occurs when pool_is_query_in_progress() is FALSE]
// ...
else
{
    for (i = 0; i < NUM_BACKENDS; i++)
    {
        // ... checks for pending data ...
        if (pool_ssl_pending(CONNECTION(backend, i)) ||
            !pool_read_buffer_is_empty(CONNECTION(backend, i)))
        {
            if (IS_MAIN_NODE_ID(i))
            {
                status = ProcessBackendResponse(frontend, backend, &state, &num_fields); // <-- Stuck here (read(2))
                // ...
My Question:
I don't understand why this happens. Is this issue related to the possible deadlock noted in the "XXX fix me" comment in the Parse function?
// src/protocol/pool_proto_modules.c/Parse(POOL_CONNECTION * frontend, POOL_CONNECTION_POOL * backend, int len, char *contents)
// ...
else if (SL_MODE)
{
    POOL_PENDING_MESSAGE *pmsg;
    /*
     * XXX fix me:even with streaming replication mode, couldn't we have a
     * deadlock
     */
    pool_set_query_in_progress();
    // ...
Please let me know if you need any more information. Any help would be greatly appreciated.
Thank you.