Skip to content

Conversation

@royguo
Copy link

@royguo royguo commented Dec 9, 2025

What problem does this PR solve?

  • Implement Idle Hook: Added TaskGroup::SetWorkerIdleCallback to allow executing custom logic (e.g., IO polling) when a worker thread is idle.
  • Support Timeout Wait: Modified ParkingLot::wait to support an optional timeout, preventing workers from sleeping indefinitely when an idle callback is registered.
  • Enable Thread-per-Core IO: Enabled thread-local IO management (like io_uring ) by invoking the hook within the worker's thread context.
  • Add Unit Test: Added bthread_idle_unittest to verify worker isolation and idle callback execution.

The main reason that we need this is that:

  1. We want to make sure all iouring cqe (or other similar async engine, including some network call results), can be signaled within its original task group (we don't want cross-thread signal, which is very slow under observation)
  2. By using a user-defined callback, we can implement the following strategy:
  • bthread submit iouriing, and tries to reap cqe result
  • if no cqe found, wait() here and next bthread will wake it up.
  • But, if the current bthread is the last one, then we will rely on the Idle Callback in the task group to wake it up.
  1. Then we will make the whole stack thread-per-core and iouring-per-thread, we don't need another polling thread to reap all the CQEs, which will not be easy to avoid cross-thread signaling.

Issue Number: none

Problem Summary:

What is changed and the side effects?

Changed:

  • task_group.h/.cc
    • Added a new function and a few related static member variables to handle idle callbacks
  • parking_lot.h
    • Added a new timeout param to wait() function, with default NULL value, which will not break current implementation.

Side effects:

  • Performance effects:
    No

  • Breaking backward compatibility:
    NO


Check List:

Copilot AI review requested due to automatic review settings December 9, 2025 04:55
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for idle callbacks in TaskGroup to enable flexible thread-per-core I/O management. The feature allows users to register a callback that executes custom logic (e.g., io_uring polling) when worker threads are idle, enabling efficient async I/O without cross-thread signaling.

Key Changes

  • Added SetWorkerIdleCallback API to register user-defined callbacks that run when workers have no tasks
  • Modified ParkingLot::wait to support optional timeout for periodic wake-ups during idle periods
  • Implemented idle callback invocation in TaskGroup::wait_task with configurable timeout-based polling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File Description
test/bthread_idle_unittest.cpp Adds unit test verifying worker thread isolation and idle callback execution with thread-local contexts
src/bthread/task_group.h Declares idle callback API with function pointer typedef, static members for callback state, and timeout parameter
src/bthread/task_group.cpp Implements idle callback registration and invocation logic with timeout-based polling in wait loops
src/bthread/parking_lot.h Extends wait method signature to accept optional timeout parameter for conditional waiting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@royguo royguo force-pushed the task-group-idle-task branch from 52da249 to 767005f Compare December 9, 2025 07:10
// should check the runqueue again immediately.
// |timeout_us|: The timeout for waiting if the callback returns false.
// 0 is not acceptable.
static bool SetWorkerIdleCallback(OnWorkerIdleFn fn, void* user_ctx,
Copy link
Contributor

@wwbmmm wwbmmm Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user interface of bthread should be placed in the src/bthread/bthread.h or src/bthread/unstable.h. And it should be C style, like bthread_set_worker_idle_callback(xxx).
See bthread_set_worker_startfn for reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refine the code by using brpc's design pattern, providing a bthread_set_xxx function for user.

return false;
}

std::call_once(g_worker_idle_once, [fn, user_ctx, timeout_us]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is over-designed. These config only initialize on program startup, you don't need to use std::call_once or atomic variables. See bthread_set_worker_startfn for reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refine the code by using brpc's design pattern, providing a bthread_set_xxx function for user.

@yanglimingcn
Copy link
Contributor

If the request is submitted to the local io_uring function in pthread1, this bthread may be scheduled to pthread2 later. In this case, pthread1 still needs to reap the corresponding CQE and then notify it.

@royguo
Copy link
Author

royguo commented Dec 10, 2025

If the request is submitted to the local io_uring function in pthread1, this bthread may be scheduled to pthread2 later. In this case, pthread1 still needs to reap the corresponding CQE and then notify it.

I am not sure if I understand your comment correctly.

  1. bthread1 (under taskgroup1/pthread1) submitted to iouring1 (which is bound to taskgroup1)
  2. bthread1 butex.wait, for future wake up.
  3. inside task group1's idle function, it reap iouring1 and get cqe, then notify bthread1 by butex.signal

I assume the 3th step will put bthread1 to current taskgroup (taskgroup1)'s locak rq_, and taskgroup1 will pop rq_ inside the main loop instead of using remote_rq_ (which may introduce thread futex).

I am new to brpc, so correct me if i understand incorreclty, thanks.

BTW, the idle function here is only useful when there's only a few requests, to make sure we can reap the last one bthread. If we have heavy concurrency requests, we will use the working bthread reaping previous submitted IOs, insteand of waiting for the idle function. So even if we cannot avoid cross wake up inside the idle function, it will be ok.

- Implement Idle Hook: Added TaskGroup::SetWorkerIdleCallback to allow executing custom logic (e.g., IO polling) when a worker thread is idle.
- Support Timeout Wait: Modified ParkingLot::wait to support an optional timeout, preventing workers from sleeping indefinitely when an idle callback is registered.
- Enable Thread-per-Core IO: Enabled thread-local IO management (like io_uring ) by invoking the hook within the worker's thread context.
- Add Unit Test: Added bthread_idle_unittest to verify worker isolation and idle callback execution.
@royguo royguo force-pushed the task-group-idle-task branch from 767005f to cac054a Compare December 10, 2025 04:54
return true;
}
_pl->wait(st);
// Instead of waiting for signal, we shall wake up if there's a user idle task here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is repeated twice. Maybe wrap it into a helper function.

@yanglimingcn
Copy link
Contributor

yanglimingcn commented Dec 10, 2025

I believe you're trying to create a run-to-complete model. In a bRPC scenario, the easiest ways to implement this model seem to be:

  1. Using RDMA's polling mode
  2. Modifying TCP's epoll_wait to implement polling.
    Other methods, as I understand them, are event-triggered; io_uring can also achieve event triggering.

The network uses an event-triggered approach, while the storage uses a polling approach, which seems to mismatch the models.

@MalikHou
Copy link

I believe you're trying to create a run-to-complete model. In a bRPC scenario, the easiest ways to implement this model seem to be:

  1. Using RDMA's polling mode
  2. Modifying TCP's epoll_wait to implement polling.
    Other methods, as I understand them, are event-triggered; io_uring can also achieve event triggering.

The network uses an event-triggered approach, while the storage uses a polling approach, which seems to mismatch the models.

I disagree with the notion that asynchronous requests in storage during iouring require a one-loop (thread)-per-core concept. Because bthreads lack scheduler pause points, we can only simulate async operations across other threads, leading to greater overhead. essentially, we are trying to make bthread worker CPU resource utilization more efficient under async io

@yanglimingcn
Copy link
Contributor

What I mean is that in the io_uring scenario, if polling mode is not applicable, eventfd can be used to register io_uring events with epoll. I think the efficiency problem of the bthread scheduling model is a common issue unrelated to io_uring.

@royguo
Copy link
Author

royguo commented Dec 11, 2025

What I mean is that in the io_uring scenario, if polling mode is not applicable, eventfd can be used to register io_uring events with epoll. I think the efficiency problem of the bthread scheduling model is a common issue unrelated to io_uring.

The reason I added an idle function here is, we want to have a chance to run some user-defined code during the idle time of the task group's pthread worker. Reaping some iouring cqe is one of its use cases, we can also use this mechanism for other purposes:

  • Reap async tasks calls (e.g., RocksDB operations in another thread pool, offloaded async compute functions)

@yanglimingcn
Copy link
Contributor

Yes, I know you want a mechanism to harvest asynchronous responses, and this PR #2560 is actually designed to support this scenario. It doesn't require modifying the bthread scheduling strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants