Skip to content

Conversation

@Fix-Point
Copy link
Contributor

@Fix-Point Fix-Point commented Dec 25, 2025

Summary

This PR is the part IV of #17556. The main changes in this PR are:

  • Removed the functional incorrect hrtimer implementation.
  • Removed CONFIG_SCHED_TICKLESS_TICK_ARGUMENT to simplify kernel configuration and expiration handling.
  • Introduced a reusable hrtimer_queue component, allowing users to freely compose it with any hardware timer to implement their own hrtimer instance.
  • Introduced OS hrtimer submodule for OS scheduling and timing.

Background

High-resolution Timer (HRTimer) is a timer abstraction capable of achieving nanosecond-level timing precision, primarily used in scenarios requiring high-precision clock events. With the advancement of integrated circuit technology, modern high-precision timer hardware (such as the typical x86 HPET) can already meet sub-nanosecond timing requirements and offer femtosecond-level jitter control.

Although the current hardware timer abstraction (up_alarm/up_timer) in the NuttX kernel already supports nanosecond-level timing, its software timer abstraction, wdog, and the timer expiration interrupt handling process remain at microsecond-level (tick) precision, which falls short of high-precision timing demands. Therefore, it is necessary to implement a new timer abstraction—HRTimer, to address the precision limitations of wdog. HRTimer primarily provides the following functional interfaces:

  • Set a timer in nanoseconds: Configure a software timer to trigger at a specified nanosecond time.
  • Cancel a timer: Cancel the software timer.
  • Handle timer expiration: Execute expiration processing after the timer event is triggered.

Design

The new NuttX HRTimer is designed to address the issues of insufficient precision in the current NuttX wdog. It draws on the strengths of the Linux HRTimer design while improving upon its weaknesses. As Figure 1 shows, the HRTimer design is divided into two parts: the HRTimer Queue and the HRTimer. The HRTimer Queue is a reusable component that allows users to freely customize their own HRTimer interface by pairing it with a private timer driver, without needing to modify the kernel code.

graph TD
    subgraph "Public Headers"
        A[hrtimer_queue_type.h<br/> Public Type Definition]
    end

    subgraph "Internal Implementation Headers"
        B1[hrtimer_type_list.h<br/>List Implementation]
        B2[hrtimer_type_rb.h<br/>RB-Tree Implementation]
        B3[...Others]
        
        C[hrtimer_queue.h<br/>Reusable Component<br/>HRTimer Queue Internal]
    end

    subgraph "Instances"
        D[hrtimer.h<br/>OS HRTimer API ]
        E[hrtimer.c<br/>OS HRTimer Implementation]

        F[myhrtimer.h<br/>Customed HRTimer API ]
        G[myhrtimer.c<br/>Customed HRTimer Implementation]
    end

    %% Dependent
    A --> B1
    A --> B2
    A --> B3
    
    B1 --> C
    B2 --> C
    B3 --> C

    A --> D
    C --> E
    D --> E

    A --> F
    C --> G
    F --> G

    %% Style
    style A fill:#e1f5fe
    style B1 fill:#f3e5f5
    style B2 fill:#f3e5f5
    style B3 fill:#f3e5f5
    style C fill:#fff3e0
    style D fill:#e8f5e8
    style E fill:#e8f5e8
    style F fill:#e8f5e8
    style G fill:#e8f5e8
Loading

Figure 1. The architecture of the HRTimer implementation

API Design

The HRTimer Queue is a zero-performance-overhead, composable, and customizable abstraction that provides only asynchronous-style interfaces:

  • hrtimer_queue_start(queue, timer): Asynchronously sends an HRTimer to HRTimer queue.
  • hrtimer_queue_async_cancel(queue, timer): Asynchronously cancels an HRTimer and returns the current reference count of the timer.
  • hrtimer_queue_wait(queue, timer): Waits for the release of all references to the HRTimer to obtain ownership of the HRTimer data structure.

All other user interfaces can be composed based on these three interfaces.

On top of the HRTimer Queue, users only need to implement the following interfaces to customize their own HRTimer implementation:

  • hrtimer_expiry(current): Handles timer expiration, typically called within the execution path of the corresponding timer hardware interrupt handler.
  • hrtimer_reprogram(queue, next_expired): Sets the next timer event.
  • hrtimer_current(): Gets the current time to set relative timers.

After implementing the above three interfaces, users can include one of the hrtimer_type_xxx.h implementation to compose their own hrtimer implementation, which mainly includes the following interfaces:

  • hrtimer_restart(timer, func, arg, time, mode): Restarts a timer that has been asynchronously canceled (its callback function might still be executing). This interface is designed to explicitly remind users to be aware of concurrency issues, as concurrency problems are prone to occur in actual programming and are very difficult to locate. Providing such an interface facilitates quick identification of concurrency issues.
  • hrtimer_start(timer, func, arg, time, mode): Starts a stopped timer. The mode parameter indicates whether it is a relative or absolute timer.
  • hrtimer_async_cancel(timer): Asynchronously cancels a timer. Note that the semantics of this interface are completely different from Linux's try_to_cancel. It ensures that the timer can definitely be canceled successfully, but may need to wait for its callback function to finish execution.
  • hrtimer_cancel(timer): Synchronously cancels a timer. If the timer's callback function is still executing, this function will spin-wait until the callback completes. It ensures that the user can always obtain ownership of the timer.

The design characteristics of HRTimer are as follows:

  • Strict and Simplified HRTimer State Machine: In the old wdog design, wdog could be reset in any state, which introduced unnecessary complexity to certain function implementations. For example, wd_start had to account for the possibility of restarting. In the new HRTimer design, an HRTimer that has already been started and not canceled cannot be started again.

  • Abstracted Sorting Queue: Since no single design can be optimal for all application scenarios, HRTimer abstracts interfaces for inserting and deleting nodes in the sorting queue. This allows for different data structure implementations to be configured for different application scenarios, as shown in Table 1.

    Table 1: Comparison of Several Sorting Queue Implementations

    Sorting Queue Implementation Insert Delete Delete Head Determinism Suitable Scenarios
    Doubly Linked List O(n) O(1) O(1) Moderate Embedded / Soft Real-time Systems
    Red-Black Tree O(logn) O(logn) O(logn) Slightly Poor General Purpose
  • Callback Execution Without Lock Held: HRTimer implements callback execution without lock held, ensuring that the system's blocking time is not limited by the user's callback function. However, this introduces additional states and waits, where waiting for reference release is primarily implemented using hazard pointers. This will be explained in detail in the subsequent state transition diagram.

  • Clear HRTimer Object Ownership Transfer Path: In the wdog implementation, the wdog callback function could restart the current timer directly without regard to ownership, potentially causing concurrency issues. In the new implementation, the HRTimer callback function cannot restart itself. Instead, inspired by Linux's design, the callback function returns whether a restart is needed. If a restart is required, the thread executing the callback function re-enqueues it; otherwise, the thread releases ownership. This change ensures a clear ownership transfer path for the HRTimer object.

  • Non-blocking Timer Restart: To address the issue in Linux where restarting a timer must wait for an already-started callback function to finish, which reduces the real-time performance, the new HRTimer implements a non-blocking timer restart mechanism. This mechanism reuses the last bit of the hazard pointer to mark whether the thread executing the callback has lost write ownership of the HRTimer object. After hrtimer_async_cancel is called, other threads executing callbacks will lose write ownership of the HRTimer (though their callback functions may still be executing). This means the HRTimer can be restarted and repurposed for other callbacks without waiting for the callback function to complete. However, note that the callback function might still be executing, requiring users to consider this concurrency and implement proper synchronization mechanisms within their callback functions. To explicitly remind users of this concurrency, an HRTimer whose callback function has not yet completed execution must be restarted using hrtimer_restart. This function relaxes the state checks on the HRTimer, allowing a timer with the callback running to be started.

  • Deterministic Timer Cancellation: To address the starvation issue present in Linux's timer cancellation, the new HRTimer implementation sets a cancellation state via hrtimer_async_cancel. This cancellation state has a unique and deterministic state transition, eliminating starvation. Memory reclamation is performed through hazard pointer checking loops. Hazard pointer checking ensures that all threads finish executing the callback function and release read ownership (reference release) of the specified HRTimer object.

The valid state transitions of an HRTimer object are shown in Figure 2. States are represented using a simplified notation of State|Ownership, such as HRTIMER_PENDING|shared. The meanings of the simplified ownership markers are as follows:

Ownership Markers

  • |private indicates that the resource is exclusively owned by a specific thread t. Only the owning thread t can read from or write to this resource.
  • |shared indicates that the resource is globally shared and can be read by any thread. However, only the thread t that holds the global lock l (t = Owned(l)) can obtain write ownership of this resource.
  • |half_shared indicates that the resource may be accessed by multiple threads, but only the thread that called async_cancel holds write ownership of this resource. Modifications to it by threads executing callback functions are prevented.

The resource ownership here uses a simplified notation. In actual static analysis or formal verification processes, more complex abstractions such as resource algebra might be employed.

All state transitions not described in the diagram must return failure. For example, a timer in the HRTIMER_PENDING state cannot be started (start) again. Note that there is one exception: a thread that is already in the HRTIMER_CANCELED state can legally call hrtimer_async_cancel again, and the state remains unchanged.

To avoid the overhead caused by threads waiting for callback functions to finish executing, HRTimer adds a restart interface. Under normal circumstances, the start interface cannot start a timer that is already in the canceled state. Only when the user uses this restart interface can a timer whose callback function has not yet completed be started normally. Using this interface serves to explicitly remind users to pay attention to concurrency within their callback functions. Furthermore, when concurrency issues arise with HRTimer, it helps in pinpointing the source of the problem—issues can only originate from callback functions where restart was used to restart the timer.

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#FFFFFF',
      'primaryTextColor' : '#000000',
      'mermiad-container': "#FFFFFF",
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#FFFFFF',
      'tertiaryColor': '#000000'
    },
    'sequence': { 'mirrorActors': false }
  }
}%%

stateDiagram-v2
    HRTIMER_COMPLETED|private --> HRTIMER_PENDING|shared : hrtimer_start
    HRTIMER_PENDING|shared --> HRTIMER_COMPLETED|private : hrtimer callback return 0 in hrtimer_expiry
    HRTIMER_PENDING|shared --> HRTIMER_PENDING|shared : hrtimer callback return non-zero in hrtimer_expiry
    HRTIMER_PENDING|shared --> HRTIMER_CANCELED|half_shared : hrtimer_async_cancel
    HRTIMER_CANCELED|half_shared --> HRTIMER_CANCELED|private : hrtimer_cancel wait all cores release the references to the timer.
    HRTIMER_CANCELED|half_shared --> HRTIMER_PENDING|shared : hrtimer_restart
    HRTIMER_CANCELED|private --> HRTIMER_COMPLETED|private : Complete the cancel
Loading

Figure 2. HRTimer State Transition Diagram

Performance Evaluation

We conducted 1 million interface calls on the intel64:nsh (Intel Core i7 12700) platform and measured their average execution CPU cycles, as shown in the Figure 3 below. It can be observed that the overhead for starting and asynchronously canceling timers is significantly reduced compared to wdog.

  • The speedup of hrtimer_start compared to wd_start is 2.10x.
  • The speedup of hrtimer_start & cancel compared to wd_start & cancel is 1.57x.
    Additionally, after enabling hrtimer, wdog processing is treated as an hrtimer timer, which lowers the overhead of the wdog interface.
  • wd_start achieved a speedup of 1.73x when hrtimer is enabled.
hrtimer tsv

Figure 3. HRtimer API Latency Test

Plan

The merge plan for this PR is as follows:

  1. Introduce seqcount [Done]
  2. Simplify the timer expiration processing flow in preparation for introducing HRtimer. [Done]
  3. Modify the semantics of scheduling functions to allow immediate timer event triggering. [Done]
  4. Fix buggy wdog module. [Done]
  5. Introduce hrtimer_queue.
  6. Introduce hrtimer.
  7. [WIP] Introdude hrtimer_test.
  8. [WIP] Add HRTimer documents.
  9. [WIP] Enhance the oneshot arch_alarm to support non-tick arguments mode.

Impact

HRTimer currently is disabled by default, so it has no effect on system.

Testing

Tested on intel64:nsh, rv-virt:smp, qemu-armv8a:smp, ostest passed. The hrtimer SMP stress test ran for over 72 hours without errors. The parallel stress test cases is showed in Appendix.

Explanation

Here we need to provide some explanations to avoid misunderstandings with others' work:

Is this hrtimer an improvement based on @wangchdo work #17517?

Why are we removing the existing hrtimer implementation and replacing it with this one? Is this disrespectful to @wangchdo 's work?

Simple modifications cannot fix @wangchdo implementation. Therefore, I believe the most effective approach is to remove the existing hrtimer implementation and replace it with this one.

For example, two key implementations severely violate the ownership invariant:

  1. hrtimer expiration callback API design: The hrtimer pointer is explicitly exposed to users by being passed to the callback function. If the hrtimer is restarted while the callback is executing, the hrtimer has already been modified, posing risks and potential errors for the user-obtained parameters. For instance, in the following example, if the second test_callback is triggered before the first test_callback finishes executing, it may cause B to be updated twice.
uint32_t tmp;
static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
{
  .... // some heavy jobs
  uint32_t *count = hrtimer->arg; // Incorrect! Already updated by restarting the hrtimer.
  *count++;
  ....
  
}
...
static uint32_t A = 0;
static uint32_t B = 0;

timer.arg = &A;
hrtimer_start(&timer, 1000, HRTIMER_MODE_REL); // delay 1000ns to update A
...
hrtimer_cancel(&timer);
timer.arg = &B; 
hrtimer_start(&timer, 0, HRTIMER_MODE_REL); // delay 0ns to update B;

ASSERT(A <= 1 && B <= 1) // ASSERT!!! B can be updated twice
  1. hrtimer expiration handling process also violates the invariant, where multiple owners may operate an expired hrtimer. Even after I pointed out the issue, @wangchdo added a reference count to address memory reclamation during cancellation. However, it still suffers from concurrency serialization issues, which can cause a newly started periodic timer to be overwritten by an old periodic timer, as shown in the following thread interleaving:
Timeline:
CPU1 (Timer Callback)                     CPU2 (Set New Timer)
------|--------------------------------------|-------------------------
      |                                      |
      t1: timer triggers at t1               |
      |--- Callback starts                   |
      |   hrtimer->state <- running          |
      |                                      |
      | [Unlock]                             |--- hrtimer_cancel(hrtimer)
      |                                      |--- hrtimer_start(hrtimer, t2)
      |                                      |    hrtimer->state <- armed
      |   ...                                | [Unlock]
      |                                      | ...
      |   Callback executes...               | [Lock]
      |                                      |--- New timer triggered
      |                                      |    hrtimer->state <- running
      |                                      | [Unlock]
      |                                      |    Calllback executes....
      |                                      |
      |   Returns next = UINT32_MAX          |
      | [Lock]                               |
      | if (hrtimer->state == running)       |
      |   hrtimer->expired                   | 
      |    <- t2 + UINT32_MAX (Incorrect! expected t1 + UINT32_MAX)
      |   hrtimer->state <- armed            | 
      | [Unlock]                             |
      |                                      |  Returns next = 1
      |                                      |  [Lock]
      |                                      |--- hrtimer->state != running
      |                                      |    failed to set next (Incorrect!)
      |                                      |    The previous cancelled timer
      |                                      |    callback restarted the new timer.
------|--------------------------------------|-------------------------

More similar concurrency issues could be cited here. As I have emphasized again and again, the fundamental problem is the violation of the ownership invariant of hrtimer: only one owner can modify the hrtimer object at a time.

Designing functionally correct concurrent algorithms is not easy at all. Relying solely on engineering experience is insufficient; theoretical methods are necessary to avoid errors, such as adapting resource invariants and using structured diagrams to clarify every possible concurrent state transition. @wangchdo's design failed to consider how to handle concurrency correctly, making it nearly impossible to improve upon his code base.

From an efficiency perspective, replacing @wangchdo 's implementation with this PR's implementation—which is functionally correct, offers better code reusability, and is more user-friendly—can save both of us time and allow us to focus on more meaningful optimizations.

Appendix

The hrtimer parallel stress test cases is showed as following, and they will be pushed to nuttx-apps after this PR is merged:

/****************************************************************************
 * apps/testing/ostest/hrtimer.c
 *
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.  The
 * ASF licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the
 * License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 ****************************************************************************/

/****************************************************************************
 * Included Files
 ****************************************************************************/

#include <nuttx/config.h>
#include <nuttx/arch.h>
#include <nuttx/sched.h>
#include <nuttx/spinlock.h>

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <syslog.h>
#include <unistd.h>

#include <nuttx/hrtimer.h>

/****************************************************************************
 * Pre-processor Definitions
 ****************************************************************************/

#define HRTIMER_TEST_RAND_ITER     (1024 * 2)
#define HRTIMER_TEST_THREAD_NR     (CONFIG_SMP_NCPUS * 8)
#define HRTIMER_TEST_TOLERENT_NS   (10 * NSEC_PER_TICK)
#define HRTIMER_TEST_CRITICAL_SECTION  1024

#define hrtest_printf(s, ...)      printf("[%d] " s, this_cpu(), __VA_ARGS__)

#define hrtest_delay(delay_ns)     usleep(delay_ns / 1000 + 1)

#define hrtimer_start(timer, cb, arg, delay_ns) hrtimer_start(timer, cb, arg, delay_ns, HRTIMER_MODE_REL)
#define hrtimer_restart(timer, cb, arg, delay_ns) hrtimer_restart(timer, cb, arg, delay_ns, HRTIMER_MODE_REL)

/****************************************************************************
 * Private Type
 ****************************************************************************/

typedef struct hrtimer_tparam_s
{
  FAR hrtimer_t    *timer;
  FAR spinlock_t   *lock;
  uint64_t          interval;
  volatile uint64_t callback_cnt;
  volatile uint64_t triggered_ns;
  volatile uint8_t  current_cpu;
  volatile uint8_t  state;
} hrtimer_tparam_t;

/****************************************************************************
 * Private Functions
 ****************************************************************************/

static uint64_t hrtimer_test_callback(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *hrtimer_tparam = (FAR hrtimer_tparam_t *)arg;

  /* Record the system tick at which the callback was triggered */

  // clock_systime_nsec(&hrtimer_tparam->triggered_ns);
  hrtimer_tparam->triggered_ns = clock_systime_nsec();

  /* Increment the callback count */

  hrtimer_tparam->callback_cnt   += 1;

  return 0;
}

static void hrtimer_test_checkdelay(int64_t diff, uint64_t delay_ns)
{
  /* Ensure the watchdog trigger time is not earlier than expected. */

  ASSERT(diff - delay_ns >= 0);

  /* If the timer latency exceeds the tolerance, print a warning. */

  if (diff - delay_ns > HRTIMER_TEST_TOLERENT_NS)
    {
      hrtest_printf("WARNING: hrtimer latency ns %" PRId64
                    "(> %u may indicate timing error)\n",
                    diff - delay_ns,
                    (unsigned)HRTIMER_TEST_TOLERENT_NS);
    }
}

static void hrtimer_test_once(FAR hrtimer_t *timer,
                              FAR hrtimer_tparam_t *param,
                              uint64_t delay_ns)
{
  uint64_t   cnt;
  int64_t    diff;
  uint64_t   timerset_ns;
  irqstate_t flags;

  hrtest_printf("hrtimer_test_once %" PRIu64 " ns\n", delay_ns);

  /* Save the current callback count. */

  cnt = param->callback_cnt;

  /* Enter a critical section to prevent interruptions. */

  flags = up_irq_save();
  sched_lock();

  /* Record the current system tick before setting the watchdog. */

  // clock_systime_nsec(&timerset_ns);
  timerset_ns = clock_systime_nsec();

  ASSERT(hrtimer_start(timer, hrtimer_test_callback,
                       param, delay_ns) == OK);

  up_irq_restore(flags);
  sched_unlock();

  /* Wait until the callback is triggered exactly once. */

  while (cnt + 1 != param->callback_cnt)
    {
      hrtest_delay(delay_ns);
    }

  /* Check if the delay is within the acceptable tolerance. */

  diff = param->triggered_ns - timerset_ns;

  hrtimer_test_checkdelay(diff, delay_ns);

  hrtimer_cancel(timer);
}

static void hrtimer_test_rand(FAR hrtimer_t *timer,
                              FAR hrtimer_tparam_t *param, uint64_t rand_ns)
{
  uint64_t   cnt;
  unsigned   idx;
  uint64_t   delay_ns;
  uint64_t   timer_setns;
  int64_t    diff;
  irqstate_t flags = 0;

  hrtest_printf("hrtimer_test_rand %" PRIu64 " ns\n", rand_ns);

  /* Perform multiple iterations with random delays. */

  for (idx = 0; idx < HRTIMER_TEST_RAND_ITER; idx++)
    {
      /* Generate a random delay within the specified range. */

      delay_ns = rand() % rand_ns;

      DEBUGASSERT(timer->func == NULL);

      /* Enter critical section if the callback count is odd. */

      cnt = param->callback_cnt;

      if (cnt % 2u)
        {
          flags = up_irq_save();
        }

      timer_setns = clock_systime_nsec();
      ASSERT(hrtimer_start(timer, hrtimer_test_callback, param,
                           delay_ns) == 0);
      if (cnt % 2u)
        {
          up_irq_restore(flags);
        }

      /* Decide to wait for the callback or cancel the watchdog. */

      if (delay_ns % 2u)
        {
          /* Wait for the callback. */

          while (cnt + 1u != param->callback_cnt)
            {
              hrtest_delay(delay_ns);
            }

          /* Check the delay if the callback count is odd. */

          if (cnt % 2u)
            {
              diff = (sclock_t)(param->triggered_ns - timer_setns);
              hrtimer_test_checkdelay(diff, delay_ns);
            }
        }

      hrtimer_cancel(timer);
      DEBUGASSERT(timer->func == NULL);
    }

  hrtimer_cancel(timer);
}

static uint64_t hrtimer_test_rand_cancel_callback(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *tparam = (FAR hrtimer_tparam_t *)arg;
  FAR spinlock_t *lock = tparam->lock;
  uint64_t   delay_ns = 0;
  irqstate_t flags = spin_lock_irqsave(lock);

  /* Random sleep */

  delay_ns = tparam->triggered_ns % tparam->interval;

  /* Check if the version is same. */

  if (expired_ns == tparam->timer->expired)
    {
      tparam->triggered_ns = clock_systime_nsec();

      /* Increment the callback count */

      tparam->callback_cnt++;
    }

  spin_unlock_irqrestore(lock, flags);

  up_ndelay(delay_ns);

  return 0;
}

static void hrtimer_test_rand_cancel(FAR hrtimer_t *timer,
                                     FAR hrtimer_tparam_t *param,
                                     uint64_t rand_ns)
{
  uint64_t   cnt;
  unsigned   idx;
  uint64_t   delay_ns;
  irqstate_t flags;
  spinlock_t rand_cancel_lock = SP_UNLOCKED;

  hrtest_printf("hrtimer_test_rand cancel %" PRIu64 " ns\n", rand_ns);

  param->timer    = timer;
  param->interval = rand_ns;
  param->lock     = &rand_cancel_lock;

  /* Perform multiple iterations with random delays. */

  for (idx = 0; idx < HRTIMER_TEST_RAND_ITER; idx++)
    {
      /* Generate a random delay within the specified range. */

      delay_ns = rand() % rand_ns;

      flags = spin_lock_irqsave(&rand_cancel_lock);

      cnt = param->callback_cnt;
      ASSERT(hrtimer_restart(timer, hrtimer_test_rand_cancel_callback, param,
                             delay_ns) == 0);

      spin_unlock_irqrestore(&rand_cancel_lock, flags);

      /* Decide to wait for the callback or cancel the watchdog. */

      if (delay_ns % 2u)
        {
          /* Wait for the callback finished. */

          while (param->callback_cnt != cnt + 1u)
            {
              hrtest_delay(delay_ns);
            }

          while (HRTIMER_ISPENDING(timer))
            {
              hrtest_delay(0);
            }
        }
      else
        {
          hrtimer_async_cancel(timer);
        }
    }

  hrtimer_cancel(timer);
}

static uint64_t hrtimer_test_callback_period(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *tparam   = (FAR hrtimer_tparam_t *)arg;
  sclock_t              interval = tparam->interval;

  tparam->callback_cnt++;
  tparam->triggered_ns = clock_systime_nsec();

  return interval;
}

static void hrtimer_test_period(FAR hrtimer_t *timer,
                                FAR hrtimer_tparam_t *param,
                                uint64_t delay_ns,
                                unsigned int times)
{
  uint64_t cnt;
  clock_t  timer_setns;
  hrtest_printf("hrtimer_test_period %" PRIu64 " ns\n", delay_ns);

  cnt = param->callback_cnt;

  param->interval = delay_ns;

  ASSERT(param->interval > 0);

  // clock_systime_nsec(&timer_setns);
  timer_setns = clock_systime_nsec();

  ASSERT(hrtimer_start(timer, hrtimer_test_callback_period,
                       param, param->interval) == OK);

  hrtest_delay(times * delay_ns);

  hrtimer_cancel(timer);

  DEBUGASSERT(timer->func == NULL);

  hrtest_printf("periodical hrtimer triggered %" PRIu64 " times, "
                 "elapsed nsec %" PRIu64 "\n", param->callback_cnt - cnt,
                 param->triggered_ns - timer_setns);

  if (param->callback_cnt - cnt < times)
    {
      hrtest_printf("WARNING: periodical hrtimer"
                    "triggered times < %u\n", times);
    }
}

#ifdef CONFIG_SMP
static uint64_t hrtimer_test_callback_crita(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *hrtimer_tparam = (FAR hrtimer_tparam_t *)arg;

  /* change status */

  if (hrtimer_tparam->current_cpu != this_cpu())
    {
      hrtimer_tparam->current_cpu = this_cpu();
      hrtimer_tparam->callback_cnt++;
    }

  /* check whether parameter be changed by another critical section */

  ASSERT(hrtimer_tparam->state == 0);
  hrtimer_tparam->state = !hrtimer_tparam->state;

  return 0;
}

static uint64_t hrtimer_test_callback_critb(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *hrtimer_tparam = (FAR hrtimer_tparam_t *)arg;

  /* change status */

  if (hrtimer_tparam->current_cpu != this_cpu())
    {
      hrtimer_tparam->current_cpu = this_cpu();
      hrtimer_tparam->callback_cnt++;
    }

  /* check whether parameter be changed by another critical section */

  ASSERT(hrtimer_tparam->state == 1);
  hrtimer_tparam->state = !hrtimer_tparam->state;

  return 0;
}

static uint64_t hrtimer_test_callback_critdelay(void *arg, uint64_t expired_ns)
{
  FAR hrtimer_tparam_t *hrtimer_tparam = (FAR hrtimer_tparam_t *)arg;
  spinlock_t *lock = hrtimer_tparam->lock;
  irqstate_t flags;

  flags = spin_lock_irqsave(lock);
  hrtimer_tparam->callback_cnt++;
  spin_unlock_irqrestore(lock, flags);

  up_ndelay(100 * NSEC_PER_USEC);

  return 0;
}

static void hrtimer_test_critical_section(FAR hrtimer_t *timer,
                                          FAR hrtimer_tparam_t *param)
{
  int      cnt = 0;
  uint64_t callback_cnt;
  spinlock_t lock  = SP_UNLOCKED;

  param->lock = &lock;

  DEBUGASSERT(!HRTIMER_ISPENDING(timer));

  while (cnt < HRTIMER_TEST_CRITICAL_SECTION)
    {
      /* set param statue and start wdog */

      param->state = 0;
      param->current_cpu = this_cpu();
      hrtimer_start(timer, hrtimer_test_callback_crita, param, 0);

      /* set param statue and start wdog */

      hrtimer_cancel(timer);
      param->state = 1;
      param->current_cpu = this_cpu();
      hrtimer_start(timer, hrtimer_test_callback_critb, param, 0);

      if (++cnt % 256 == 0)
        {
          printf("hrtimer critical section test1 %d times.\n", cnt);
        }

      hrtimer_cancel(timer);
    }

  cnt = 0;

  param->callback_cnt = 0;

  while (cnt < HRTIMER_TEST_CRITICAL_SECTION)
    {
      /* set param statue and start wdog */

      irqstate_t flags = spin_lock_irqsave(&lock);

      hrtimer_start(timer, hrtimer_test_callback_critdelay, param, 0);

      spin_unlock_irqrestore(&lock, flags);

      up_ndelay(10000);

      flags = spin_lock_irqsave(&lock);

      hrtimer_async_cancel(timer);
      hrtimer_start(timer, hrtimer_test_callback_critdelay, param, 0);

      spin_unlock_irqrestore(&lock, flags);

      up_ndelay(10000);

      hrtimer_cancel(timer);
      callback_cnt = param->callback_cnt;

      hrtest_delay(10000);

      ASSERT(callback_cnt == param->callback_cnt);

      if (++cnt % 256 == 0)
        {
          printf("hrtimer critical section test2 %d times. count %llu\n", cnt,
                 (unsigned long long)param->callback_cnt);
        }
    }

  hrtimer_cancel(timer);

  param->lock = NULL;
}
#endif

static void hrtimer_test_run(FAR hrtimer_tparam_t *param)
{
  uint64_t             cnt;
  uint64_t             rest;
  hrtimer_t     test_hrtimer =
    {
      0
    };

  param->timer = &test_hrtimer;

  /* Wrong arguments of the hrtimer_start */

  ASSERT(hrtimer_start(&test_hrtimer, NULL, NULL, 0) != OK);
  ASSERT(hrtimer_start(&test_hrtimer, NULL, NULL, -1) != OK);

  /* Delay = 0 */

  hrtimer_test_once(&test_hrtimer, param, 0);

  /* Delay > 0, small */

  hrtimer_test_once(&test_hrtimer, param, 1);
  hrtimer_test_once(&test_hrtimer, param, 10);
  hrtimer_test_once(&test_hrtimer, param, 100);
  hrtimer_test_once(&test_hrtimer, param, 1000);
  hrtimer_test_once(&test_hrtimer, param, 10000);

  /* Delay > 0, middle 100us */

  hrtimer_test_once(&test_hrtimer, param, 100000);
  hrtimer_test_once(&test_hrtimer, param, 1000000);
  hrtimer_test_once(&test_hrtimer, param, 10000000);

#ifdef CONFIG_SMP

  /* Test wdog critical section */

  hrtimer_test_critical_section(&test_hrtimer, param);

#endif

  /* Delay > 0, maximum */

  cnt = param->callback_cnt;

  /* Maximum */

  ASSERT(hrtimer_start(&test_hrtimer, hrtimer_test_callback,
                       param, UINT64_MAX) == OK);

  /* Sleep for 1s */

  hrtest_delay(USEC_PER_SEC / 100);

  /* Testing hrtimer_cancel */

  // ASSERT(hrtimer_cancel(NULL) != 0);

  /* Ensure watchdog not alarmed */

  ASSERT(cnt == param->callback_cnt);

  rest = hrtimer_gettime(&test_hrtimer);

  ASSERT(rest < UINT64_MAX);

  ASSERT(hrtimer_cancel(&test_hrtimer) == OK);

  hrtest_printf("hrtimer_start with maximum delay, cancel OK, rest %" PRIu64 "\n",
                rest);

  /* period wdog delay from 1000us to 10000us */

  hrtimer_test_period(&test_hrtimer, param, 1000000, 128);

  /* Random delay ~12us */

  hrtimer_test_rand(&test_hrtimer, param, 12345);

  hrtimer_test_rand_cancel(&test_hrtimer, param, 67890);
}

/* Multi threaded */

static FAR void *hrtimer_test_thread(FAR void *param)
{
  hrtimer_test_run(param);
  return NULL;
}

/****************************************************************************
 * Public Functions
 ****************************************************************************/

void hrtimer_test(void)
{
  unsigned int   thread_id;
  pthread_attr_t attr;
  pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];
  hrtimer_tparam_t params[HRTIMER_TEST_THREAD_NR] =
    {
      0
    };

  printf("hrtimer_test start...\n");

  ASSERT(pthread_attr_init(&attr) == 0);

  /* Create wdog test thread */

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      ASSERT(pthread_create(&pthreads[thread_id], &attr,
                            hrtimer_test_thread, &params[thread_id]) == 0);
    }

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      pthread_join(pthreads[thread_id], NULL);
    }

  ASSERT(pthread_attr_destroy(&attr) == 0);

  printf("hrtimer_test end...\n");
}

@github-actions github-actions bot added Area: Documentation Improvements or additions to documentation Area: Drivers Drivers issues Area: OS Components OS Components issues Size: XL The size of the change in this PR is very large. Consider breaking down the PR into smaller pieces. labels Dec 25, 2025
* Private Functions
****************************************************************************/

static inline_function clock_t adjust_next_interval(clock_t interval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to separated pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #17680.

****************************************************************************/

#ifdef CONFIG_SCHED_TICKLESS
int weak_function up_alarm_cancel(FAR struct timespec *ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the separated pr too

up_alarm_tick_start(div_const_roundup(next_expired, NSEC_PER_TICK));
# else
struct timespec ts;
# ifdef CONFIG_SCHED_TICKLESS_ALARM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge to e72a975

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged.

@wangchdo
Copy link
Contributor

wangchdo commented Dec 25, 2025

@Fix-Point @xiaoxiang781216 @GUIDINGLI

I believe this hritmer feature is genuinely useful for Apache NuttX, which is why I decided to introduce it about three months ago. Since then, I have continued to refine and improve it whenever shortcomings were identified, with the goal of making it more robust and mature.

I really don’t want to get into a dispute again. If you want to merge this refactoring, I am not able to stop you

However, I would appreciate it if my implementation were not described as incorrect or even completely unusable, or described as having functional issues or being inferior in terms of performance, readability, or effectiveness.

I acknowledge that the first version did not fully account for SMP, but I have submitted a fix (#17642
) that addresses all SMP cases. This fix is straightforward, as it focuses solely on improving the Hrtimer state-machine.

Finally, before merging this, I would suggest performing a performance comparison with my PR (#17573
) to ensure the best outcome for NuttX.

Thank you!

Best Regards
Wang Chengdong

Copy link
Contributor

@anchao anchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting aside the implementation details, there are far too many critical section issues that I haven’t pointed out them exhaustively. You are advised to fully resolve them before submitting the code.
Also, please watch your wording. What is "incorrect"? I’m genuinely curious about your professionalism and respectfulness — you are essentially belittling the work of individual developers.

}

static inline_function
int hrtimer_start(FAR hrtimer_t *timer, hrtimer_callback_t func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was the parameter prototype changed? If func/arg are determined in hrtimer_init or hrtimer_setup, it will yield better performance for the start process.

Copy link
Contributor Author

@Fix-Point Fix-Point Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to previous implementation,

  • when restarting the timer, writing the three fields—func, arg, and expired—does indeed involve two additional writes compared to writing only expired.
  • However, within the critical section, this implementation does not require modifying the func of the hrtimer, which saves one write operation compared to the need to write the state field, resulting in an overall increase of one write outside the critical section but one write reduced inside the critical section.

The advantage of this interface design is that it reuses func to encode the state of the hrtimer, saving 4 bytes of space per hrtimer. Furthermore, it maintains consistency with other OS APIs (such as workqueue and wdog), improving user-friendliness.

I believe the slight overhead outside the critical section will be hidden by the pipeline and should not have a significant impact on overall performance. I will attempt to write test cases to compare the performance of the two implementations.

static inline_function
void hrtimer_queue_init(FAR hrtimer_queue_internal_t *queue)
{
FAR hrtimer_internal_t *guard_timer = &queue->guard_timer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add null pointer check or remove debug assert on line 102

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, null pointer check is added and debug assertion is removed.

hrtimer_callback_t func;
uint64_t next_expired;
uint64_t expired;
int cpu = this_cpu();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu may changed without lock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ensure the invariant that the core id will not change in the interrupt handling.

  • NuttX currently does not support kernel preemption, thus interrupts cannot be nested within interrupt context. Even if kernel preemption and interrupt nesting are supported in the future, we can avoid nested calls to hrtimer_expiry by using an interrupt nesting counter.
  • During interrupt handling, we use per-core ARCH_INTERRUPTSTACK, ensuring that the core ID remains unchanged while in interrupt context.

@Fix-Point Fix-Point changed the title sched: Refactor the incorrect HRTimer implementation. sched: Refactor the functional incorrect HRTimer implementation. Dec 25, 2025
@Fix-Point
Copy link
Contributor Author

Fix-Point commented Dec 25, 2025

Setting aside the implementation details, there are far too many critical section issues that I haven’t pointed out them exhaustively. You are advised to fully resolve them before submitting the code. Also, please watch your wording. What is "incorrect"? I’m genuinely curious about your professionalism and respectfulness — you are essentially belittling the work of individual developers.

I am sorry for my mistake. The incorrect should be functional incorrect, which means it is inconsistent with the design specifications. (E.g, a cancelled and restarted periodic timer will not never be overwritten by an old periodic timer callback that has not yet completed in any cases.)

@wangchdo
Copy link
Contributor

wangchdo commented Dec 25, 2025

Setting aside the implementation details, there are far too many critical section issues that I haven’t pointed out them exhaustively. You are advised to fully resolve them before submitting the code. Also, please watch your wording. What is "incorrect"? I’m genuinely curious about your professionalism and respectfulness — you are essentially belittling the work of individual developers.

I am sorry for my mistake. The incorrect should be functional incorrect, which means it is inconsistent with the design specifications. (E.g, a cancelled and restarted periodic timer will not never be overwritten by an old periodic timer callback that has not yet completed in any cases.)

A minor fix can resolve the issue you mentioned, check #17642, you don't need to do such a big refactoring

This commit removed functional incorrect hrtimer implementation. This
implementation can not work correctly for SMP systems.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
@github-actions github-actions bot removed the Area: Drivers Drivers issues label Dec 26, 2025
which delivers true nanosecond-level precision. Unlike wdog’s list-based timers,
hrtimer uses a red-black tree for efficient management of large numbers of timers,
an important advantage in hard real-time systems like vehicle control.
High-resolution Timer (HRTimer) is a timer abstraction capable of achieving
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also add hrtimer state-machine diagram in the documentation

is triggered.
The new NuttX HRTimer is designed to address the issues of insufficient
precision in the current NuttX wdog. It draws on the strengths of the Linux
Copy link
Contributor

@wangchdo wangchdo Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HRTimer is not primarily about precision, but about resolution.

In my view, your PR related to one-shot counting focuses on precision—that is, improving tick accuracy. In contrast, the hrtimer module should be understood as a mechanism for improving timer resolution, providing finer-grained time representation capability.

@@ -0,0 +1,555 @@
/****************************************************************************
* include/nuttx/hrtimer_queue.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest placing the hrtimer queue definitions in an internal header file.

Having a single public header, hrtimer.h, should be sufficient. Users do not need to be aware of the internal queue structure of hrtimers; they only need to interact with the exposed APIs.

@@ -0,0 +1,158 @@
/****************************************************************************
* include/nuttx/hrtimer_queue_type.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this in internal

@@ -0,0 +1,204 @@
/****************************************************************************
* include/nuttx/hrtimer_type_list.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this in internal

@@ -0,0 +1,216 @@
/****************************************************************************
* include/nuttx/hrtimer_type_rb.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this in internal

int hrtimer_compare(FAR const hrtimer_rb_t *a, FAR const hrtimer_rb_t *b)
{
/* This branchless compare is equivalent to:
* (int64_t)(a->expired - b->expired) > 0 ? 1 : -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be equivalent to (int64_t)(a->expired - b->expired) >= 0 ? 1 : -1;

This ensures that when a new timer has the same expiration time as an existing one, it will be placed after the older timer.

*/

static inline_function
void hrtimer_reprogram(FAR hrtimer_queue_internal_t *queue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is an external API?

****************************************************************************/

static inline_function
int hrtimer_queue_init(FAR hrtimer_queue_internal_t *queue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal API should not be exposed to users

}

static inline_function
int hrtimer_start(FAR hrtimer_t *timer, hrtimer_callback_t func,
Copy link
Contributor

@wangchdo wangchdo Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API requires users to provide too many parameters, which I don’t think is ideal.
Users really only need to specify arg, mode, and func during hrtimer_init.

Therefore, my suggestion is to adopt my implementation, offering two separate APIs:

hrtimer_init()
hrtimer_start()

#define HRTIMER_TIME_AFTER(t1, t2) ((int64_t)((t2) - (t1)) < 0)
#define HRTIMER_TIME_AFTER_EQ(t1, t2) ((int64_t)((t2) - (t1)) <= 0)

#define HRTIMER_MODE_ABS (0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not safe to use a macro to represent the mode. We should use an enum instead, which allows the compiler to perform type checking for us.

(timer)->expired = (time); \
} while(0)

#define hrtimer_init(timer) memset(timer, 0, sizeof(timer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow users to specify arg, mode, and func when initializing the hrtimer via hrtimer_init.

* jumping to NULL.
*/

hrtimer_fill(guard_timer, NULL, NULL, INT64_MAX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an RTOS system, a guard_timer is generally not needed, as the hrtimer module should be robust.

If the system misbehaves due to application issues rather than kernel faults, the proper approach would be to handle it at the system level, such as performing a restart.


/* Wait until all references have been released. */

for (cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two loops in this section, which may impact efficiency. It might be worth considering a way to optimize or combine them.

- The caller has ownership of the timer.
- Mode parameter is valid (0 or 1).
.. c:function:: int hrtimer_start(FAR hrtimer_t *timer, hrtimer_callback_t func, FAR void *arg, uint64_t time, uint32_t mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode should be enum type, so compiler can do type checking for you.

hrtimer_cb func,
FAR void *arg)
int hrtimer_restart(hrtimer_t *timer, hrtimer_callback_t func,
FAR void *arg, uint64_t time, uint32_t mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode should be enum type, so compiler can do type checking for you.

expired += clock_systime_nsec();
}

DEBUGASSERT(mode <= 1u);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let the compiler to do type checking for you.

{
int ret = -EINVAL;

DEBUGASSERT(mode <= 1u);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


static inline_function
int hrtimer_start(FAR hrtimer_t *timer, hrtimer_callback_t func,
FAR void *arg, uint64_t time, uint32_t mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

**hrtimer_start(timer, func, arg, time, mode)**: Starts a timer.
The mode parameter indicates whether it is a relative or absolute timer.
**hrtimer_restart(timer, func, arg, time, mode)**: Restarts a timer that has
Copy link
Contributor

@wangchdo wangchdo Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t need a separate hrtimer_restart, which would require users to input arg, time, and mode again.

Restarting essentially starts the hrtimer with a new expiry. If users want to refresh the hrtimer, they can simply call hrtimer_init() first.

So, in practice, the following functions are sufficient for users to manage hrtimers:
hrtimer_init(), hrtimer_start(), hrtimer_cancel(), and hrtimer_cancel_sync().

* hrtimer and can not modify this hrtimer anymore.
*/

if (queue->running[cpu] == timer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is only SMP related, we should separate it with CONFIG_SMP to improve performance in non-smp target

unsigned owner = 0u;
uintptr_t time_uptr = (uintptr_t)timer;

for (cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMP specific logic should be in CONFIG_SMP

* User can not reclaim (deallocate or modify) the hrtimer.
*/

queue->running[cpu] = (uintptr_t)timer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMP specific logic should be in CONFIG_SMP

@wangchdo
Copy link
Contributor

wangchdo commented Dec 26, 2025

@Fix-Point

Let me make a conclusion of all my comments for your implementation into four categories:

  1. Your way of resolving multicore running the same hrtimer for SMP system has performance issue and is not deterministic

  2. You does not consider to remove un-needed operations when running on non-SMP platform, so your implementation is also not good for non-smp platform

  3. Your APIs are two complicated, not friendly for users

  4. You do not decouple the queue abstraction for hrtimer very well, the queue related operations are not needed to be exposed to users

This commit introduced hrtimer_queue, a resuable component to generate
user-defined hrtimer implementation.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit introduced the high-resolution timer abstraction. The hrtimer design features including:
Use a strict state machine: an active timer can not be directly restarted, simplifying the implementation.
Abstract the sorted queue for flexibility, allowing different data structures for various use cases.
Execute callbacks with interrupts enabled, using hazard pointers to manage references.
Clear ownership transfer: callbacks return the next expiration time for periodic timers, and the thread executing the callback is responsible for restarting or releasing the timer.
Non-blocking restart: allow restarting a timer even if its callback is still running, requiring proper synchronization in the callback function.
Starvation-free cancellation: use hazard pointers to avoid starvation and ensure safe memory reclamation.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit supported wdog/scheduler hrtimer with
tickless enabled.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit added documentation for HRTimer.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
@Fix-Point
Copy link
Contributor Author

Fix-Point commented Dec 26, 2025

@Fix-Point

Let me make a conclusion of all my comments for your implementation into four categories:

  1. Your way of resolving multicore running the same hrtimer for SMP system has performance issue and is not deterministic
  2. You does not consider to remove un-needed operations when running on non-SMP platform, so your implementation is also not good for non-smp platform
  3. Your APIs are two complicated, not friendly for users
  4. You do not decouple the queue abstraction for hrtimer very well, the queue related operations are not needed to be exposed to users

Thank you for your suggestions. Your comments are very insightful. Below are the modifications made based on your advice:

  • Your suggestion regarding performance improvement in non‑SMP mode is excellent. Following your advice, macros have been added to remove unnecessary waits that cannot occur in non‑SMP scenarios.
  • Regarding the point about too many parameters in the API, I decided to fully align with the wdog/workqueue interface: the original cancel is now named cancel_sync, while async_cancel is renamed to cancel.
  • Your suggestion to move the hrtimer_queue.h implementation into an internal header file is very good, as it helps reduce the likelihood of exposing this header to users.
  • Your suggestions for documentation improvements are very helpful. The state diagram is added.

Regarding some other points that have not been changed yet, clarifications are as follows:

1. Determinism Issues Caused by Waiting

We need to discuss this case by case:

  • In non‑SMP mode, because a timer interrupt cannot occur while a thread is executing cancel, the semantics of hrtimer_cancel and hrtimer_cancel_sync are essentially the same—no waiting is required. Therefore, both hrtimer_start and hrtimer_cancel are deterministic and predictable.
  • In SMP mode, only hrtimer_cancel_sync requires waiting, which is unavoidable. This interface is typically called only before releasing a dynamically allocated hrtimer or when expecting all callback executions to complete. In most cases, restarting a timer only requires the asynchronous hrtimer_cancel + hrtimer_restart combination, which does not involve waiting and is deterministic. However, it requires users to be aware of synchronization issues in callback functions.

2. Interface Design Issues

As mentioned, the reasons for adopting an interface in the form of hrtimer_start(timer, func, arg, time) are twofold:

  • To align with the semantics of the wdog/workqueue interface, improve user‑friendliness, and maintain consistency with kernel interface styles.
  • To use hrtimer->func (NULL and UINTPTR_MAX) to encode state, saving 4 bytes of storage for the state field.

From a performance perspective, compared to the previous implementation, this PR’s implementation does incur two additional writes outside the critical section when restarting a timer, but it saves one write inside the critical section (because func is used to encode state).
As shown in the table:

hrtimer_start(timer, func, arg, time) hrtimer_start(time, mode)
start timer func, arg, expired func, arg, expired, state
restart timer func, arg, expired expired, state

Since these are consecutive writes to the same cache line and CPU pipelining can hide some of the latency, the extra write outside the critical section should not cause significant performance issues in practice.

Overall, I believe the hrtimer_start(timer, func, arg, time) interface form is better.

3. Lock Performance Issues

According to our test results (#17569):

  • On non‑SMP platforms, the write‑lock performance of seqcount is 1.58x that of spinlock.
  • On SMP platforms, the write‑lock performance of seqcount is roughly comparable with spinlock.

However, these results are from x86 platforms and may vary across different architectures.
Therefore, I fully agree with your point about lock performance—there is no lock fits for all scenarios. For example, in multi‑core real‑time scenarios, we might need starvation‑free queuing locks instead of high‑throughput spinlocks.
To address this, I will attempt to introduce a composable lock abstraction layer in future improvements, allowing users to customize the lock used by hrtimer_queue.
The lock‑related interfaces have already been abstracted, and I am considering how to better compose the lock data structure for hrtimer_queue.

4. Why Forward‑declare the Dependent Function hrtimer_reprogram?

I also tried two other approaches, but each had its own issues:

  • Using a pre‑defined hrtimer_ops_t callback to inform users that they need to implement hrtimer_reprogram when instantiating an hrtimer. This works fine under GCC/Clang—the compiler can fully inline static const hrtimer_ops_t and eliminate the overhead of the structure. However, I checked the assembly and found that safety compilers like CompCertC cannot perform such optimizations. Hence, I had to adopt the current implementation.
  • Using higher‑order macro functions, passing hrtimer_reprogram as a dependent input function. This avoids compiler optimization issues, but @xiaoxiang781216 and @GUIDINGLI considered it poor for readability.

After trying the above approaches, I chose to forward‑declare the dependent function hrtimer_reprogram. This implementation meets readability, performance, and memory-overhead requirements, but requires slight caution during use (the inclusion order must strictly follow Figure 1; otherwise, compilation errors may occur).

5. Why Replace Instead of Improve?

I appreciate your use of reference counting in the latest version to solve memory reclamation issues. However, the problem is that reference counting loses version information. A version‑aware method, such as Epoch‑based memory reclamation (a typical kernel implementation is Linux’s QSBR‑RCU), is needed to address this.

As I mentioned in a previous scenario:

  • t0: Thread 0 on Core 0 starts a periodic timer with callback CA, initial delay 1000 ns, period UINT32_MAX.
  • t1: The timer fires on Core 0, and callback CA is executing.
  • t2: Thread 0 is scheduled onto Core 1, cancels the timer, and restarts a new timer with callback CB, initial delay 0 ns, period 1 000 000 ns.
  • t3: The timer fires on Core 1, and callback CB starts executing.
  • t4: Callback CA on Core 0 finishes, finds the hrtimer in a “running” state (but not of this version), and attempts to restart the timer by setting hrtimer->expired += UINT32_MAX.

Here, two errors occur:

  • The old periodic timer that should have been canceled continues, overwriting the newly started timer.
  • After the old timer restarts, its next trigger time is t2 + UINT32_MAX, which is incorrect—it should have been t0 + 1000 ns + UINT32_MAX.

In this PR’s HRTimer implementation, this situation is impossible:

  • t0: Thread 0 on Core 0 starts a periodic timer with callback CA, initial delay 1000 ns, period UINT32_MAX.
  • t1: The timer fires on Core 0, and callback CA is executing.
  • t2: Thread 0 is scheduled onto Core 1, cancels the timer, taking ownership of the hrtimer data structure from Core 0, and restarts a new timer with callback CB, initial delay 0 ns, period 1 000 000 ns.
  • t3: The timer fires on Core 1, and callback CB starts executing.
  • t4: Callback CA on Core 0 finishes, checks the hazard pointer and realizes it has lost ownership of the hrtimer, so it simply exits without further action.

The key difference is that reference counting loses version information, while hazard pointers do not. Canceling a timer via hazard pointers ensures that, by time t2, all cores using the hrtimer have lost ownership of the hrtimer, thereby guaranteeing that an old timer callback can never overwrite a new timer.

Designing correct concurrent algorithms requires systematic consideration. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory-overhead overhead.

That is why I suggested we not spend additional time on the previous design and instead directly adopt this implementation. I do not disrespect your work or doubt your abilities—I acknowledge you as a diligent and excellent engineer. I simply do not wish to see anyone stumble twice on the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Documentation Improvements or additions to documentation Area: OS Components OS Components issues Size: XL The size of the change in this PR is very large. Consider breaking down the PR into smaller pieces.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants