[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <68734d1a-b77e-495e-922a-7fce3a122789@linux.dev>
Date: Mon, 17 Nov 2025 07:58:57 +0800
From: "enlin.mu" <enlin.mu@...ux.dev>
To: Thomas Gleixner <tglx@...utronix.de>, anna-maria@...utronix.de,
frederic@...nel.org, linux-kernel@...r.kernel.org, enlin.mu@...soc.com
Subject: Re: [PATCH V2] hrtimer: Check running timer state
On 2025/11/15 02:34, Thomas Gleixner wrote:
> On Fri, Nov 14 2025 at 19:42, Enlin Mu wrote:
>> When the running timer is not NULL, print debugging information.
>
> What for?
>
>> The test code is roughly as follows:
>>
>> static struct hrtimer serial_timer;
>> enum hrtimer_restart serial_timer_handler(struct hrtimer * timer)
>> {
>> local_irq_disable();
>
> What is this for?
Hi Thomas
This code comes from my colleauge who is not familiar with the
running machanism of hrtimer, which is why there is this logical error.
>
>> ......
>> do_someting();
>> copy_data_with_dma();
>> ......
>> hrtimer_forward_now(*serial_timer, ns_to_ktime(1000*2000));
>> local_irq_enable();
>
> And this?
>
>> return HRTIMER_RESTART;
>> }
>>
>> static int serial_start(struct uart_port *port)
>> {
>> ......
>> ......
>> hrtime_init(&serial_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>
> That function does not exist.
>
>> ktime = no_to_ktime(1000*2000);
>> serial_timer.function = serial_timer_handler;
>> hrtimer_start(&serial_timer, ktime, HRTIMER_MODE_REL);
>> ......
>> return 0;
>> }
>>
>> static void serial_shutdown(struct uart_port *port)
>> {
>> ......
>> hrtimer_cancle(&serial_timer);
>> ......
>> serial_release_dma(port);
>> ......
>> }
>
>> The cpu6 canceled serial_timer and released dma,but the
>> serial_timer still ran many times on CPU7 until a panic occurred.
>> The reason for the panic is that serial_timer accessed the
>> released dma,But the serial_timer had been canceled for
>> some time now on cpu6.
>
> You still fail to explain how the timer can still run after being
> canceled.
Hi Thomas
The serial_timer on cpu6 has not been properly cancelled.
The serial_timer is once again entered into the active queue on cpu7.
---------------------------
on cpu7
step 1: preempted by hrtimer_usb
static void __run_hrtimer(.....)
{
......
base->running = timer; //timer is serial_timer;
.....
.....
restart = fn(timer); // fn is serial_timer_handler
.... ==> irq is enable. serial_timer is preempted by hrtimer_usb
....
}
step 2: hrtimer_usb return
static void __run_hrtimer(.....)
{
......
base->running = timer; //timer is hrtimer_usb;
.....
.....
restart = fn(timer);
....
}
step 3: serial_timer continues to execute
static void __run_hrtimer(.....)
{
......
(restart != HRTIMER_NORESTART &&
timer->state & HRTIMER_STATE_ENQUEUED))
(timer, base, HRTIMER_MODE_ABS); //timer is serial_timer
}
step 4: serial_timer run again.
.......
........
step n: serial_timer run again.
.......
step n+1: serial_timer run again.
.......
panic from serial_timer;
---------------------------
on cpu6
step 1:
hrtimer_cancel
step 2:
hrtimer_try_to_cancel
step 3:
hrtimer_active return false, because running timer is hrtimer_usb
on cpu7(step 2 on cpu7)
step 4:
hrtimer_cancel returns 0
step 5:
serial_timer is added active queue on cpu 7
---------------------------
Due to English not being my native language, I am unable to
explain this exception with a more complex description, and
can only provide a simple step-by-step explanation.
>
>> The cpu6 can successfully cancel the serial_timer because the
>> running timer has changed and it is another timer(such as
>> hrtimer_usb).
>
> After that the timer _cannot_ be running anymore unless some other code
> re-arms it afterwards.
>
>> When the serial_timer is enable to interrupt, the next hrtimer
>> (such as hrtimer_usb) on cpu7 preempts the return of ther serial_timer,
>> causing a change in the running timer.
>
> Then fix your timer callback. The callback is invoked in hard interrupt
> context and the callback enables interrupts, which is a NONO. You
> clearly never ran your code with lockdep enabled. It would have told you
> so.
>
>> Signed-off-by: Enlin Mu <enlin.mu@...ux.dev>
>> Signed-off-by: Enlin Mu <enlin.mu@...soc.com>
>> Signed-off-by: Enlin Mu <enlin.mu@...ux.dev>
>
> Interesting Signed-off-by chain. Seems you're co-developing this patch
> with your Alter ego.
>
> Thanks,
>
> tglx
Powered by blists - more mailing lists