lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <68734d1a-b77e-495e-922a-7fce3a122789@linux.dev>
Date: Mon, 17 Nov 2025 07:58:57 +0800
From: "enlin.mu" <enlin.mu@...ux.dev>
To: Thomas Gleixner <tglx@...utronix.de>, anna-maria@...utronix.de,
 frederic@...nel.org, linux-kernel@...r.kernel.org, enlin.mu@...soc.com
Subject: Re: [PATCH V2] hrtimer: Check running timer state



On 2025/11/15 02:34, Thomas Gleixner wrote:
> On Fri, Nov 14 2025 at 19:42, Enlin Mu wrote:
>> When the running timer is not NULL, print debugging information.
> 
> What for?
> 
>> The test code is roughly as follows:
>>
>> static struct hrtimer serial_timer;
>> enum hrtimer_restart serial_timer_handler(struct hrtimer * timer)
>> {
>> 	local_irq_disable();
> 
> What is this for?
Hi Thomas

This code comes from my colleauge who is not familiar with the
running machanism of hrtimer, which is why there is this logical error.

> 
>> 	......
>> 	do_someting();
>> 	copy_data_with_dma();
>> 	......
>> 	hrtimer_forward_now(*serial_timer, ns_to_ktime(1000*2000));
>> 	local_irq_enable();
> 
> And this?
> 
>> 	return HRTIMER_RESTART;
>> }
>>
>> static int serial_start(struct uart_port *port)
>> {
>> 	......
>> 	......
>> 	hrtime_init(&serial_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> 
> That function does not exist.
> 
>> 	ktime = no_to_ktime(1000*2000);
>> 	serial_timer.function = serial_timer_handler;
>> 	hrtimer_start(&serial_timer, ktime, HRTIMER_MODE_REL);
>> 	......
>> 	return 0;
>> }
>>
>> static void serial_shutdown(struct uart_port *port)
>> {
>> 	......
>> 	hrtimer_cancle(&serial_timer);
>> 	......
>> 	serial_release_dma(port);
>> 	......
>> }
> 
>> The cpu6 canceled serial_timer and released dma,but the
>> serial_timer still ran many times on CPU7 until a panic occurred.
>> The reason for the panic is that serial_timer accessed the
>> released dma,But the serial_timer had been canceled for
>> some time now on cpu6.
> 
> You still fail to explain how the timer can still run after being
> canceled.

Hi Thomas

The serial_timer on cpu6 has not been properly cancelled.
The serial_timer is once again entered into the active queue on cpu7.
---------------------------
on cpu7

step 1:  preempted by hrtimer_usb

static void __run_hrtimer(.....)
{
	......
	base->running = timer;   //timer is serial_timer;

	.....
	.....
	restart = fn(timer);   // fn is serial_timer_handler
	....  ==> irq is enable.  serial_timer is preempted by hrtimer_usb
	....
}


step 2: hrtimer_usb return

static void __run_hrtimer(.....)
{
	......
	base->running = timer;   //timer is hrtimer_usb;

	.....
	.....
	restart = fn(timer);
	....
}


step 3: serial_timer continues to execute

static void __run_hrtimer(.....)
{
	......
	 (restart != HRTIMER_NORESTART &&
		timer->state & HRTIMER_STATE_ENQUEUED))
			(timer, base, HRTIMER_MODE_ABS);  //timer is serial_timer

}

step 4: serial_timer run again.
	.......

........

step n: serial_timer run again.
	.......

step n+1: serial_timer run again.
	.......
	panic from serial_timer;

---------------------------
on cpu6

step 1:
     hrtimer_cancel


step 2:
     hrtimer_try_to_cancel

step 3:
     hrtimer_active return false, because running timer is hrtimer_usb 
on cpu7(step 2 on cpu7)

step 4:
     hrtimer_cancel returns 0

step 5:
     serial_timer is added active queue on cpu 7

---------------------------
Due to English not being my native language, I am unable to
explain this exception with a more complex description, and
can only provide a simple step-by-step explanation.

> 
>> The cpu6 can successfully cancel the serial_timer because the
>> running timer has changed and it is another timer(such as
>> hrtimer_usb).
> 
> After that the timer _cannot_ be running anymore unless some other code
> re-arms it afterwards.
> 
>> When the serial_timer is enable to interrupt, the next hrtimer
>> (such as hrtimer_usb) on cpu7 preempts the return of ther serial_timer,
>> causing a change in the running timer.
> 
> Then fix your timer callback. The callback is invoked in hard interrupt
> context and the callback enables interrupts, which is a NONO. You
> clearly never ran your code with lockdep enabled. It would have told you
> so.
> 
>> Signed-off-by: Enlin Mu <enlin.mu@...ux.dev>
>> Signed-off-by: Enlin Mu <enlin.mu@...soc.com>
>> Signed-off-by: Enlin Mu <enlin.mu@...ux.dev>
> 
> Interesting Signed-off-by chain. Seems you're co-developing this patch
> with your Alter ego.
> 
> Thanks,
> 
>          tglx


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ