Message-ID: <085d01dbf596$44286880$cc793980$@gmx.de>
Date: Tue, 15 Jul 2025 16:39:25 +0200
From: <markus.stockhausen@....de>
To: <peterz@...radead.org>
Cc: "'Chris Packham'" <Chris.Packham@...iedtelesis.co.nz>,
	<bjorn@...k.no>,
	<mingo@...hat.com>,
	<juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>,
	<anna-maria@...utronix.de>,
	<frederic@...nel.org>,
	<tglx@...utronix.de>,
	<linux-kernel@...r.kernel.org>
Subject: task_non_contending() for fair_server leads to timer retries

Hi Peter,

I'm currently investigating issues with the timer-rtl-otto driver in
6.12 longterm on the Realtek MIPS switch platform (Chris is working
hard to upstream this). While doing so I observed that the timer retries
counter in /proc/timer_list increases continually (~6/second) although
the system is otherwise totally idle. 6.6 longterm does not show that
issue. I'm unsure whether this is related to my driver issues, but the
documentation reads like "that's bad".

To be sure about this one, I narrowed it down to the fair server.

Whenever task_non_contending() handles the fair_server, zerolag_time is
calculated as 0 and a hrtimer_start(timer, 0, ...) call is issued.
Further down the stack, clockevents_program_event() concludes that the
target time has already passed, so it has clockevents_program_min_delta()
program the minimum delta (2560ns for the otto timer). That is where the
retry counter gets incremented. See attached output.
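
For illustration, the path described above can be modelled in user space
roughly as follows. This is only a sketch, not the kernel code: the struct
and all model_* names are hypothetical, the force flag is assumed to be set,
and a relative timeout of 0 is assumed to resolve to an expiry that is not
later than the current time.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct clock_event_device. */
struct model_clockevent {
	uint64_t min_delta_ns;   /* shortest delay the device accepts */
	uint64_t next_event_ns;  /* absolute time of the programmed event */
	unsigned long retries;   /* shown as "retries" in /proc/timer_list */
};

/* Models clockevents_program_min_delta(): arm the minimum delay, count a retry. */
int model_program_min_delta(struct model_clockevent *dev, uint64_t now_ns)
{
	dev->next_event_ns = now_ns + dev->min_delta_ns;
	dev->retries++;
	return 0;
}

/* Models clockevents_program_event() with force set (assumption). */
int model_program_event(struct model_clockevent *dev, uint64_t expires_ns,
			uint64_t now_ns)
{
	int64_t delta = (int64_t)(expires_ns - now_ns);

	if (delta <= 0)		/* target time already reached or passed */
		return model_program_min_delta(dev, now_ns);

	/* A target in the future is only clamped up to the minimum delta. */
	if ((uint64_t)delta < dev->min_delta_ns)
		delta = (int64_t)dev->min_delta_ns;
	dev->next_event_ns = now_ns + (uint64_t)delta;
	return 0;
}

/* Hypothetical driver: issue 'calls' timer starts with a relative timeout of 0. */
unsigned long model_zero_timeout_retries(unsigned int calls)
{
	struct model_clockevent dev = { .min_delta_ns = 2560 };
	uint64_t now_ns = 1000000;
	unsigned int i;

	for (i = 0; i < calls; i++) {
		model_program_event(&dev, now_ns + 0, now_ns);
		now_ns += 1000000;	/* arbitrary spacing between calls */
	}
	return dev.retries;
}
```

In this model every start with a timeout of 0 costs exactly one retry,
while any target that lies in the future at programming time costs none.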

To silence the noise and focus on the real bug I use this workaround
in task_non_contending(): 

if ((dl_se == &rq->fair_server) && (zerolag_time == 0))
	zerolag_time = 6000;

Totally crap but serves the purpose. Maybe you can share some insight into
whether this behaviour is desired or not.

Thanks in advance.

Markus

# uptime
 00:41:19 up 41 min,  load average: 0.00, 0.00, 0.00

# cat /proc/timer_list
...
Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: timer@...0
 max_delta_ns:   85899344321
 min_delta_ns:   2560
 mult:           13421773
 shift:          32
 mode:           3
 next_event:     2469910000000 nsecs
 set_next_event: rttm_next_event
 shutdown:       rttm_state_shutdown
 periodic:       rttm_state_periodic
 oneshot:        rttm_state_oneshot
 event_handler:  hrtimer_interrupt

 retries:        14646
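
As a cross-check of the numbers in this output, here is a small sketch of
the arithmetic. The helper names are hypothetical; the mult/shift scaling is
the usual (ns * mult) >> shift conversion from nanoseconds to device ticks,
and the uptime is taken as the 41 minutes reported above.

```c
#include <stdint.h>

/* Convert nanoseconds to device ticks using the mult/shift pair. */
uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * (uint64_t)mult) >> shift;
}

/* Average retry rate over the uptime, in retries per second. */
double retries_per_second(unsigned long retries, unsigned long uptime_s)
{
	return (double)retries / (double)uptime_s;
}

/*
 * ns_to_ticks(2560, 13421773, 32) yields 8, so min_delta_ns corresponds
 * to 8 ticks of the timer.
 * retries_per_second(14646, 41 * 60) yields about 5.95, which matches
 * the ~6/second mentioned in the mail.
 */
```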

