Message-ID: <085d01dbf596$44286880$cc793980$@gmx.de>
Date: Tue, 15 Jul 2025 16:39:25 +0200
From: <markus.stockhausen@....de>
To: <peterz@...radead.org>
Cc: "'Chris Packham'" <Chris.Packham@...iedtelesis.co.nz>,
	<bjorn@...k.no>,
	<mingo@...hat.com>,
	<juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>,
	<anna-maria@...utronix.de>,
	<frederic@...nel.org>,
	<tglx@...utronix.de>,
	<linux-kernel@...r.kernel.org>
Subject: task_non_contending() for fair_server leads to timer retries

Hi Peter,

I'm currently investigating issues with the timer-rtl-otto driver in
6.12 longterm on the Realtek MIPS switch platform (Chris is working
hard to upstream this). While doing so I observed that the timer retry
count in /proc/timer_list increases continually (~6/second), although
the system is otherwise totally idle. 6.6 longterm does not show that
issue. I'm unsure whether this is related to the driver issues, but the
documentation reads like "that's bad".

To be sure about this one, I narrowed it down to the fair server.

Whenever task_non_contending() handles the fair_server, zerolag_time is
calculated as 0 and a hrtimer_start(timer, 0, ...) call is issued. Further
down the stack, clockevents_program_event() decides that the target time
has already passed, so it tells clockevents_program_min_delta() to program
the minimum delta (2560 ns for the otto timer). That path increments the
retry counter. See the output below.
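
The path described above can be sketched as a small user-space model.
This is not the kernel code: struct model_dev and the model_*() functions
are hypothetical stand-ins, the current time is passed in as a parameter,
and the hardware is assumed to accept the minimum delta on the first
attempt. It only illustrates why an expiry that is not in the future
ends up bumping the retry counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct clock_event_device. */
struct model_dev {
	int64_t       min_delta_ns;
	int64_t       max_delta_ns;
	uint32_t      mult;
	uint32_t      shift;
	unsigned long retries;     /* counterpart of "retries:" in timer_list */
	uint64_t      last_ticks;  /* value that would go to the hardware */
};

/* Scaled conversion from nanoseconds to device ticks. */
static uint64_t model_ns_to_ticks(const struct model_dev *dev, int64_t ns)
{
	assert(ns >= 0);
	return ((uint64_t)ns * dev->mult) >> dev->shift;
}

/* Forced fallback: count a retry and program the minimum delta. */
static int model_program_min_delta(struct model_dev *dev)
{
	dev->retries++;
	dev->last_ticks = model_ns_to_ticks(dev, dev->min_delta_ns);
	return 0; /* assumption: the device accepts it on the first try */
}

/* Program an event at expires_ns, given the current time now_ns. */
static int model_program_event(struct model_dev *dev,
			       int64_t expires_ns, int64_t now_ns)
{
	int64_t delta = expires_ns - now_ns;

	/* Target already reached or passed: take the min-delta path. */
	if (delta <= 0)
		return model_program_min_delta(dev);

	/* Otherwise clamp to the device limits and program normally. */
	if (delta > dev->max_delta_ns)
		delta = dev->max_delta_ns;
	if (delta < dev->min_delta_ns)
		delta = dev->min_delta_ns;

	dev->last_ticks = model_ns_to_ticks(dev, delta);
	return 0;
}
```

With the values from the timer_list output (min_delta_ns 2560, mult
13421773, shift 32), a call whose expiry equals the current time
increments retries and programs 8 ticks, while an expiry 1 ms in the
future leaves retries untouched.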

To silence the noise and focus on the real bug I use this workaround
in task_non_contending(): 

if ((dl_se == &rq->fair_server) && (zerolag_time == 0))
	zerolag_time = 6000;

It is a crude hack, but it serves the purpose. Maybe you can share some
insight into whether this behaviour is intended.

Thanks in advance.

Markus

# uptime
 00:41:19 up 41 min,  load average: 0.00, 0.00, 0.00

# cat /proc/timer_list
...
Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: timer@...0
 max_delta_ns:   85899344321
 min_delta_ns:   2560
 mult:           13421773
 shift:          32
 mode:           3
 next_event:     2469910000000 nsecs
 set_next_event: rttm_next_event
 shutdown:       rttm_state_shutdown
 periodic:       rttm_state_periodic
 oneshot:        rttm_state_oneshot
 event_handler:  hrtimer_interrupt

 retries:        14646
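
As a rough cross-check of the figures above, the following sketch
recomputes the retry rate and the tick frequency implied by mult and
shift. The helper names are hypothetical, and the uptime of 41 min 19 s
is an assumption read off the wall clock shown by uptime (00:41:19),
which presumes the clock started at zero on boot.

```c
#include <stdint.h>

/* Average retries per second over the given uptime. */
static double model_retries_per_second(unsigned long retries,
				       unsigned int minutes,
				       unsigned int seconds)
{
	return (double)retries / (double)(minutes * 60u + seconds);
}

/* Device ticks per microsecond implied by the mult/shift pair. */
static double model_ticks_per_us(uint32_t mult, uint32_t shift)
{
	return (double)mult * 1000.0 / (double)(1ULL << shift);
}
```

Under these assumptions, 14646 retries over 2479 seconds is a little
under 6 per second, in line with the rate mentioned at the top, and
mult 13421773 with shift 32 corresponds to roughly 3.125 ticks per
microsecond, so min_delta_ns 2560 is 8 ticks.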