lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251027173037.Cj4b_alm@linutronix.de>
Date: Mon, 27 Oct 2025 18:30:37 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Peter Zilstra <peterz@...radead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
	Prakash Sangappa <prakash.sangappa@...cle.com>,
	Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Arnd Bergmann <arnd@...db.de>,
	linux-arch@...r.kernel.org
Subject: Re: [patch V2 00/12] rseq: Implement time slice extension mechanism

On 2025-10-22 14:57:28 [+0200], Thomas Gleixner wrote:
> Time slice extensions are an attempt to provide opportunistic priority
> ceiling without the overhead of an actual priority ceiling protocol, but
> also without the guarantees such a protocol provides.
> 
> The intent is to avoid situations where a user space thread is interrupted
> in a critical section and scheduled out, while holding a resource on which
> the preempting thread or other threads in the system might block on. That
> obviously prevents those threads from making progress in the worst case for
> at least a full time slice. Especially in the context of user space
> spinlocks, which are a patently bad idea to begin with, but that's also
> true for other mechanisms.

I've been playing with it a bit with RT enabled and started to debug
this:

|       slice_test-2903    [001] d.h..  2313.285439: local_timer_entry: vector=236
|       slice_test-2903    [001] d.h1.  2313.285440: hrtimer_cancel: hrtimer=000000000507e6d5
|       slice_test-2903    [001] d.h..  2313.285440: hrtimer_expire_entry: hrtimer=000000000507e6d5 function=tick_nohz_handler now=2313208001152
|       slice_test-2903    [001] d.h1.  2313.285449: sched_stat_runtime: comm=slice_test pid=2903 runtime=3982905 [ns]
|       slice_test-2903    [001] dlh..  2313.285452: softirq_raise: vec=7 [action=SCHED]
|       slice_test-2903    [001] dlh..  2313.285452: hrtimer_expire_exit: hrtimer=000000000507e6d5
|       slice_test-2903    [001] dlh1.  2313.285452: hrtimer_start: hrtimer=000000000507e6d5 function=tick_nohz_handler expires=2313212000000 softexpires=2313212000000 mode=ABS
|       slice_test-2903    [001] dlh..  2313.285453: local_timer_exit: vector=236
|       slice_test-2903    [001] dl.2.  2313.285453: sched_waking: comm=ksoftirqd/1 pid=32 prio=120 target_cpu=001
|       slice_test-2903    [001] dl.3.  2313.285456: sched_wakeup: comm=ksoftirqd/1 pid=32 prio=120 target_cpu=001
|       slice_test-2903    [001] d....  2313.285457: irqentry_exit: rseq_grant_slice_extension(216)

granting the extension and removing the lazy wakup. We are still on
return from IRQ but the 'h' flag has been already removed…

|       slice_test-2903    [001] d..1.  2313.285458: hrtimer_start: hrtimer=0000000030a688cc function=rseq_slice_expired expires=2313208047790 softexpires=2313208047790 mode=ABS|PINNED|HARD
|       slice_test-2903    [001] d....  2313.285458: __rseq_arm_slice_extension_timer: timer
|       slice_test-2903    [001] d..2.  2313.285484: hrtimer_cancel: hrtimer=0000000030a688cc
extension granted, timer started and revoked and set need resched.

|       slice_test-2903    [001] dN.2.  2313.285487: sched_stat_runtime: comm=slice_test pid=2903 runtime=36886 [ns]
This is coming from schedule() already. It took me a while since I was
hunting a missing clear of need-resched.

|       slice_test-2903    [001] d..2.  2313.285489: sched_switch: prev_comm=slice_test prev_pid=2903 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/1 next_pid=32 next_prio=120
|      ksoftirqd/1-32      [001] ..s.1  2313.285490: softirq_entry: vec=7 [action=SCHED]
|      ksoftirqd/1-32      [001] ..s.1  2313.285501: softirq_exit: vec=7 [action=SCHED]
|      ksoftirqd/1-32      [001] d..2.  2313.285502: sched_stat_runtime: comm=ksoftirqd/1 pid=32 runtime=16438 [ns]
|      ksoftirqd/1-32      [001] d..2.  2313.285503: sched_switch: prev_comm=ksoftirqd/1 prev_pid=32 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2904 next_prio=120
|       slice_test-2904    [001] .....  2313.285507: sys_enter: NR 230 (1, 0, 7f4692c7baa0, 0, 0, 0)
|       slice_test-2904    [001] .....  2313.285507: hrtimer_setup: hrtimer=00000000f2d53899 clockid=CLOCK_MONOTONIC mode=REL
|       slice_test-2904    [001] d..1.  2313.285507: hrtimer_start: hrtimer=00000000f2d53899 function=hrtimer_wakeup expires=2313208168792 softexpires=2313208118792 mode=REL
|       slice_test-2904    [001] d..2.  2313.285508: sched_stat_runtime: comm=slice_test pid=2904 runtime=6149 [ns]
|       slice_test-2904    [001] d..2.  2313.285510: sched_switch: prev_comm=slice_test prev_pid=2904 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2903 next_prio=120
|       slice_test-2903    [001] .....  2313.285510: sys_enter: NR 470 (7fffc04f1ff0, c350, 11a0e0, 0, 7f4692e99000, 0)

slice_test-2903 enters _now_ rseq_slice_yield() so it must have been in
userland during the suppressed wake up at 2313.285457.
But a few iterations later it turns at out this trace event is recorded
_after_ the rseq magic happens at sys_enter time. We entered
rseq_slice_yield() a few cycles after the extension was granted. Buh.
So it seems to work as intended but it is not obvious tell from tracing
why it does not work.

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ