linux-kernel - Re: [patch V6 07/11] rseq: Implement time slice extension enforcement timer


Message-ID: <20251219100517.GA1132199@noisy.programming.kicks-ass.net>
Date: Fri, 19 Dec 2025 11:05:17 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
	Prakash Sangappa <prakash.sangappa@...cle.com>,
	Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Arnd Bergmann <arnd@...db.de>, linux-arch@...r.kernel.org,
	Randy Dunlap <rdunlap@...radead.org>,
	Ron Geva <rongevarg@...il.com>, Waiman Long <longman@...hat.com>
Subject: Re: [patch V6 07/11] rseq: Implement time slice extension
 enforcement timer

On Fri, Dec 19, 2025 at 12:26:46AM +0100, Thomas Gleixner wrote:
> On Thu, Dec 18 2025 at 16:05, Peter Zijlstra wrote:
> > On Mon, Dec 15, 2025 at 05:52:22PM +0100, Thomas Gleixner wrote:
> >
> >> V5: Document the slice extension range - PeterZ
> >
> >> --- a/Documentation/admin-guide/sysctl/kernel.rst
> >> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> >> @@ -1228,6 +1228,14 @@ reboot-cmd (SPARC only)
> >>  ROM/Flash boot loader. Maybe to tell it what to do after
> >>  rebooting. ???
> >>  
> >> +rseq_slice_extension_nsec
> >> +=========================
> >> +
> >> +A task can request to delay its scheduling if it is in a critical section
> >> +via the prctl(PR_RSEQ_SLICE_EXTENSION_SET) mechanism. This sets the maximum
> >> +allowed extension in nanoseconds before scheduling of the task is enforced.
> >> +Default value is 30000ns (30us). The possible range is 10000ns (10us) to
> >> +50000ns (50us).
> >
> > The important bit: we're not going to increase these numbers. If
> > anything, I would like the default to be 10us and taint the kernel if
> > you up it.
> 
> Fine with me.

Thanks; the thinking is that it will be very hard to shrink this number
due to unknown workloads in the wild and all that, so starting on the
small end is the conservative option.

> > I also think we want some tracing/tool to find the actual length of the
> > extension used (min/avg/max etc.). That is the time between the kernel
> > finding the extension bit set and arming the timer and the slice_yield()
> > syscall.
> 
> I could probably integrate that easily into the RSEQ stats mechanism.

I was thinking that perhaps the hrtimer tracepoints, filtered on this
specific timer, might just do. Arming the timer is the point where the
extension is granted, cancelling the timer is on the slice_yield() (or
any other random syscall :/), and the timer actually firing is on fail.
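
As a rough sketch of the min/avg/max summary discussed above, assume the
tracepoint output has already been reduced to one record per granted
extension: the time the timer was armed, the time it was cancelled or fired,
and whether it fired. The record layout and the name `extension_stats` are
hypothetical, not part of the patch or of any tracing tool.

```python
from statistics import mean

def extension_stats(events):
    """Summarize used extension lengths.

    events: list of (armed_ns, ended_ns, fired) tuples, a hypothetical
    pre-digested form of the timer arm/cancel/expire events.
    Returns (stats, expired_count); stats is None if no extension was
    ended by a syscall.
    """
    # Extensions ended by a syscall: time between arming and cancelling.
    used = [end - start for start, end, fired in events if not fired]
    # Extensions where the timer fired, i.e. the task overran its grant.
    expired = sum(1 for _, _, fired in events if fired)
    if not used:
        return None, expired
    return {"min": min(used), "avg": mean(used), "max": max(used)}, expired

# Made-up sample data, in nanoseconds.
sample = [
    (1000, 4000, False),
    (10000, 12000, False),
    (20000, 50000, True),
    (60000, 67000, False),
]
stats, expired = extension_stats(sample)
print(stats, expired)
```

Note that this only measures the time from the grant onwards; the part of the
critical section that ran before the timer was armed is not visible here.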

Normally I would suggest using a Poisson distribution to find the
'average', but this case is more complicated because the start of the
extension is lost.

Let me ask one of these fancy AI things. Ah, it says this is "a classic
example of Length-Biased Sampling combined with Left-Truncation". It
then further suggests:

  If you cannot assume a distribution, you should use a Weighting
  Method.  Since the probability of catching an event of length L is
  proportional to L, you must weight each observation by 1/L.

      1. For each event, record the observed duration d_i

      2. Calculate the weighted mean:

                            \Sum (d_i * 1/d_i)       n
              avg(x)_true = ------------------ = ----------
                                \Sum 1/d_i       \Sum 1/d_i

      This is the Harmonic Mean of your observed durations. The harmonic
      mean effectively "penalizes" the long events you were more likely to
      catch.
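
The weighting claim can be checked numerically with a small simulation. This
is only a sketch under stated assumptions: event lengths are drawn from an
arbitrary uniform distribution, each event is caught with probability
proportional to its length, and the full length of a caught event is
observed. That last assumption does not hold when the start of the event is
lost. All names and numbers are illustrative.

```python
import random

def harmonic_mean(durations):
    # n / sum(1/d_i): the 1/d-weighted mean of the observed durations.
    return len(durations) / sum(1.0 / d for d in durations)

rng = random.Random(1)

# Hypothetical population of event lengths (arbitrary units).
population = [rng.uniform(1.0, 30.0) for _ in range(20000)]
true_mean = sum(population) / len(population)

# Length-biased sampling: longer events are proportionally more likely
# to be caught.
observed = rng.choices(population, weights=population, k=20000)

naive = sum(observed) / len(observed)   # overestimates the true mean
corrected = harmonic_mean(observed)     # should land near the true mean

print(f"true={true_mean:.2f} naive={naive:.2f} corrected={corrected:.2f}")
```

In this setup the plain average of the caught events comes out well above the
population mean, while the harmonic mean lands close to it.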

It also babbled something about an Inspection Paradox:

  If your sampling rate is constant (a Poisson process) and the system is
  in a "steady state," the most robust and mathematically elegant way to
  find the true average duration (μ) is surprisingly simple.

  In a steady-state system where you catch an event in progress:

      The time from the start of the event to your arrival is U
      (unobserved).

      The time from your arrival to the end of the event is V (observed).

  Under these specific conditions, the expected value of the observed
  remaining duration (V) is exactly equal to the mean of the length-biased
  distribution. However, because long events are over-sampled, the mean of
  the durations you catch is actually higher than the true mean of all
  events. For many common distributions (like the Exponential
  distribution), the relationship is: μ=E[V]

  Wait, if you ignore the part you missed (U) and only average the parts
  you saw (V), you often arrive back at the true mean. This is known as
  the Inspection Paradox.
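
Whether E[V] equals the true mean depends on the distribution, which a
simulation can illustrate. The sketch below assumes the idealized model from
the quoted text: an event is caught with probability proportional to its
length and the arrival point is uniform within it. It compares exponentially
distributed lengths with fixed-length events; the names and parameters are
made up for the illustration.

```python
import random

def mean_observed_remainder(lengths, samples, rng):
    # Catch events in progress with probability proportional to length.
    caught = rng.choices(lengths, weights=lengths, k=samples)
    # The arrival point is uniform within the event; V is what remains.
    return sum(length * (1.0 - rng.random()) for length in caught) / samples

rng = random.Random(7)
mu = 10.0

# Exponential lengths with mean mu: memoryless, so E[V] should be near mu.
exp_lengths = [rng.expovariate(1.0 / mu) for _ in range(20000)]
# Fixed lengths of exactly mu: E[V] should be near mu / 2 instead.
fixed_lengths = [mu] * 20000

v_exp = mean_observed_remainder(exp_lengths, 20000, rng)
v_fixed = mean_observed_remainder(fixed_lengths, 20000, rng)

print(f"exponential: E[V]~{v_exp:.2f}  fixed: E[V]~{v_fixed:.2f}  mu={mu}")
```

For the exponential case the observed remainder averages out near μ, but for
fixed-length events it is about μ/2, so "average the part you saw" is not a
general recipe.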

Now I suppose I should do the real research to see how much of that is a
hallucination :-)