Message-ID: <261A8604-DA8D-468A-83BB-F530D5639A43@oracle.com>
Date: Wed, 19 Nov 2025 00:20:34 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: LKML <linux-kernel@...r.kernel.org>,
    Peter Zijlstra <peterz@...radead.org>,
    Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
    "Paul E. McKenney" <paulmck@...nel.org>,
    Boqun Feng <boqun.feng@...il.com>,
    Jonathan Corbet <corbet@....net>,
    Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
    K Prateek Nayak <kprateek.nayak@....com>,
    Steven Rostedt <rostedt@...dmis.org>,
    Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
    Arnd Bergmann <arnd@...db.de>,
    "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [patch V3 07/12] rseq: Implement syscall entry work for time
    slice extensions

> On Oct 29, 2025, at 6:22 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> The kernel sets SYSCALL_WORK_RSEQ_SLICE when it grants a time slice
> extension. This allows to handle the rseq_slice_yield() syscall, which is
> used by user space to relinquish the CPU after finishing the critical
> section for which it requested an extension.
>
> In case the kernel state is still GRANTED, the kernel resets both kernel
> and user space state with a set of sanity checks. If the kernel state is
> already cleared, then this raced against the timer or some other interrupt
> and just clears the work bit.
>
> Doing it in syscall entry work allows to catch misbehaving user space,
> which issues a syscall from the critical section. Wrong syscall and
> inconsistent user space result in a SIGSEGV.
>
>
[…]
> +/*
> + * Invoked from syscall entry if a time slice extension was granted and the
> + * kernel did not clear it before user space left the critical section.
> + */
> +void rseq_syscall_enter_work(long syscall)
> +{
[…]
>
> + curr->rseq.slice.state.granted = false;
> + /*
> + * Clear the grant in user space and check whether this was the
> + * correct syscall to yield. If the user access fails or the task
> + * used an arbitrary syscall, terminate it.
> + */
> + if (put_user(0U, &curr->rseq.usrptr->slice_ctrl.all) || syscall != __NR_rseq_slice_yield)
> + force_sig(SIGSEGV);
> +}

I have been trying to get our database team to adopt the slice extension API.
They hit the case where a system call is made inside the slice extension
window, and the process dies with SIGSEGV.

Apparently it will be hard to guarantee that no system call is made inside the
slice extension window, because of layering. For the DB use case, it is fine
to terminate the slice extension when a system call is made, but killing the
process will not work.

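To make the difference concrete, below is a minimal user-space model of the two behaviours. It is only a sketch and not kernel code: `struct slice_model`, `model_syscall_enter_work()`, the `SLICE_POLICY_*` values and `MODEL_NR_RSEQ_SLICE_YIELD` are hypothetical names made up for illustration, and `user_access_ok` stands in for the result of `put_user()`. It also assumes that a failed user access would still be fatal under the lenient policy.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __NR_rseq_slice_yield; the real value is arch specific. */
#define MODEL_NR_RSEQ_SLICE_YIELD 1000L

enum slice_policy {
	SLICE_POLICY_STRICT,	/* models the quoted patch: wrong syscall is fatal */
	SLICE_POLICY_LENIENT,	/* models the request: wrong syscall only ends the grant */
};

struct slice_model {
	bool granted;		/* models curr->rseq.slice.state.granted */
	unsigned int user_ctrl;	/* models rseq.usrptr->slice_ctrl.all */
};

/*
 * Returns the signal the model would deliver, or 0 for none.
 * user_access_ok models put_user() succeeding.
 */
int model_syscall_enter_work(struct slice_model *m, long syscall,
			     bool user_access_ok, enum slice_policy policy)
{
	assert(m != NULL);

	/* Grant already cleared (raced with timer or interrupt): nothing to do. */
	if (!m->granted)
		return 0;

	/* The grant is revoked in every case. */
	m->granted = false;

	/* Assumption: a faulting user access stays fatal under both policies. */
	if (!user_access_ok)
		return SIGSEGV;
	m->user_ctrl = 0;

	/* Only the strict policy punishes a syscall other than the yield. */
	if (syscall != MODEL_NR_RSEQ_SLICE_YIELD && policy == SLICE_POLICY_STRICT)
		return SIGSEGV;

	return 0;
}
```

With `SLICE_POLICY_STRICT`, an arbitrary syscall number yields `SIGSEGV`; with `SLICE_POLICY_LENIENT`, the same call returns 0 and simply leaves the grant cleared, which is the behaviour that would work for the DB use case.
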
Thanks,
-Prakash