Message-ID: <20251128225931.959481199@linutronix.de>
Date: Mon,  1 Dec 2025 08:05:36 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 Boqun Feng <boqun.feng@...il.com>,
 Jonathan Corbet <corbet@....net>,
 Prakash Sangappa <prakash.sangappa@...cle.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Arnd Bergmann <arnd@...db.de>,
 linux-arch@...r.kernel.org,
 Randy Dunlap <rdunlap@...radead.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Ron Geva <rongevarg@...il.com>,
 Waiman Long <longman@...hat.com>
Subject: [patch V5 00/11] rseq: Implement time slice extension mechanism

This is a follow-up to V4:

     https://lore.kernel.org/20251116173423.031443519@linutronix.de

V1 contains a detailed explanation:

     https://lore.kernel.org/20250908225709.144709889@linutronix.de

TLDR: Time slice extensions are an attempt to provide opportunistic
priority ceiling without the overhead of an actual priority ceiling
protocol, but also without the guarantees such a protocol provides.

The intent is to avoid situations where a user space thread is interrupted
in a critical section and scheduled out while holding a resource on which
the preempting thread or other threads in the system might block. In the
worst case that prevents those threads from making progress for at least a
full time slice. This is especially relevant for user space spinlocks,
which are a patently bad idea to begin with, but it applies to other
mechanisms as well.

This series uses the existing RSEQ user memory to implement it.
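
To illustrate the idea of a request/grant handshake through shared user
memory, here is a minimal mock in plain C. It does not touch the kernel
and is not the uapi of this series: struct mock_slice_ctrl, its field
names, mock_yield() and mock_kernel_wants_to_preempt() are all
hypothetical stand-ins, and the exact ordering of the flag updates is an
assumption. The real interface is described in the patches themselves
(Documentation/userspace-api/rseq.rst).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * HYPOTHETICAL stand-in for a control word in the rseq user memory.
 * Names and layout are assumptions made for this sketch only.
 */
struct mock_slice_ctrl {
	uint8_t request;	/* user space: "I am in a critical section" */
	uint8_t granted;	/* mock kernel: "extension granted, be quick" */
};

static unsigned int mock_yield_calls;

/* Stand-in for a syscall which hands the CPU back to the scheduler. */
static void mock_yield(void)
{
	mock_yield_calls++;
}

/* User space: ask for an extension before entering the critical section. */
static void slice_request(struct mock_slice_ctrl *ctrl)
{
	ctrl->request = 1;
}

/*
 * User space: leave the critical section. Returns true if an extension
 * had been granted and the CPU was handed back via mock_yield().
 */
static bool slice_release(struct mock_slice_ctrl *ctrl)
{
	if (ctrl->request) {
		/* The request was never consumed, so nothing to hand back */
		ctrl->request = 0;
		return false;
	}
	if (!ctrl->granted)
		return false;
	ctrl->granted = 0;
	mock_yield();
	return true;
}

/*
 * Mock kernel side at the point where it would preempt the thread.
 * Returns true if preemption is deferred, false if it happens right away.
 */
static bool mock_kernel_wants_to_preempt(struct mock_slice_ctrl *ctrl)
{
	if (!ctrl->request)
		return false;
	ctrl->request = 0;
	ctrl->granted = 1;
	return true;
}

/* Walk through both cases and return the number of yields performed. */
static unsigned int slice_demo(void)
{
	struct mock_slice_ctrl ctrl = { 0, 0 };
	bool deferred, yielded;

	/* Case 1: critical section completes without a preemption attempt */
	slice_request(&ctrl);
	yielded = slice_release(&ctrl);
	printf("uncontended: yielded=%d\n", yielded);

	/* Case 2: preemption attempt inside the critical section */
	slice_request(&ctrl);
	deferred = mock_kernel_wants_to_preempt(&ctrl);
	yielded = slice_release(&ctrl);
	printf("preempt attempt: deferred=%d yielded=%d\n", deferred, yielded);

	assert(ctrl.request == 0 && ctrl.granted == 0);
	return mock_yield_calls;
}
```

Calling slice_demo() from a main() function runs both cases. In the
uncontended case the only cost in this mock is setting and clearing a
flag in memory, which is why the overhead in the non-contended case is
the interesting number to measure.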

Changes vs. V4:

   - Rebase on the newest uaccess, RSEQ and CID changes

   - Remove the restriction to use rseq_slice_yield() and allow arbitrary
     syscalls to terminate the granted extension gracefully. That's
     required to support applications with an onion (layered)
     architecture, where the layer which requests the extension has no
     control over the actual code which runs inside the critical section.

   - Drop the set_need_resched_current() patch as that has been merged into
     the scheduler tree already.

All required prerequisites can be found in git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/cid

For your convenience all of it is also available as a conglomerate from
git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/slice

This still uses syscall NR 470, which conflicts with pending changes in
-next, but that will be sorted out after 6.19-rc1 with the hopefully final
submission of this. For now this sticks to 470 to avoid pulling in the
full zoo of -next.

In the replies to the V3 and V4 series, actual numbers were posted for the
Oracle workload which triggered this whole effort and for a hacked up
version of a Netflix global lock benchmark.

I took some inspiration from that Netflix benchmark and implemented a new
version from scratch to explore a few aspects of this time slice mechanism,
especially the overhead in the non-contended case, the effects of the
'work' within and outside of the lock held region, and the effects of
background activity.

The results are not always what you would expect, but there is a clear
sweet spot where the overhead of the time slice magic in the uncontended
case flips over to a benefit. Your mileage might vary. :)

The benchmark source with a pile of barely documented command line options
is available here:

   https://tglx.de/~tglx/timeslice/lock_slice.c

Use it at your own peril. It's a hack and I only tried to build it with

    gcc -O2 -Wall lock_slice.c -o l

Thanks,

	tglx
---
 Documentation/admin-guide/kernel-parameters.txt |    5 
 Documentation/admin-guide/sysctl/kernel.rst     |    8 
 Documentation/userspace-api/index.rst           |    1 
 Documentation/userspace-api/rseq.rst            |  135 +++++++++
 arch/alpha/kernel/syscalls/syscall.tbl          |    1 
 arch/arm/tools/syscall.tbl                      |    1 
 arch/arm64/tools/syscall_32.tbl                 |    1 
 arch/m68k/kernel/syscalls/syscall.tbl           |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl     |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl       |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl       |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl       |    1 
 arch/parisc/kernel/syscalls/syscall.tbl         |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl        |    1 
 arch/s390/kernel/syscalls/syscall.tbl           |    1 
 arch/sh/kernel/syscalls/syscall.tbl             |    1 
 arch/sparc/kernel/syscalls/syscall.tbl          |    1 
 arch/x86/entry/syscalls/syscall_32.tbl          |    1 
 arch/x86/entry/syscalls/syscall_64.tbl          |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl         |    1 
 include/linux/entry-common.h                    |    2 
 include/linux/rseq.h                            |   11 
 include/linux/rseq_entry.h                      |  192 +++++++++++++-
 include/linux/rseq_types.h                      |   32 ++
 include/linux/syscalls.h                        |    1 
 include/linux/thread_info.h                     |   16 -
 include/uapi/asm-generic/unistd.h               |    5 
 include/uapi/linux/prctl.h                      |   10 
 include/uapi/linux/rseq.h                       |   38 ++
 init/Kconfig                                    |   12 
 kernel/entry/common.c                           |   14 -
 kernel/entry/syscall-common.c                   |   11 
 kernel/rseq.c                                   |  328 ++++++++++++++++++++++++
 kernel/sys.c                                    |    6 
 kernel/sys_ni.c                                 |    1 
 scripts/syscall.tbl                             |    1 
 tools/testing/selftests/rseq/.gitignore         |    1 
 tools/testing/selftests/rseq/Makefile           |    5 
 tools/testing/selftests/rseq/rseq-abi.h         |   27 +
 tools/testing/selftests/rseq/slice_test.c       |  219 ++++++++++++++++
 40 files changed, 1070 insertions(+), 27 deletions(-)
