Message-ID: <20251103192042.4820571b@gandalf.local.home>
Date: Mon, 3 Nov 2025 19:20:42 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra
 <peterz@...radead.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 "Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Jonathan Corbet <corbet@....net>, Prakash Sangappa
 <prakash.sangappa@...cle.com>, Madadi Vineeth Reddy
 <vineethr@...ux.ibm.com>, K Prateek Nayak <kprateek.nayak@....com>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Arnd Bergmann
 <arnd@...db.de>, linux-arch@...r.kernel.org, vineethrp@...gle.com
Subject: Re: [patch V3 02/12] rseq: Add fields and constants for time slice
 extension

On Wed, 29 Oct 2025 14:22:14 +0100 (CET)
Thomas Gleixner <tglx@...utronix.de> wrote:

> +/**
> + * rseq_slice_ctrl - Time slice extension control structure
> + * @all:	Compound value
> + * @request:	Request for a time slice extension
> + * @granted:	Granted time slice extension
> + *
> + * @request is set by user space and can be cleared by user space or kernel
> + * space.  @granted is set and cleared by the kernel and must only be read
> + * by user space.
> + */
> +struct rseq_slice_ctrl {
> +	union {
> +		__u32		all;
> +		struct {
> +			__u8	request;
> +			__u8	granted;
> +			__u16	__reserved;
> +		};
> +	};
> +};
> +
>  /*
>   * struct rseq is aligned on 4 * 8 bytes to ensure it is always
>   * contained within a single cache-line.
> @@ -142,6 +174,12 @@ struct rseq {
>  	__u32 mm_cid;
>  
>  	/*
> +	 * Time slice extension control structure. CPU local updates from
> +	 * kernel and user space.
> +	 */
> +	struct rseq_slice_ctrl slice_ctrl;
> +

BTW, Google is interested in expanding this feature for VMs, since a VM
kernel spinlock also happens to be a user space spinlock as far as the host
is concerned. The KVM folks would rather have this implemented via the
normal user space method than do anything specific to the KVM internal
code, or at least keep it as non-intrusive as possible.

I talked with Mathieu and the KVM folks on how it could use the rseq
method, and it was suggested that qemu would set up a shared memory region
between the qemu thread and the virtual CPU and possibly submit a driver
that would expose this memory region. This could hook to a paravirt
spinlock that would set the bit stating the system is in a critical section
and clear it when all spinlocks are released. If the vCPU was granted an
extra time slice, then it would call a hypercall that would do the yield.
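
A minimal sketch of what the guest side of that could look like, modeled in
plain C so it can be compiled and run in user space. It is not taken from
the patch series: pv_spin_lock_enter(), pv_spin_lock_exit(),
pv_slice_yield_hypercall() and shared_ctrl are hypothetical names, the
nesting counter is an assumption about how "all spinlocks are released"
would be tracked, and real code would need per-vCPU state and proper
READ_ONCE()/WRITE_ONCE() style accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as struct rseq_slice_ctrl in the quoted patch, with <stdint.h> types. */
struct slice_ctrl {
	union {
		uint32_t all;
		struct {
			uint8_t  request;   /* set by the guest, cleared by guest or host */
			uint8_t  granted;   /* written by the host only */
			uint16_t reserved;
		};
	};
};

/* Stand-in for the region shared between the qemu thread and the vCPU. */
static struct slice_ctrl shared_ctrl;
static unsigned int pv_lock_depth;   /* number of spinlocks currently held (assumed per vCPU) */
static unsigned int pv_yield_calls;  /* instrumentation for this sketch only */

/* Hypothetical hypercall that hands the CPU back to the host. */
static void pv_slice_yield_hypercall(void)
{
	pv_yield_calls++;
	shared_ctrl.granted = 0;     /* simulates the host clearing the grant */
}

/* Called when a spinlock is taken: the first lock opens the critical section. */
static void pv_spin_lock_enter(void)
{
	if (pv_lock_depth++ == 0)
		shared_ctrl.request = 1;
}

/* Called when a spinlock is released: the last unlock closes the critical section. */
static void pv_spin_lock_exit(void)
{
	assert(pv_lock_depth > 0);
	if (--pv_lock_depth != 0)
		return;                  /* still holding an outer lock */
	shared_ctrl.request = 0;
	if (shared_ctrl.granted)     /* an extension was used, so give the CPU back */
		pv_slice_yield_hypercall();
}
```

With nested locks, only the outermost unlock clears the request and, if an
extension was granted in the meantime, issues the yield.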

When I mentioned this to Mathieu, he was against sharing the qemu thread's
rseq with the guest VM, as that would expose much more than what is needed
to the guest, especially since it needs to be a writable memory location.

What could be done is that another memory range is mapped between the qemu
thread and the vCPU memory, and the rseq would have a pointer to that memory.

To implement that, the slice_ctrl would need to be a pointer, where the
kernel would need to do another indirection to follow that pointer to
another location within the thread's memory.

Now I do believe that the return to guest goes through a different path, so
this doesn't actually need to use rseq. But it would require a way for the
qemu thread to pass the memory to the kernel. I'm guessing that the return
to guest logic could share the code with the return to user logic by just
passing a struct rseq_slice_ctrl pointer to a function?
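
To make the shape of that concrete, here is a rough user space model. None
of it is from the patch series: slice_ext_try_grant(), task_model,
vcpu_model and both callers are made-up names. It also assumes that granting
an extension means the kernel clears request and sets granted, which is only
one reading of the kernel-doc comment quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct rseq_slice_ctrl in the quoted patch, with <stdint.h> types. */
struct slice_ctrl {
	union {
		uint32_t all;
		struct {
			uint8_t  request;
			uint8_t  granted;
			uint16_t reserved;
		};
	};
};

/*
 * Hypothetical common helper: it only needs a pointer to the control word,
 * not to know whether that word lives in struct rseq or somewhere else.
 */
static bool slice_ext_try_grant(struct slice_ctrl *ctrl, bool resched_pending)
{
	if (!ctrl || !resched_pending || !ctrl->request)
		return false;
	ctrl->request = 0;   /* assumption: the grant consumes the request */
	ctrl->granted = 1;
	return true;
}

/* Model of a normal thread: the control word is embedded in its rseq area. */
struct task_model {
	struct slice_ctrl rseq_slice_ctrl;
};

/* Model of a vCPU: the control word is in a separately registered region. */
struct vcpu_model {
	struct slice_ctrl *guest_ctrl;   /* NULL if nothing was registered */
};

static bool exit_to_user_model(struct task_model *t, bool resched_pending)
{
	return slice_ext_try_grant(&t->rseq_slice_ctrl, resched_pending);
}

static bool enter_guest_model(struct vcpu_model *v, bool resched_pending)
{
	return slice_ext_try_grant(v->guest_ctrl, resched_pending);
}
```

The two callers differ only in where the pointer comes from, which is the
part the qemu thread would need some interface to register.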

I'm bringing this up so that this use case is considered when implementing
the extended time slice, as I believe this would be a more common case than
the user space spinlock would be.

Thanks,

-- Steve
