linux-kernel - Re: [patch V2 07/37] rseq, virt: Retrigger RSEQ after vcpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aKzGZjyEQq3u-M68@google.com>
Date: Mon, 25 Aug 2025 13:24:06 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>, 
	Jens Axboe <axboe@...nel.dk>, Paolo Bonzini <pbonzini@...hat.com>, Wei Liu <wei.liu@...nel.org>, 
	Dexuan Cui <decui@...rosoft.com>, Peter Zijlstra <peterz@...radead.org>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, x86@...nel.org, 
	Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>, 
	Christian Borntraeger <borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>, 
	Huacai Chen <chenhuacai@...nel.org>, Paul Walmsley <paul.walmsley@...ive.com>, 
	Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 07/37] rseq, virt: Retrigger RSEQ after vcpu_run()

On Mon, Aug 25, 2025, Mathieu Desnoyers wrote:
> On 2025-08-23 12:39, Thomas Gleixner wrote:
> > Hypervisors invoke resume_user_mode_work() before entering the guest, which
> > clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user
> > space context available to them, so the rseq notify handler skips
> > inspecting the critical section, but updates the CPU/MM CID values
> > unconditionally so that the eventual pending rseq event is not lost on the
> > way to user space.
> > 
> > This is a pointless exercise as the task might be rescheduled before
> > actually returning to user space and it creates unnecessary work in the
> > vcpu_run() loops.
> 
> One question here: AFAIU, this removes the updates to the cpu_id_start,
> cpu_id, mm_cid, and node_id fields on exit to virt usermode. This means
> that while the virt guest is running in usermode, the host hypervisor
> process has stale rseq fields, until it eventually returns to the
> hypervisor's host userspace (from ioctl).
> 
> Considering the rseq uapi documentation, this should not matter.
> Each of those fields have this statement:
> 
> "This field should only be read by the thread which registered this data
> structure."
> 
> I can however think of use-cases for reading the rseq fields from other
> hypervisor threads to figure out information about thread placement.
> Doing so would however go against the documented uapi.
> 
> I'd rather ask whether anyone is misusing this uapi in that way before
> going ahead with the change, just to prevent surprises.
> 
> I'm OK with the re-trigger of rseq, as it does indeed appear to fix
> an issue, but I'm concerned about the ABI impact of skipping the
> rseq_update_cpu_node_id() on return to virt userspace.
> 
> Thoughts ?

I know the idea of exposing rseq to paravirtualized guests has been floated (more
than once), but I don't _think_ anyone has actually shipped anything of that 
nature.

> > @@ -49,6 +49,7 @@
> >   #include <linux/lockdep.h>
> >   #include <linux/kthread.h>
> >   #include <linux/suspend.h>
> > +#include <linux/rseq.h>
> >   #include <asm/processor.h>
> >   #include <asm/ioctl.h>
> > @@ -4466,6 +4467,8 @@ static long kvm_vcpu_ioctl(struct file *
> >   		r = kvm_arch_vcpu_ioctl_run(vcpu);
> >   		vcpu->wants_to_run = false;
> > +		rseq_virt_userspace_exit();

I don't love bleeding even more entry/rseq details into KVM.  Rather than optimize
KVM and then add TIF_RSEQ, what if we do the opposite?  I.e. add TIF_RSEQ to
XFER_TO_GUEST_MODE_WORK as part of "rseq: Switch to TIF_RSEQ if supported", and
then drop TIF_RSEQ from XFER_TO_GUEST_MODE_WORK in a new patch?

That should make it easier to revert the KVM/virt change if it turns out PV setups
are playing games with rseq, and it would give the stragglers (arm64 in particular)
some motiviation to implement TIF_RSEQ and/or switch to generic TIF bits.