Message-ID: <aKzGZjyEQq3u-M68@google.com>
Date: Mon, 25 Aug 2025 13:24:06 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Paolo Bonzini <pbonzini@...hat.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, x86@...nel.org,
Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>,
Huacai Chen <chenhuacai@...nel.org>, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 07/37] rseq, virt: Retrigger RSEQ after vcpu_run()
On Mon, Aug 25, 2025, Mathieu Desnoyers wrote:
> On 2025-08-23 12:39, Thomas Gleixner wrote:
> > Hypervisors invoke resume_user_mode_work() before entering the guest, which
> > clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user
> > space context available to them, so the rseq notify handler skips
> > inspecting the critical section, but updates the CPU/MM CID values
> > unconditionally so that the eventual pending rseq event is not lost on the
> > way to user space.
> >
> > This is a pointless exercise as the task might be rescheduled before
> > actually returning to user space and it creates unnecessary work in the
> > vcpu_run() loops.
>
> One question here: AFAIU, this removes the updates to the cpu_id_start,
> cpu_id, mm_cid, and node_id fields on exit to virt usermode. This means
> that while the virt guest is running in usermode, the host hypervisor
> process has stale rseq fields, until it eventually returns to the
> hypervisor's host userspace (from ioctl).
>
> Considering the rseq uapi documentation, this should not matter.
> Each of those fields have this statement:
>
> "This field should only be read by the thread which registered this data
> structure."
>
> I can however think of use-cases for reading the rseq fields from other
> hypervisor threads to figure out information about thread placement.
> Doing so would however go against the documented uapi.
>
> I'd rather ask whether anyone is misusing this uapi in that way before
> going ahead with the change, just to prevent surprises.
>
> I'm OK with the re-trigger of rseq, as it does indeed appear to fix
> an issue, but I'm concerned about the ABI impact of skipping the
> rseq_update_cpu_node_id() on return to virt userspace.
>
> Thoughts ?

I know the idea of exposing rseq to paravirtualized guests has been floated (more
than once), but I don't _think_ anyone has actually shipped anything of that
nature.

> > @@ -49,6 +49,7 @@
> > #include <linux/lockdep.h>
> > #include <linux/kthread.h>
> > #include <linux/suspend.h>
> > +#include <linux/rseq.h>
> > #include <asm/processor.h>
> > #include <asm/ioctl.h>
> > @@ -4466,6 +4467,8 @@ static long kvm_vcpu_ioctl(struct file *
> > r = kvm_arch_vcpu_ioctl_run(vcpu);
> > vcpu->wants_to_run = false;
> > + rseq_virt_userspace_exit();

I don't love bleeding even more entry/rseq details into KVM. Rather than optimize
KVM and then add TIF_RSEQ, what if we do the opposite? I.e. add TIF_RSEQ to
XFER_TO_GUEST_MODE_WORK as part of "rseq: Switch to TIF_RSEQ if supported", and
then drop TIF_RSEQ from XFER_TO_GUEST_MODE_WORK in a new patch?

That should make it easier to revert the KVM/virt change if it turns out PV setups
are playing games with rseq, and it would give the stragglers (arm64 in particular)
some motivation to implement TIF_RSEQ and/or switch to generic TIF bits.
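
To illustrate the ordering suggested above, here is a rough standalone model, not kernel code. The bit positions, the `_STEP1`/`_STEP2` macro names, and the `model_work_pending()` helper are all made up for the example; the real TIF bit values are per-architecture and the real XFER_TO_GUEST_MODE_WORK mask contains more bits than shown here.

```c
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this model. */
#define _TIF_NEED_RESCHED	(1UL << 0)
#define _TIF_SIGPENDING		(1UL << 1)
#define _TIF_NOTIFY_RESUME	(1UL << 2)
#define _TIF_RSEQ		(1UL << 3)

/*
 * Step 1 (assumed to be folded into "rseq: Switch to TIF_RSEQ if
 * supported"): include TIF_RSEQ so guest entry still notices a pending
 * rseq event, i.e. no behavioural change for KVM/virt.
 */
#define XFER_TO_GUEST_MODE_WORK_STEP1					\
	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_RESUME |	\
	 _TIF_RSEQ)

/*
 * Step 2 (a separate patch that can be reverted on its own): drop
 * TIF_RSEQ from the mask so a pending rseq event no longer forces
 * work before entering the guest.
 */
#define XFER_TO_GUEST_MODE_WORK_STEP2					\
	(XFER_TO_GUEST_MODE_WORK_STEP1 & ~_TIF_RSEQ)

/* Hypothetical helper: is any bit of @mask set in @ti_work? */
static inline bool model_work_pending(unsigned long ti_work,
				      unsigned long mask)
{
	return (ti_work & mask) != 0;
}
```

With only `_TIF_RSEQ` set, the step 1 mask reports pending work and the step 2 mask does not, while the other bits behave the same in both. Reverting step 2 alone would restore the step 1 behaviour.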