Message-ID: <aLlhSmeA_TPSheyu@google.com>
Date: Thu, 4 Sep 2025 02:52:10 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, x86@...nel.org, Arnd Bergmann <arnd@...db.de>,
Heiko Carstens <hca@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Huacai Chen <chenhuacai@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 25/37] rseq: Rework the TIF_NOTIFY handler
On Tue, Sep 02, 2025, Thomas Gleixner wrote:
> On Tue, Aug 26 2025 at 11:12, Mathieu Desnoyers wrote:
> > On 2025-08-23 12:40, Thomas Gleixner wrote:
> >> +void __rseq_handle_notify_resume(struct pt_regs *regs)
> >> +{
> >> + /*
> >> + * If invoked from hypervisors before entering the guest via
> >> + * resume_user_mode_work(), then @regs is a NULL pointer.
> >> + *
> >> + * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
> >> + * it before returning from the ioctl() to user space when
> >> + * rseq_event.sched_switch is set.
> >> + *
> >> + * So it's safe to ignore here instead of pointlessly updating it
> >> + * in the vcpu_run() loop.
> >
> > I don't think any virt user should expect the userspace fields to be
> > updated on the host process while running in guest mode, but it's good
> > to clarify that we intend to change this user-visible behavior within
> > this series, to spare any unwelcome surprise.
>
> Actually it is not really a user-visible change.
It's definitely a user-visible change in the sense that userspace, via the guest,
will see different behavior.
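To make the difference concrete, here is a minimal user-space model of the flow described in the quoted patch comment. It is a sketch under stated assumptions, not kernel code: every name (model_task, model_resume_user_mode_work(), model_kvm_run(), the `legacy` flag) is hypothetical. `legacy` approximates the current behavior, where the rseq fields are updated even when @regs is NULL. The re-raise step follows the patch comment's description, not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; these are NOT kernel APIs. */
struct model_task {
	bool tif_notify_resume;	/* models TIF_NOTIFY_RESUME */
	bool sched_switch;	/* models rseq_event.sched_switch */
	bool legacy;		/* true: model the current (pre-series) behavior */
	int cur_cpu;		/* CPU the task is running on */
	int tls_rseq_cpu_id;	/* models cpu_id in the thread's TLS::rseq */
};

struct model_regs {
	int unused;
};

/* Models __rseq_handle_notify_resume(). In the reworked flow a NULL
 * @regs (the hypervisor path) means "skip the update"; the legacy
 * model updates the TLS fields regardless. */
static void model_handle_notify_resume(struct model_task *t,
				       struct model_regs *regs)
{
	if (!regs && !t->legacy)
		return;
	if (t->sched_switch) {
		t->tls_rseq_cpu_id = t->cur_cpu;
		t->sched_switch = false;
	}
}

/* Models resume_user_mode_work(): clear the flag, run the handler and,
 * if the event is still pending after a NULL-@regs call, re-raise the
 * flag so the update happens on the real return to user space. */
static void model_resume_user_mode_work(struct model_task *t,
					struct model_regs *regs)
{
	if (!t->tif_notify_resume)
		return;
	t->tif_notify_resume = false;
	model_handle_notify_resume(t, regs);
	if (!regs && t->sched_switch)
		t->tif_notify_resume = true;
}

/* Simulates one KVM_RUN ioctl during which the vCPU thread migrates to
 * CPU @to. Returns the cpu_id a guest peeking at the host thread's TLS
 * would observe while the thread is still inside the vCPU loop. */
static int model_kvm_run(struct model_task *t, int to)
{
	struct model_regs regs = { 0 };
	int seen_in_guest;

	/* Migration: the scheduler records an event and raises the flag. */
	t->cur_cpu = to;
	t->sched_switch = true;
	t->tif_notify_resume = true;

	/* Work done before (re-)entering the guest: @regs is NULL. */
	model_resume_user_mode_work(t, NULL);
	seen_in_guest = t->tls_rseq_cpu_id;

	/* Return from the ioctl to user space with real @regs. */
	model_resume_user_mode_work(t, &regs);
	return seen_in_guest;
}
```

With `legacy` set, a guest that can see the host thread's TLS (the paravirt case) observes the new CPU while the thread is still in the vCPU loop; with the reworked flow it observes the stale value until KVM_RUN returns. Both end up with the same value once the thread is back in user space.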
> TLS::rseq is thread local and any update to it only becomes visible to
> user space once the vCPU thread actually returns to user space. Arguably,
> no guest legitimately has access to the host's vCPU thread's TLS.
>
> You might argue that GDB could look at the thread's TLS::rseq while the
> task runs in the vCPU's guest mode. But that's completely irrelevant, because
> once a task enters the kernel, the RSEQ CPU/NODE/MM IDs have no meaning
> anymore. They are only valid as long as the task runs in user space.
Paravirt setups, e.g. hoisting host-controlled workloads into VMs, have explored
(ab)using rseq. In such setups, host threads are often mapped 1:1 to vCPUs, in
which case the pCPU that the vCPU thread is running on becomes particularly
interesting.
> When a task hits a breakpoint, GDB can only look at the state _before_
> that, and that's all it can see when it looks at the TLS of a
> thread that voluntarily went into the kernel via the KVM ioctl.
>
> That update is truly a kernel internal implementation detail and it got
> introduced way _after_ the initial RSEQ implementation.
Yes, but that doesn't change the fact that a user _could_ have come to depend on
the current behavior sometime in the last ~5 years.
I'm OK with formally stating that exposing rseq directly to a KVM guest is
unsupported, but I would like the change to be explicitly called out and documented.