[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1673583543.19718.1629409152244.JavaMail.zimbra@efficios.com>
Date: Thu, 19 Aug 2021 17:39:12 -0400 (EDT)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "Russell King, ARM Linux" <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Guo Ren <guoren@...nel.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Michael Ellerman <mpe@...erman.id.au>,
Heiko Carstens <hca@...ux.ibm.com>, gor <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Oleg Nesterov <oleg@...hat.com>, rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
paulmck <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>, shuah <shuah@...nel.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
linux-csky <linux-csky@...r.kernel.org>,
linux-mips@...r.kernel.org,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
linux-s390@...r.kernel.org, KVM list <kvm@...r.kernel.org>,
linux-kselftest <linux-kselftest@...r.kernel.org>,
Peter Foley <pefoley@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Ben Gardon <bgardon@...gle.com>
Subject: Re: [PATCH 1/5] KVM: rseq: Update rseq when processing
NOTIFY_RESUME on xfer to KVM guest
----- On Aug 17, 2021, at 8:12 PM, Sean Christopherson seanjc@...gle.com wrote:
> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions. While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the CPU ID needs to be updated if the
> task is migrated.
>
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.
I agree with the problem assessment, but I would recommend a small change
to this fix.
>
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley <pefoley@...gle.com>
> Bisected-by: Doug Evans <dje@...gle.com>
> Cc: Shakeel Butt <shakeelb@...gle.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: stable@...r.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> kernel/entry/kvm.c | 4 +++-
> kernel/rseq.c | 4 ++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> unsigned long ti_work)
> if (ti_work & _TIF_NEED_RESCHED)
> schedule();
>
> - if (ti_work & _TIF_NOTIFY_RESUME)
> + if (ti_work & _TIF_NOTIFY_RESUME) {
> tracehook_notify_resume(NULL);
> + rseq_handle_notify_resume(NULL, NULL);
> + }
>
> ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..58c79a7918cd 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -236,7 +236,7 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs
> *rseq_cs)
>
> static int rseq_ip_fixup(struct pt_regs *regs)
> {
> - unsigned long ip = instruction_pointer(regs);
> + unsigned long ip = regs ? instruction_pointer(regs) : 0;
> struct task_struct *t = current;
> struct rseq_cs rseq_cs;
> int ret;
> @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
> * If not nested over a rseq critical section, restart is useless.
> * Clear the rseq_cs pointer and return.
> */
> - if (!in_rseq_cs(ip, &rseq_cs))
> + if (!regs || !in_rseq_cs(ip, &rseq_cs))
I think clearing the thread's rseq_cs unconditionally here when regs is NULL
is not the behavior we want when this is called from xfer_to_guest_mode_work.
If we have a scenario where userspace ends up calling this ioctl(KVM_RUN)
from within a rseq c.s., we really want a CONFIG_DEBUG_RSEQ=y kernel to
kill this application in the rseq_syscall handler when exiting back to usermode
when the ioctl eventually returns.
However, clearing the thread's rseq_cs will prevent this from happening.
So I would favor an approach where we simply do:
if (!regs)
return 0;
Immediately at the beginning of rseq_ip_fixup, before getting the instruction
pointer, so effectively skip all side-effects of the ip fixup code. Indeed, it
is not relevant to do any fixup here, because it is nested in a ioctl system
call.
Effectively, this would preserve the SIGSEGV behavior when this ioctl is
erroneously called by user-space from a rseq critical section.
Thanks for looking into this !
Mathieu
> return clear_rseq_cs(t);
> ret = rseq_need_restart(t, rseq_cs.flags);
> if (ret <= 0)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Powered by blists - more mailing lists