Date:   Thu, 13 Jan 2022 11:01:30 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     linux-kernel@...r.kernel.org, aleksandar.qemu.devel@...il.com,
        alexandru.elisei@....com, anup.patel@....com,
        aou@...s.berkeley.edu, atish.patra@....com,
        benh@...nel.crashing.org, borntraeger@...ux.ibm.com, bp@...en8.de,
        catalin.marinas@....com, chenhuacai@...nel.org,
        dave.hansen@...ux.intel.com, david@...hat.com,
        frankja@...ux.ibm.com, frederic@...nel.org, gor@...ux.ibm.com,
        hca@...ux.ibm.com, imbrenda@...ux.ibm.com, james.morse@....com,
        jmattson@...gle.com, joro@...tes.org, kvm@...r.kernel.org,
        mingo@...hat.com, mpe@...erman.id.au, nsaenzju@...hat.com,
        palmer@...belt.com, paulmck@...nel.org, paulus@...ba.org,
        paul.walmsley@...ive.com, pbonzini@...hat.com, seanjc@...gle.com,
        suzuki.poulose@....com, tglx@...utronix.de,
        tsbogend@...ha.franken.de, vkuznets@...hat.com,
        wanpengli@...cent.com, will@...nel.org
Subject: Re: [PATCH 1/5] kvm: add exit_to_guest_mode() and
 enter_from_guest_mode()

On Tue, Jan 11, 2022 at 05:54:59PM +0000, Marc Zyngier wrote:
> Hi Mark,
> 
> On Tue, 11 Jan 2022 15:35:35 +0000,
> Mark Rutland <mark.rutland@....com> wrote:
> > 
> > When transitioning to/from guest mode, it is necessary to inform
> > lockdep, tracing, and RCU in a specific order, similar to the
> > requirements for transitions to/from user mode. Additionally, it is
> > necessary to perform vtime accounting for a window around running the
> > guest, with RCU enabled, such that timer interrupts taken from the guest
> > can be accounted as guest time.
> > 
> > Most architectures don't handle all the necessary pieces, and have a
> > number of common bugs, including unsafe usage of RCU during the window
> > between guest_enter() and guest_exit().
> > 
> > On x86, this was dealt with across commits:
> > 
> >   87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
> >   0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
> >   9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
> >   3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
> >   135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
> >   160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
> >   bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
> > 
> > ... but those fixes are specific to x86, and as the resulting logic
> > (while correct) is split across generic helper functions and
> > x86-specific helper functions, it is difficult to see that the
> > entry/exit accounting is balanced.
> > 
> > This patch adds generic helpers which architectures can use to handle
> > guest entry/exit consistently and correctly. The guest_{enter,exit}()
> > helpers are split into guest_timing_{enter,exit}() to perform vtime
> > accounting, and guest_context_{enter,exit}() to perform the necessary
> > context tracking and RCU management. The existing guest_{enter,exit}()
> > helpers are left as wrappers of these.
> > 
> > Atop this, new exit_to_guest_mode() and enter_from_guest_mode() helpers
> > are added to handle the ordering of lockdep, tracing, and RCU management.
> > These are named to align with exit_to_user_mode() and
> > enter_from_user_mode().
> > 
> > Subsequent patches will migrate architectures over to the new helpers,
> > following a sequence:
> > 
> > 	guest_timing_enter_irqoff();
> > 
> > 	exit_to_guest_mode();
> > 	< run the vcpu >
> > 	enter_from_guest_mode();
> > 
> > 	< take any pending IRQs >
> > 
> > 	guest_timing_exit_irqoff();
> > 
> > This sequence handles all of the above correctly, and more clearly
> > balances the entry and exit portions, making it easier to understand.
> > 
> > The existing helpers are marked as deprecated, and will be removed once
> > all architectures have been converted.
> > 
> > There should be no functional change as a result of this patch.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@....com>
> 
> Thanks a lot for looking into this and writing this up. I have a
> couple of comments below, but that's pretty much cosmetic and is only
> there to ensure that I actually understand this stuff. FWIW:
> 
> Reviewed-by: Marc Zyngier <maz@...nel.org>

Thanks!

> > ---
> >  include/linux/kvm_host.h | 108 +++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 105 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c310648cc8f1..13fcf7979880 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -29,6 +29,8 @@
> >  #include <linux/refcount.h>
> >  #include <linux/nospec.h>
> >  #include <linux/notifier.h>
> > +#include <linux/ftrace.h>
> > +#include <linux/instrumentation.h>
> >  #include <asm/signal.h>
> >  
> >  #include <linux/kvm.h>
> > @@ -362,8 +364,11 @@ struct kvm_vcpu {
> >  	int last_used_slot;
> >  };
> >  
> > -/* must be called with irqs disabled */
> > -static __always_inline void guest_enter_irqoff(void)
> > +/*
> > + * Start accounting time towards a guest.
> > + * Must be called before entering guest context.
> > + */
> > +static __always_inline void guest_timing_enter_irqoff(void)
> >  {
> >  	/*
> >  	 * This is running in ioctl context so its safe to assume that it's the
> > @@ -372,7 +377,17 @@ static __always_inline void guest_enter_irqoff(void)
> >  	instrumentation_begin();
> >  	vtime_account_guest_enter();
> >  	instrumentation_end();
> > +}
> >  
> > +/*
> > + * Enter guest context and enter an RCU extended quiescent state.
> > + *
> > + * This should be the last thing called before entering the guest, and must be
> > + * called after any potential use of RCU (including any potentially
> > + * instrumented code).
> 
> nit: "the last thing called" is terribly ambiguous. Any architecture
> obviously calls a ****load of stuff after this point. Should this be
> 'the last thing involving RCU' instead?

I agree this is unclear and I struggled to find good wording for this. Is the
following any better?

/*
 * Enter guest context and enter an RCU extended quiescent state.
 *
 * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is
 * unsafe to use any code which may directly or indirectly use RCU, tracing
 * (including IRQ flag tracing), or lockdep. All code in this period must be
 * non-instrumentable.
 */

If that's good, I can add a similar comment to guest_context_exit_irqoff().
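
As a rough illustration of what that comment rules out, here is a userspace
toy model of the window. It is only a sketch: the model_*() functions, the
rcu_watching flag, and the violation counter are hypothetical stand-ins, not
the kernel's implementations. It shows that code which may use RCU, tracing,
or lockdep is fine before and after the window, but not inside it.

```c
/* Userspace toy model; nothing here is the kernel's real implementation. */
#include <assert.h>
#include <stdbool.h>

static bool rcu_watching = true;  /* false while in the extended quiescent state */
static int rcu_violations;        /* uses of RCU while it is not watching */

/* Stand-in for guest_context_enter_irqoff(): RCU stops watching this CPU. */
static void model_guest_context_enter_irqoff(void)
{
	assert(rcu_watching);     /* entry and exit must be balanced */
	rcu_watching = false;
}

/* Stand-in for guest_context_exit_irqoff(): RCU watches this CPU again. */
static void model_guest_context_exit_irqoff(void)
{
	assert(!rcu_watching);    /* entry and exit must be balanced */
	rcu_watching = true;
}

/* Hypothetical stand-in for any code that uses RCU, tracing, or lockdep. */
static void model_instrumented_code(void)
{
	if (!rcu_watching)
		rcu_violations++;
}

/* Runs one modelled guest entry and returns the number of violations seen. */
static int model_run(bool instrument_inside_window)
{
	rcu_violations = 0;

	model_instrumented_code();          /* fine: before the window */
	model_guest_context_enter_irqoff();
	if (instrument_inside_window)
		model_instrumented_code();  /* unsafe: inside the window */
	model_guest_context_exit_irqoff();
	model_instrumented_code();          /* fine: after the window */

	return rcu_violations;
}
```

In this model, model_run(false) reports no violations and model_run(true)
reports exactly one, which is the case the "non-instrumentable" wording is
meant to forbid.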

[...]

> > +/**
> > + * exit_to_guest_mode - Fixup state when exiting to guest mode
> > + *
> > + * This is analogous to exit_to_user_mode(), and ensures we perform the
> > + * following in order:
> > + *
> > + * 1) Trace interrupts on state
> > + * 2) Invoke context tracking if enabled to adjust RCU state
> > + * 3) Tell lockdep that interrupts are enabled
> 
> nit: or rather, are about to be enabled? Certainly on arm64, the
> enable happens much later, right at the point where we enter the guest
> for real.

True; I'd cribbed the wording from the comment block above exit_to_user_mode(),
but I stripped the context that made that clear. I'll make that:

	/**
	 * exit_to_guest_mode - Fixup state when exiting to guest mode
	 *
	 * Entry to a guest will enable interrupts, but the kernel state is
	 * interrupts disabled when this is invoked. Also tell RCU about it.
	 *
	 * 1) Trace interrupts on state
	 * 2) Invoke context tracking if enabled to adjust RCU state
	 * 3) Tell lockdep that interrupts are enabled
	 *
	 * Invoked from architecture specific code before entering a guest.
	 * Must be called with interrupts disabled and the caller must be
	 * non-instrumentable.
	 * The caller has to invoke guest_timing_enter_irqoff() before this.
	 *
	 * Note: this is analogous to exit_to_user_mode().
	 */

... and likewise for enter_from_guest_mode(), if that's clear enough?

FWIW, the comment block for exit_to_user_mode() in
include/linux/entry-common.h says:

	/**
	 * exit_to_user_mode - Fixup state when exiting to user mode
	 *
	 * Syscall/interrupt exit enables interrupts, but the kernel state is
	 * interrupts disabled when this is invoked. Also tell RCU about it.
	 *
	 * 1) Trace interrupts on state
	 * 2) Invoke context tracking if enabled to adjust RCU state
	 * 3) Invoke architecture specific last minute exit code, e.g. speculation
	 *    mitigations, etc.: arch_exit_to_user_mode()
	 * 4) Tell lockdep that interrupts are enabled
	 *
	 * Invoked from architecture specific code when syscall_exit_to_user_mode()
	 * is not suitable as the last step before returning to userspace. Must be
	 * invoked with interrupts disabled and the caller must be
	 * non-instrumentable.
	 * The caller has to invoke syscall_exit_to_user_mode_work() before this.
	 */
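
To make the intended ordering concrete, here is a userspace sketch that
records the steps of one vcpu run in a log. The step names and model_*()
functions are hypothetical and only model the order described in the comments
above. The sketch also assumes that enter_from_guest_mode() mirrors
exit_to_guest_mode() in reverse, which this mail does not spell out.

```c
/* Userspace ordering model; these are not the kernel helpers themselves. */
#include <assert.h>
#include <stddef.h>

/* Steps of one modelled vcpu run; the names are illustrative only. */
enum step {
	TIMING_ENTER,
	TRACE_IRQS_ON,
	CONTEXT_ENTER,
	LOCKDEP_IRQS_ON,
	RUN_VCPU,
	LOCKDEP_IRQS_OFF,
	CONTEXT_EXIT,
	TRACE_IRQS_OFF,
	HANDLE_IRQS,
	TIMING_EXIT,
	NR_STEPS
};

static enum step step_log[NR_STEPS];
static size_t step_count;

static void record(enum step s)
{
	assert(step_count < NR_STEPS);
	step_log[step_count++] = s;
}

/* Models steps 1) to 3) of the exit_to_guest_mode() comment. */
static void model_exit_to_guest_mode(void)
{
	record(TRACE_IRQS_ON);
	record(CONTEXT_ENTER);
	record(LOCKDEP_IRQS_ON);
}

/* Assumption: the return path undoes the same steps in reverse order. */
static void model_enter_from_guest_mode(void)
{
	record(LOCKDEP_IRQS_OFF);
	record(CONTEXT_EXIT);
	record(TRACE_IRQS_OFF);
}

/* Models the sequence from the commit message, with timing outermost. */
static void model_vcpu_run(void)
{
	step_count = 0;
	record(TIMING_ENTER);
	model_exit_to_guest_mode();
	record(RUN_VCPU);
	model_enter_from_guest_mode();
	record(HANDLE_IRQS);
	record(TIMING_EXIT);
}

/* Position of a step in the log of the last run, or -1 if absent. */
static int step_index(enum step s)
{
	for (size_t i = 0; i < step_count; i++)
		if (step_log[i] == s)
			return (int)i;
	return -1;
}
```

After model_vcpu_run(), the log has the timing calls outermost and pending
IRQs handled before timing exit, so timer interrupts taken from the guest fall
inside the accounted window.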

Thanks,
Mark.
