lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YeF7Wvz05JhyCx0l@FVFF77S0Q05N>
Date:   Fri, 14 Jan 2022 13:32:10 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Christian Borntraeger <borntraeger@...ux.ibm.com>
Cc:     linux-kernel@...r.kernel.org, aleksandar.qemu.devel@...il.com,
        alexandru.elisei@....com, anup.patel@....com,
        aou@...s.berkeley.edu, atish.patra@....com,
        benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
        chenhuacai@...nel.org, dave.hansen@...ux.intel.com,
        david@...hat.com, frankja@...ux.ibm.com, frederic@...nel.org,
        gor@...ux.ibm.com, hca@...ux.ibm.com, imbrenda@...ux.ibm.com,
        james.morse@....com, jmattson@...gle.com, joro@...tes.org,
        kvm@...r.kernel.org, maz@...nel.org, mingo@...hat.com,
        mpe@...erman.id.au, nsaenzju@...hat.com, palmer@...belt.com,
        paulmck@...nel.org, paulus@...ba.org, paul.walmsley@...ive.com,
        pbonzini@...hat.com, seanjc@...gle.com, suzuki.poulose@....com,
        tglx@...utronix.de, tsbogend@...ha.franken.de, vkuznets@...hat.com,
        wanpengli@...cent.com, will@...nel.org
Subject: Re: [PATCH 0/5] kvm: fix latent guest entry/exit bugs

On Fri, Jan 14, 2022 at 01:29:46PM +0100, Christian Borntraeger wrote:
> 
> 
> Am 14.01.22 um 13:19 schrieb Mark Rutland:
> > On Thu, Jan 13, 2022 at 04:20:07PM +0100, Christian Borntraeger wrote:
> > > Am 11.01.22 um 16:35 schrieb Mark Rutland:
> > > > Several architectures have latent bugs around guest entry/exit, most
> > > > notably:
> > > > 
> > > > 1) Several architectures enable interrupts between guest_enter() and
> > > >      guest_exit(). As this period is an RCU extended quiescent state (EQS) this
> > > >      is unsound unless the irq entry code explicitly wakes RCU, which most
> > > >      architectures only do for entry from usersapce or idle.
> > > > 
> > > >      I believe this affects: arm64, riscv, s390
> > > > 
> > > >      I am not sure about powerpc.
> > > > 
> > > > 2) Several architectures permit instrumentation of code between
> > > >      guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
> > > >      instrumentation may directly o indirectly use RCU, this has the same
> > > >      problems as with interrupts.
> > > > 
> > > >      I believe this affects: arm64, mips, powerpc, riscv, s390
> > > > 
> > > > 3) Several architectures do not inform lockdep and tracing that
> > > >      interrupts are enabled during the execution of the guest, or do so in
> > > >      an incorrect order. Generally
> > > >      this means that logs will report IRQs being masked for much longer
> > > >      than is actually the case, which is not ideal for debugging. I don't
> > > >      know whether this affects the correctness of lockdep.
> > > > 
> > > >      I believe this affects: arm64, mips, powerpc, riscv, s390
> > > > 
> > > > This was previously fixed for x86 specifically in a series of commits:
> > > > 
> > > >     87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
> > > >     0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
> > > >     9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
> > > >     3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
> > > >     135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
> > > >     160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
> > > >     bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
> > > > 
> > > > But other architectures were left broken, and the infrastructure for
> > > > handling this correctly is x86-specific.
> > > > 
> > > > This series introduces generic helper functions which can be used to
> > > > handle the problems above, and migrates architectures over to these,
> > > > fixing the latent issues.
> > > > 
> > > > I wasn't able to figure my way around powerpc and s390, so I have not
> > > 
> > > I think 2 later patches have moved the guest_enter/exit a bit out.
> > > Does this make the s390 code clearer?
> > 
> > Yes; that's much simpler to follow!
> > 
> > One major thing I wasn't sure about for s390 is the sequence:
> > 
> > 	guest_enter_irqoff();	// Enters an RCU EQS
> > 	...
> > 	local_irq_enable();
> > 	...
> > 	sie64a(...);
> > 	...
> > 	local_irq_disable();
> > 	...
> > 	guest_exit_irqoff();	// Exits an RCU EQS
> > 
> > ... since if an IRQ is taken between local_irq_{enable,disable}(), RCU won't be
> > watching, and I couldn't spot whether your regular IRQ entry logic would wake
> > RCU in this case, or whether there was something else I'm missing that saves
> > you here.
> > 
> > For other architectures, including x86 and arm64, we enter the guest with IRQs
> > masked and return from the guest with IRQs masked, and don't actually take IRQs
> > until we unmask them in the host, after the guest_exit_*() logic has woken RCU
> > and so on.
> > 
> > I wasn't able to find documentation on the semantics of SIE, so I couldn't spot
> > whether the local_irq_{enable,disable}() calls were necessary, or could be
> > removed.
> 
> We run the SIE instruction with interrupts enabled. SIE is interruptible.
> The disable/enable pairs are just because  guest_enter/exit_irqoff() require them.

What I was trying to figure out was when an interrupt is taken between
guest_enter_irqoff() and guest_exit_irqoff(), where is RCU woken? I couldn't
spot that in the s390 entry code (probably simply because I'm not familiar with
it), and so AFAICT that means IRQ code could run without RCU watching, which
would cause things to explode.

On other architectures that problem is avoided because IRQs asserted during the
guest cause a specific guest exit rather than a regular IRQ exception, and the
HW enables/disables IRQs when entering/exiting the guest, so the host can leave
IRQs masked across guest_enter_irqoff()..guest_exit_irqoff().

Am I right in understanding that SIE itself won't enable (host) interrupts
while running the guest, and so it *needs* to be run with interrupts already
enabled?

> One thing to be aware of: in our entry.S - after an interrupt - we leave SIE by
> setting the return address of the interrupt after the sie instruction so that we
> get back into this __vcpu_run loop to check for signals and so.

Just to check, that's after the IRQ handler runs, right?

Thanks,
Mark.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ