Message-ID: <b66c4856-7826-9cff-83f3-007d7ed5635c@linux.ibm.com>
Date: Fri, 14 Jan 2022 14:51:38 +0100
From: Christian Borntraeger <borntraeger@...ux.ibm.com>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-kernel@...r.kernel.org, aleksandar.qemu.devel@...il.com,
alexandru.elisei@....com, anup.patel@....com,
aou@...s.berkeley.edu, atish.patra@....com,
benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
chenhuacai@...nel.org, dave.hansen@...ux.intel.com,
david@...hat.com, frankja@...ux.ibm.com, frederic@...nel.org,
gor@...ux.ibm.com, hca@...ux.ibm.com, imbrenda@...ux.ibm.com,
james.morse@....com, jmattson@...gle.com, joro@...tes.org,
kvm@...r.kernel.org, maz@...nel.org, mingo@...hat.com,
mpe@...erman.id.au, nsaenzju@...hat.com, palmer@...belt.com,
paulmck@...nel.org, paulus@...ba.org, paul.walmsley@...ive.com,
pbonzini@...hat.com, seanjc@...gle.com, suzuki.poulose@....com,
tglx@...utronix.de, tsbogend@...ha.franken.de, vkuznets@...hat.com,
wanpengli@...cent.com, will@...nel.org
Subject: Re: [PATCH 0/5] kvm: fix latent guest entry/exit bugs
On 14.01.22 at 14:32, Mark Rutland wrote:
> On Fri, Jan 14, 2022 at 01:29:46PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 14.01.22 at 13:19, Mark Rutland wrote:
>>> On Thu, Jan 13, 2022 at 04:20:07PM +0100, Christian Borntraeger wrote:
>>>> On 11.01.22 at 16:35, Mark Rutland wrote:
>>>>> Several architectures have latent bugs around guest entry/exit, most
>>>>> notably:
>>>>>
>>>>> 1) Several architectures enable interrupts between guest_enter() and
>>>>> guest_exit(). As this period is an RCU extended quiescent state (EQS),
>>>>> this is unsound unless the irq entry code explicitly wakes RCU, which
>>>>> most architectures only do for entry from userspace or idle.
>>>>>
>>>>> I believe this affects: arm64, riscv, s390
>>>>>
>>>>> I am not sure about powerpc.
>>>>>
>>>>> 2) Several architectures permit instrumentation of code between
>>>>> guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
>>>>> instrumentation may directly or indirectly use RCU, this has the same
>>>>> problems as with interrupts.
>>>>>
>>>>> I believe this affects: arm64, mips, powerpc, riscv, s390
>>>>>
>>>>> 3) Several architectures do not inform lockdep and tracing that
>>>>> interrupts are enabled during the execution of the guest, or do so
>>>>> in an incorrect order. Generally this means that logs will report
>>>>> IRQs being masked for much longer than is actually the case, which
>>>>> is not ideal for debugging. I don't know whether this affects the
>>>>> correctness of lockdep.
>>>>>
>>>>> I believe this affects: arm64, mips, powerpc, riscv, s390
>>>>>
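
[To make points 1) and 2) above concrete, here is a minimal sketch of the
problematic pattern; run_the_guest() is a hypothetical stand-in for the
arch-specific world switch, not code from any particular architecture:

    static void vcpu_run_broken(struct kvm_vcpu *vcpu)
    {
            local_irq_disable();
            guest_enter_irqoff();   /* enters the RCU EQS */

            local_irq_enable();     /* BUG: an IRQ taken from here on runs
                                       with RCU not watching, unless the irq
                                       entry code explicitly wakes RCU */
            run_the_guest(vcpu);    /* any instrumentation here (KASAN, KCOV,
                                       KCSAN, ...) may likewise use RCU */
            local_irq_disable();

            guest_exit_irqoff();    /* exits the RCU EQS */
            local_irq_enable();
    }

The fix is to keep the whole window where RCU is not watching free of both
IRQ handling and instrumentable code.]
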
>>>>> This was previously fixed for x86 specifically in a series of commits:
>>>>>
>>>>> 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>>>>> 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>>>>> 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>>>>> 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>>>>> 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>>>>> 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>>>>> bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
>>>>>
>>>>> But other architectures were left broken, and the infrastructure for
>>>>> handling this correctly is x86-specific.
>>>>>
>>>>> This series introduces generic helper functions which can be used to
>>>>> handle the problems above, and migrates architectures over to these,
>>>>> fixing the latent issues.
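
[For reference, the ordering the new generic helpers are intended to enforce
looks roughly like this; the helper names are those used in the posted
series, and the angle-bracket line is a placeholder for the arch-specific
noinstr entry code:

    local_irq_disable();
    guest_timing_enter_irqoff();    /* vtime accounting; may be instrumented */
    guest_state_enter_irqoff();     /* noinstr: lockdep/tracing, enter EQS */

    <arch-specific guest entry/exit, noinstr, IRQs masked>

    guest_state_exit_irqoff();      /* noinstr: exit EQS, lockdep/tracing */
    guest_timing_exit_irqoff();     /* vtime accounting; may be instrumented */
    local_irq_enable();             /* pending IRQs now handled with RCU
                                       watching */
]
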
>>>>>
>>>>> I wasn't able to figure my way around powerpc and s390, so I have not
>>>>
>>>> I think 2 later patches have moved the guest_enter/exit a bit out.
>>>> Does this make the s390 code clearer?
>>>
>>> Yes; that's much simpler to follow!
>>>
>>> One major thing I wasn't sure about for s390 is the sequence:
>>>
>>> guest_enter_irqoff(); // Enters an RCU EQS
>>> ...
>>> local_irq_enable();
>>> ...
>>> sie64a(...);
>>> ...
>>> local_irq_disable();
>>> ...
>>> guest_exit_irqoff(); // Exits an RCU EQS
>>>
>>> ... since if an IRQ is taken between local_irq_{enable,disable}(), RCU won't be
>>> watching, and I couldn't spot whether your regular IRQ entry logic would wake
>>> RCU in this case, or whether there was something else I'm missing that saves
>>> you here.
>>>
>>> For other architectures, including x86 and arm64, we enter the guest with IRQs
>>> masked and return from the guest with IRQs masked, and don't actually take IRQs
>>> until we unmask them in the host, after the guest_exit_*() logic has woken RCU
>>> and so on.
>>>
>>> I wasn't able to find documentation on the semantics of SIE, so I couldn't spot
>>> whether the local_irq_{enable,disable}() calls were necessary, or could be
>>> removed.
>>
>> We run the SIE instruction with interrupts enabled. SIE is interruptible.
>> The disable/enable pairs are there just because guest_enter/exit_irqoff()
>> require interrupts to be disabled.
>
> What I was trying to figure out was when an interrupt is taken between
> guest_enter_irqoff() and guest_exit_irqoff(), where is RCU woken? I couldn't
> spot that in the s390 entry code (probably simply because I'm not familiar with
> it), and so AFAICT that means IRQ code could run without RCU watching, which
> would cause things to explode.
>
> On other architectures that problem is avoided because IRQs asserted during the
> guest cause a specific guest exit rather than a regular IRQ exception, and the
> HW enables/disables IRQs when entering/exiting the guest, so the host can leave
> IRQs masked across guest_enter_irqoff()..guest_exit_irqoff().
>
> Am I right in understanding that SIE itself won't enable (host) interrupts
> while running the guest, and so it *needs* to be run with interrupts already
> enabled?
yes
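
[For contrast, the x86/arm64-style flow Mark describes keeps host IRQs
masked across the whole EQS; roughly, with enter_guest_hw() as a
hypothetical stand-in for the hardware entry (e.g. a VMLAUNCH or an ERET
into the guest):

    local_irq_disable();
    guest_enter_irqoff();   /* enter EQS, host IRQs still masked */
    enter_guest_hw(vcpu);   /* a pending host IRQ forces a guest exit
                               rather than a host IRQ exception */
    guest_exit_irqoff();    /* exit EQS */
    local_irq_enable();     /* host IRQs run with RCU watching */
]
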
>
>> One thing to be aware of: in our entry.S - after an interrupt - we leave SIE
>> by setting the interrupt's return address to the instruction after the sie
>> instruction, so that we get back into this __vcpu_run loop to check for
>> signals and so on.
>
> Just to check, that's after the IRQ handler runs, right?
and yes.
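
[Conceptually, the mechanism Christian describes looks something like the
following; this is illustrative C rather than the actual entry.S assembly,
and the sie_entry/sie_exit labels are hypothetical names:

    /* in the IRQ exit path, after the handler has run: */
    if (regs->psw.addr >= (unsigned long)sie_entry &&
        regs->psw.addr <  (unsigned long)sie_exit)
            regs->psw.addr = (unsigned long)sie_exit;   /* resume after the
                                                           sie instruction,
                                                           back in __vcpu_run */
]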