[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ae1a42ab-f719-4a4e-8d2a-e2b4fa6e9580@linux.ibm.com>
Date: Fri, 14 Jan 2022 13:29:46 +0100
From: Christian Borntraeger <borntraeger@...ux.ibm.com>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-kernel@...r.kernel.org, aleksandar.qemu.devel@...il.com,
alexandru.elisei@....com, anup.patel@....com,
aou@...s.berkeley.edu, atish.patra@....com,
benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
chenhuacai@...nel.org, dave.hansen@...ux.intel.com,
david@...hat.com, frankja@...ux.ibm.com, frederic@...nel.org,
gor@...ux.ibm.com, hca@...ux.ibm.com, imbrenda@...ux.ibm.com,
james.morse@....com, jmattson@...gle.com, joro@...tes.org,
kvm@...r.kernel.org, maz@...nel.org, mingo@...hat.com,
mpe@...erman.id.au, nsaenzju@...hat.com, palmer@...belt.com,
paulmck@...nel.org, paulus@...ba.org, paul.walmsley@...ive.com,
pbonzini@...hat.com, seanjc@...gle.com, suzuki.poulose@....com,
tglx@...utronix.de, tsbogend@...ha.franken.de, vkuznets@...hat.com,
wanpengli@...cent.com, will@...nel.org
Subject: Re: [PATCH 0/5] kvm: fix latent guest entry/exit bugs
Am 14.01.22 um 13:19 schrieb Mark Rutland:
> On Thu, Jan 13, 2022 at 04:20:07PM +0100, Christian Borntraeger wrote:
>>
>>
>> Am 11.01.22 um 16:35 schrieb Mark Rutland:
>>> Several architectures have latent bugs around guest entry/exit, most
>>> notably:
>>>
>>> 1) Several architectures enable interrupts between guest_enter() and
>>> guest_exit(). As this period is an RCU extended quiescent state (EQS) this
>>> is unsound unless the irq entry code explicitly wakes RCU, which most
>>> architectures only do for entry from usersapce or idle.
>>>
>>> I believe this affects: arm64, riscv, s390
>>>
>>> I am not sure about powerpc.
>>>
>>> 2) Several architectures permit instrumentation of code between
>>> guest_enter() and guest_exit(), e.g. KASAN, KCOV, KCSAN, etc. As
>>> instrumentation may directly o indirectly use RCU, this has the same
>>> problems as with interrupts.
>>>
>>> I believe this affects: arm64, mips, powerpc, riscv, s390
>>>
>>> 3) Several architectures do not inform lockdep and tracing that
>>> interrupts are enabled during the execution of the guest, or do so in
>>> an incorrect order. Generally
>>> this means that logs will report IRQs being masked for much longer
>>> than is actually the case, which is not ideal for debugging. I don't
>>> know whether this affects the correctness of lockdep.
>>>
>>> I believe this affects: arm64, mips, powerpc, riscv, s390
>>>
>>> This was previously fixed for x86 specifically in a series of commits:
>>>
>>> 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs")
>>> 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
>>> 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
>>> 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
>>> 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
>>> 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
>>> bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
>>>
>>> But other architectures were left broken, and the infrastructure for
>>> handling this correctly is x86-specific.
>>>
>>> This series introduces generic helper functions which can be used to
>>> handle the problems above, and migrates architectures over to these,
>>> fixing the latent issues.
>>>
>>> I wasn't able to figure my way around powerpc and s390, so I have not
>>
>> I think 2 later patches have moved the guest_enter/exit a bit out.
>> Does this make the s390 code clearer?
>
> Yes; that's much simpler to follow!
>
> One major thing I wasn't sure about for s390 is the sequence:
>
> guest_enter_irqoff(); // Enters an RCU EQS
> ...
> local_irq_enable();
> ...
> sie64a(...);
> ...
> local_irq_disable();
> ...
> guest_exit_irqoff(); // Exits an RCU EQS
>
> ... since if an IRQ is taken between local_irq_{enable,disable}(), RCU won't be
> watching, and I couldn't spot whether your regular IRQ entry logic would wake
> RCU in this case, or whether there was something else I'm missing that saves
> you here.
>
> For other architectures, including x86 and arm64, we enter the guest with IRQs
> masked and return from the guest with IRQs masked, and don't actually take IRQs
> until we unmask them in the host, after the guest_exit_*() logic has woken RCU
> and so on.
>
> I wasn't able to find documentation on the semantics of SIE, so I couldn't spot
> whether the local_irq_{enable,disable}() calls were necessary, or could be
> removed.
We run the SIE instruction with interrupts enabled. SIE is interruptible.
The disable/enable pairs are just because guest_enter/exit_irqoff() require them.
One thing to be aware of: in our entry.S - after an interrupt - we leave SIE by
setting the return address of the interrupt after the sie instruction so that we
get back into this __vcpu_run loop to check for signals and so.
>
> Thanks,
> Mark.
>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 577f1ead6a51..5859207c2cc0 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -4145,10 +4145,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> * As PF_VCPU will be used in fault handler, between
>> * guest_enter and guest_exit should be no uaccess.
>> */
>> - local_irq_disable();
>> - guest_enter_irqoff();
>> - __disable_cpu_timer_accounting(vcpu);
>> - local_irq_enable();
>> if (kvm_s390_pv_cpu_is_protected(vcpu)) {
>> memcpy(sie_page->pv_grregs,
>> vcpu->run->s.regs.gprs,
>> @@ -4156,8 +4152,16 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> }
>> if (test_cpu_flag(CIF_FPU))
>> load_fpu_regs();
>> + local_irq_disable();
>> + __disable_cpu_timer_accounting(vcpu);
>> + guest_enter_irqoff();
>> + local_irq_enable();
>> exit_reason = sie64a(vcpu->arch.sie_block,
>> vcpu->run->s.regs.gprs);
>> + local_irq_disable();
>> + guest_exit_irqoff();
>> + __enable_cpu_timer_accounting(vcpu);
>> + local_irq_enable();
>> if (kvm_s390_pv_cpu_is_protected(vcpu)) {
>> memcpy(vcpu->run->s.regs.gprs,
>> sie_page->pv_grregs,
>> @@ -4173,10 +4177,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
>> }
>> }
>> - local_irq_disable();
>> - __enable_cpu_timer_accounting(vcpu);
>> - guest_exit_irqoff();
>> - local_irq_enable();
>> vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
>> rc = vcpu_post_run(vcpu, exit_reason);
Powered by blists - more mailing lists