Open Source and information security mailing list archives
 
Message-ID: <458c0819a578ba854f00089bc312c8faa177a81a.camel@redhat.com>
Date:   Tue, 30 Nov 2021 00:53:37 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     Marc Zyngier <maz@...nel.org>, Huacai Chen <chenhuacai@...nel.org>,
        Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
        Paul Mackerras <paulus@...abs.org>,
        Anup Patel <anup.patel@....com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        James Morse <james.morse@....com>,
        Alexandru Elisei <alexandru.elisei@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Atish Patra <atish.patra@....com>,
        David Hildenbrand <david@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        linux-mips@...r.kernel.org, kvm@...r.kernel.org,
        kvm-ppc@...r.kernel.org, kvm-riscv@...ts.infradead.org,
        linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        David Matlack <dmatlack@...gle.com>,
        Oliver Upton <oupton@...gle.com>,
        Jing Zhang <jingzhangos@...gle.com>,
        Wei Huang <wei.huang2@....com>
Subject: Re: [PATCH v2 11/43] KVM: Don't block+unblock when halt-polling is
 successful

On Mon, 2021-11-29 at 20:18 +0100, Paolo Bonzini wrote:
> On 11/29/21 19:55, Sean Christopherson wrote:
> > > Still it does seem to be a race that happens when IS_RUNNING=true but
> > > vcpu->mode == OUTSIDE_GUEST_MODE.  This patch makes the race easier to
> > > trigger because it moves IS_RUNNING=false later.
> > 
> > Oh!  Any chance the bug only repros with preemption enabled?  That would explain
> > why I don't see problems; I'm pretty sure I've only run AVIC with PREEMPT=n.
> 
> Me too.
> 
> > svm_vcpu_{un}blocking() are called with preemption enabled, and avic_set_running()
> > passes in vcpu->cpu.  If the vCPU is preempted and scheduled in on a different CPU,
> > avic_vcpu_load() will overwrite the vCPU's entry with the wrong CPU info.
> 
> That would make a lot of sense.  avic_vcpu_load() can handle 
> svm->avic_is_running = false, but avic_set_running still needs its body 
> wrapped by preempt_disable/preempt_enable.
> 
> Fedora's kernel is CONFIG_PREEMPT_VOLUNTARY, but I know Maxim uses his 
> own build so it would not surprise me if he used CONFIG_PREEMPT=y.
> 
> Paolo
> 

I will write up all the details tomorrow, but I strongly suspect CPU erratum #1235
documented in https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
 
Basically, what I see is this:
 
1. vCPU2 clears the is_running bit in its AVIC physical ID entry (avic_physical_id_cache).
2. vCPU2 checks the IRR and finds it empty.
3. vCPU2 calls schedule().
 
It then keeps sleeping forever. If I kick it with a signal (e.g. by running the
'info registers' QEMU HMP command, or just stop/cont on the same HMP interface),
the vCPU wakes up and notices that the IRR is suddenly not empty,
and the VM comes back to life (and then hangs again after a while
with the same problem).
 
As far as I can see in the traces, the bit in the IRR came from
another vCPU that didn't respect the is_running bit and didn't get an
AVIC_INCOMPLETE_IPI VM exit.
I can't prove it 100% yet, but everything in the trace points to this.
 
As for the rest of the environment, I currently reproduce this in
a VM that has no PCI passthrough devices at all, just AVIC.
(I wasn't able to reproduce it before simply because I forgot to
enable AVIC in this configuration.)
 
So I also agree that Sean's patch is not to blame here.
It just shortened the window between clearing is_running and going to sleep,
which made it less likely that other vCPUs pick up the is_running change in time.
(I suspect that they pick it up on the next VMRUN, and that otherwise the value is
somehow wrongly cached in them.)
 
A workaround of kicking all vCPUs whenever one of them enters vcpu_block
does seem to work for me. However, it kills performance and throws all the timing off, so I can't use it as proof.
 
That's all for now; I will write up more detailed info, including some traces I have.
 
I use Windows 10 with a tool called LatencyMon, which shows overall how
much latency hardware interrupts have. It used to be useful for
ensuring that my VMs are suitable for RT-like latency. Even before I joined Red Hat,
I tuned my VMs as much as I could to make my Rift CV1 VR headset work well, which
needs RT-like latencies.

These days VR works fine in my VMs anyway, but I kept this tool to keep an eye on it.
 
I really need to write a kvm-unit-tests test to stress-test IPIs, especially this case;
I will do this very soon.
 
 
Wei Huang, any info on this would be very helpful. 
 
Maybe putting the AVIC physical ID table in UC memory would help?
Maybe ringing the doorbells of all other vCPUs would help them notice the change?

Best regards,
	Maxim Levitsky
