lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yagoo7R8P5xVilsj@google.com>
Date:   Thu, 2 Dec 2021 02:00:03 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, Marc Zyngier <maz@...nel.org>,
        Huacai Chen <chenhuacai@...nel.org>,
        Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
        Paul Mackerras <paulus@...abs.org>,
        Anup Patel <anup.patel@....com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        James Morse <james.morse@....com>,
        Alexandru Elisei <alexandru.elisei@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Atish Patra <atish.patra@....com>,
        David Hildenbrand <david@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        linux-mips@...r.kernel.org, kvm@...r.kernel.org,
        kvm-ppc@...r.kernel.org, kvm-riscv@...ts.infradead.org,
        linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        David Matlack <dmatlack@...gle.com>,
        Oliver Upton <oupton@...gle.com>,
        Jing Zhang <jingzhangos@...gle.com>,
        Wei Huang <wei.huang2@....com>
Subject: Re: [PATCH v2 11/43] KVM: Don't block+unblock when halt-polling is
 successful

On Thu, Dec 02, 2021, Maxim Levitsky wrote:
> On Tue, 2021-11-30 at 00:53 +0200, Maxim Levitsky wrote:
> > On Mon, 2021-11-29 at 20:18 +0100, Paolo Bonzini wrote:
> > Basically what I see that
> >  
> > 1. vCPU2 disables is_running in avic physical id cache
> > 2. vCPU2 checks that IRR is empty and it is
> > 3. vCPU2 does schedule();
> >  
> > and it keeps on sleeping forever. If I kick it via signal 
> > (like just doing 'info registers' qemu hmp command
> > or just stop/cont on the same hmp interface, the
> > vCPU wakes up and notices that IRR suddenly is not empty,
> > and the VM comes back to life (and then hangs after a while again
> > with the same problem....).
> >  
> > As far as I see in the traces, the bit in IRR came from
> > another VCPU who didn't respect the ir_running bit and didn't get 
> > AVIC_INCOMPLETE_IPI VMexit.
> > I can't 100% prove it yet, but everything in the trace shows this.

...

> I am now almost sure that this is errata #1235.
> 
> I had attached a kvm-unit-test I wrote (patch against master of
> https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git/) which is able to
> reproduce the issue on stock 5.15.0 kernel (*no patches applied at all*)
> after just few seconds.  If kvm is loaded without halt-polling (that is
> halt_poll_ns=0 is used).
> 
> Halt polling and/or Sean's patch are not to blame, it just changes timeing.
> With Sean's patch I don't need to disable half polling.

Hmm, that suggests the bug/erratum is due to the CPU consuming stale data from #4
for the IsRunning check in #5, or retiring uops for the IsRunning check before
retiring the vIRR update.  It would be helpful if the erratum actually provided
info on the "highly specific and detailed set of internal timing conditions". :-/

  4. Lookup the vAPIC backing page address in the Physical APIC table using the
     guest physical APIC ID as an index into the table.
  5. For every valid destination:
     - Atomically set the appropriate IRR bit in each of the destinations’ vAPIC
       backing page.
     - Check the IsRunning status of each destination.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ