Message-ID: <alpine.DEB.2.21.1907012011460.1802@nanos.tec.linutronix.de>
Date:   Tue, 2 Jul 2019 00:41:40 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Rong Chen <rong.a.chen@...el.com>
cc:     Feng Tang <feng.tang@...el.com>, x86@...nel.org,
        LKML <linux-kernel@...r.kernel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "tipbuild@...or.com" <tipbuild@...or.com>,
        "lkp@...org" <lkp@...org>, Ingo Molnar <mingo@...nel.org>,
        kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Fenghua Yu <fenghua.yu@...el.com>
Subject: [BUG] kvm: APIC emulation problem - was Re: [LKP] [x86/hotplug]
 ...

Folks,

after chasing a 0-day test failure for a couple of days, I was finally able
to reproduce the issue.

Background:

   In preparation for supporting IPI shorthands I changed the CPU offline
   code to software disable the local APIC instead of just masking it.
   That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
   register.
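
   For illustration only, a minimal sketch of that soft disable, operating
   on a plain variable that stands in for the register value.
   APIC_SPIV_APIC_ENABLED is bit 8 of the spurious interrupt vector
   register; the helper names spiv_soft_disable() and spiv_soft_enabled()
   are made up for this example and are not the kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 8 of the spurious interrupt vector register: APIC software enable. */
#define APIC_SPIV_APIC_ENABLED	(1u << 8)

/* Hypothetical helper: return the SPIV value with the enable bit cleared. */
static uint32_t spiv_soft_disable(uint32_t spiv)
{
	return spiv & ~APIC_SPIV_APIC_ENABLED;
}

/* Hypothetical helper: nonzero if the APIC is software enabled. */
static int spiv_soft_enabled(uint32_t spiv)
{
	return (spiv & APIC_SPIV_APIC_ENABLED) != 0;
}

/* Small self-check; 0x1ff is just an example value (enable bit + vector 0xff). */
static void spiv_selftest(void)
{
	uint32_t spiv = 0x1ff;

	assert(spiv_soft_enabled(spiv));
	spiv = spiv_soft_disable(spiv);
	assert(!spiv_soft_enabled(spiv));
	assert(spiv == 0xff);	/* the vector bits are left untouched */
}
```

   Masking, by contrast, would leave this bit alone and only touch the
   individual LVT entries.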

Failure:

   When the CPU comes back online, the startup code occasionally triggers
   the warning in apic_pending_intr_clear(), which complains that the IRRs
   are not empty.
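
   As a rough sketch of what "IRRs not empty" means: the IRR is 256 bits
   wide, one bit per vector, spread over eight 32-bit registers. The code
   below scans an in-memory copy of those eight words. The helper name
   first_pending_vector() and the array representation are assumptions of
   this example, not the kernel's implementation, and the vector number in
   the self-check is only an example value.

```c
#include <assert.h>
#include <stdint.h>

#define IRR_WORDS	8	/* 8 x 32 bits cover the 256 interrupt vectors */

/*
 * Hypothetical helper: return the lowest vector whose IRR bit is set,
 * or -1 if the IRR is empty.
 */
static int first_pending_vector(const uint32_t irr[IRR_WORDS])
{
	for (int w = 0; w < IRR_WORDS; w++) {
		for (int b = 0; b < 32; b++) {
			if (irr[w] & (1u << b))
				return w * 32 + b;
		}
	}
	return -1;
}

/* Self-check with an example vector of 0xec (word 7, bit 12). */
static void irr_selftest(void)
{
	uint32_t irr[IRR_WORDS] = { 0 };

	assert(first_pending_vector(irr) == -1);	/* empty IRR */
	irr[0xec / 32] |= 1u << (0xec % 32);
	assert(first_pending_vector(irr) == 0xec);	/* stale bit found */
}
```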

   The offending vector is the local APIC timer vector, whose IRR bit is
   set and stays set.

It took me quite some time to reproduce the issue locally, but now I can
see what happens.

It requires apicv_enabled=0, i.e. full APIC emulation. With apicv_enabled=1
(and hardware support) it behaves correctly.

Here is the series of events:

    Guest CPU

    goes down

      native_cpu_disable()		

	apic_soft_disable();

    play_dead()

    ....

    startup()

      if (apic_enabled())
        apic_pending_intr_clear()	<- Not taken

     enable APIC

        apic_pending_intr_clear()	<- Triggers warning because IRR is stale

When this happens, the timer (it happens with both the deadline timer and
the regular APIC timer) has fired shortly before the APIC is disabled, but
the interrupt was not serviced because the guest CPU was in an interrupt
disabled region at that point.

The state of the timer vector ISR/IRR bits:

                              ISR     IRR
before apic_soft_disable()    0       1
after apic_soft_disable()     0       1

On startup                    0       1

Now one would assume that the IRR is cleared after the INIT reset, but this
happens only on CPU0.

Why?

Because our CPU0 hotplug is just for testing to make sure nothing breaks,
and it goes through an NMI wakeup vehicle because INIT would send it through
the bootstrap code, which is not really working if that CPU was not
physically unplugged.

Now looking at a real world APIC the situation in that case is:

                              ISR     IRR
before apic_soft_disable()    0       1
after apic_soft_disable()     0       1

On startup                    0       0

Why?

Once the dying CPU reenables interrupts, the pending interrupt gets
delivered as a spurious interrupt and then the state is clear.
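
To make the difference between the two tables concrete, here is a toy model
of the timer vector state. It encodes only the behaviour described in this
mail: with real_hw set, a pending IRR bit on a software disabled APIC is
flushed as a spurious interrupt once the CPU reenables interrupts; without
it, the bit just stays set. struct apic_model and both function names are
invented for this sketch and do not correspond to KVM or kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for a single vector (the timer vector). */
struct apic_model {
	bool sw_enabled;	/* models APIC_SPIV_APIC_ENABLED */
	bool irr;		/* request bit of the timer vector */
	bool isr;		/* in-service bit of the timer vector */
	unsigned int spurious;	/* number of spurious deliveries */
};

/* The CPU reenables interrupts while a request may be pending. */
static void cpu_enable_interrupts(struct apic_model *a, bool real_hw)
{
	if (!a->irr)
		return;

	if (a->sw_enabled) {
		/* Normal delivery: the request moves from IRR to ISR. */
		a->irr = false;
		a->isr = true;
	} else if (real_hw) {
		/* Real APIC as described: delivered as spurious, state clear. */
		a->irr = false;
		a->spurious++;
	}
	/* Emulation as observed: nothing happens, the IRR bit stays set. */
}

/* Replay the offline/online sequence; return the IRR bit seen on startup. */
static bool irr_stale_on_startup(bool real_hw)
{
	/* Timer fired while the CPU was in an interrupt disabled region. */
	struct apic_model a = { .sw_enabled = true, .irr = true };

	a.sw_enabled = false;			/* apic_soft_disable() */
	cpu_enable_interrupts(&a, real_hw);	/* dying CPU reenables interrupts */
	a.sw_enabled = true;			/* startup: enable APIC */

	return a.irr;
}
```

In this model irr_stale_on_startup(false) returns true, which is the
condition the second apic_pending_intr_clear() call warns about, and
irr_stale_on_startup(true) returns false.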

While that CPU0 hotplug test case is surely an esoteric issue, the APIC
emulation is still wrong. Even if the play_dead() code did not enable
interrupts, the pending IRR bit would turn into an ISR .. interrupt
when the APIC is reenabled on startup.

Thanks,

	tglx
