Date:	Sat, 11 Feb 2012 17:04:15 -0800
From:	Yinghai Lu <yinghai@...nel.org>
To:	linux-kernel@...r.kernel.org, mingo@...hat.com, hpa@...or.com,
	yinghai@...nel.org, torvalds@...ux-foundation.org,
	kexec@...ts.infradead.org, vgoyal@...hat.com,
	ebiederm@...ssion.com, akpm@...ux-foundation.org,
	tglx@...utronix.de, dzickus@...hat.com, mingo@...e.hu
Cc:	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/debug] x86/kdump: No need to disable ioapic/lapic in
 crash path

On Sat, Feb 11, 2012 at 3:09 PM, tip-bot for Don Zickus
<dzickus@...hat.com> wrote:
> Commit-ID:  d9bc9be89629445758670220787683e37c93f6c1
> Gitweb:     http://git.kernel.org/tip/d9bc9be89629445758670220787683e37c93f6c1
> Author:     Don Zickus <dzickus@...hat.com>
> AuthorDate: Thu, 9 Feb 2012 16:53:41 -0500
> Committer:  Ingo Molnar <mingo@...e.hu>
> CommitDate: Sat, 11 Feb 2012 15:38:53 +0100
>
> x86/kdump: No need to disable ioapic/lapic in crash path
>
> A customer of ours noticed when their machine crashed, kdump did
> not work but hung instead.  Using their firmware dumping
> solution they grabbed a vmcore and decoded the stacks on the
> cpus.  What they noticed seemed to be a rare deadlock with the
> ioapic_lock.
>
>  CPU4:
>  machine_crash_shutdown
>  -> machine_ops.crash_shutdown
>    -> native_machine_crash_shutdown
>       -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
>       -> disable_IO_APIC
>          -> clear_IO_APIC
>             -> clear_IO_APIC_pin
>                -> ioapic_read_entry
>                   -> spin_lock_irqsave(&ioapic_lock, flags)
>                   ---Infinite loop here---
>
>  CPU0:
>  do_IRQ
>  -> handle_irq
>    -> handle_edge_irq
>        -> ack_apic_edge
>           -> move_native_irq
>               -> mask_IO_APIC_irq
>                  -> mask_IO_APIC_irq_desc
>                     -> spin_lock_irqsave(&ioapic_lock, flags)
>                     ---Receive NMI here after getting spinlock---
>                        -> nmi
>                           -> do_nmi
>                              -> crash_nmi_callback
>                              ---Infinite loop here---
>
> The problem is that although kdump tries to shut down minimal
> hardware, it still needs to disable the IO APIC.  This requires
> spinlocks which may be held by another cpu.  This other cpu is
> being held infinitely in an NMI context by kdump in order to
> serialize the crashing path.  Instant deadlock.
>
> Eric brought up a point that because the boot code was
> restructured we may not need to disable the io apic any more in
> the crash path.  The original concern that led to the
> development of disable_IO_APIC, was that the jiffies calibration
> on boot up relied on the PIT timer for reference.  Access to the
> PIT required 8259 interrupts to be working.  This wouldn't work
> if the ioapic needed to be configured.  So on the panic path, the
> ioapic was reconfigured to use virtual wire mode to allow the
> 8259 to pass through.
>
> Those concerns don't hold true now, thanks to the jiffies
> calibration code not needing the PIT.  As a result, we can
> remove this call and simplify the locking needed in the panic
> path.
>
> The same work allowed us to remove the need to disable the local
> apic on shutdown too.  This should allow us to jump to the
> second kernel a little faster.
>
> I tested kdump on an Ivy Bridge platform, a Pentium4 and an old
> athlon that did not have an ioapic.  All three were successful.
>
> I also tested using lkdtm that would use jprobes to panic the
> system when entering do_IRQ.  The idea was to see how the system
> reacted with an interrupt pending in the second kernel.  My
> core2 quad successfully kdump'd 3 times in a row with no issues.
>
> v2: removed the disable lapic code too
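
The deadlock in the quoted call traces can be modeled in user space. The sketch below is only an illustration, not kernel code: the Python names (simulate_crash_path, release_cpu0, the timeout value) are hypothetical, a thread stands in for CPU0, and a timed lock acquire replaces the real infinite spin so the example terminates. It models the ioapic_lock ordering only and says nothing about the VT-d behaviour reported below.

```python
import threading


def simulate_crash_path(disable_ioapic_in_crash_path: bool,
                        timeout: float = 0.2) -> str:
    """Model CPU4's crash path while CPU0 is parked holding ioapic_lock."""
    ioapic_lock = threading.Lock()      # stands in for the kernel's ioapic_lock
    lock_taken = threading.Event()      # set once "CPU0" holds the lock
    release_cpu0 = threading.Event()    # escape hatch; the real NMI loop never exits

    def cpu0() -> None:
        # mask_IO_APIC_irq_desc: take ioapic_lock ...
        with ioapic_lock:
            lock_taken.set()
            # ... then the NMI arrives and crash_nmi_callback parks this CPU
            # while it still holds the lock.
            release_cpu0.wait()

    worker = threading.Thread(target=cpu0, daemon=True)
    worker.start()
    lock_taken.wait()

    # "CPU4": native_machine_crash_shutdown, after the NMI shootdown.
    try:
        if disable_ioapic_in_crash_path:
            # disable_IO_APIC -> ... -> ioapic_read_entry needs ioapic_lock.
            # In the kernel this spins forever; here we give up after `timeout`.
            if not ioapic_lock.acquire(timeout=timeout):
                return "deadlock"
            ioapic_lock.release()
        return "jumped to second kernel"
    finally:
        # Let the model CPU0 go so the thread can be joined cleanly.
        release_cpu0.set()
        worker.join()


if __name__ == "__main__":
    print(simulate_crash_path(True))
    print(simulate_crash_path(False))
```

With the IO APIC teardown in the crash path, the model reports a deadlock because the lock holder can never run again; without it, the crash path never touches the lock.
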

With this commit, kdump no longer works on my setups with
Nehalem, Westmere, and Sandy Bridge.
These setups all have VT-d enabled.


After reverting this commit, kdump is working again.

So I assume you need to drop this patch.

Thanks

Yinghai Lu