Date:	Tue, 24 Dec 2013 12:41:02 +0800
From:	rui wang
To:	Prarit Bhargava
Cc:	Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	Michel Lespinasse, Andi Kleen, Seiji Aguchi,
	Yang Zhang, Paul Gortmaker
Subject: Re: [PATCH] x86, Fix do_IRQ interrupt warning for cpu hotplug
 retriggered irqs

On 12/23/13, Prarit Bhargava wrote:
> On 12/23/2013 04:41 AM, rui wang wrote:
>> On 12/2/13, Prarit Bhargava wrote:
>>> Bugzilla:
>>> When downing a cpu it is possible that there are unhandled irqs left in
>>> the APIC IRR register.  fixup_irqs() goes through the IRR and retriggers
>>> the IRQs left in the APIC IRR.  After this, the vector for the irq is set
>>> to -1.  There is a possibility here, however, that the CPU does handle an
>>> irq in the IRR and then calls the vector.
>> The patch does not seem to root-cause the problem. It seems to hide
>> the real problem.
>> It is not possible that a device-triggered irq can arrive to this cpu
>> again after fixup_irqs() fills its vector_irq[vector] to -1, because
>> we've done the following:
>> 1. We disabled interrupt on this cpu in stop_machine().
>> 2. We called irq_set_affinity() to exclude this cpu as a target for the irq.
>> 3. We checked APIC_IRR and re-triggered any pending irqs to other cpus.
> ... and we set the IRQ handler to -1 for the down'd cpu.
> Rui, I think you're right up to here but I think this has nothing to do with
> or locking.
> I assumed that the issue I was trying to fix was long standing and well-known
> within the kernel given some of the comments I had read here-and-there about
> people seeing the do_IRQ errors on LKML.  There have long been reports of the
> do_IRQ warning output during cpu down.
> Here's what the issue is after step 3 above...
> 4.  The APIC_IRR is still *set* in the down'd cpu with IRQs disabled.
> 5.  We continue executing the stop_machine "down" portion of the code, then
> continue executing in normal context the "die" code (ie, __cpu_die()).
> IRQ disable only pertains to stop_machine down.  So after we leave that
> context, IRR will still execute.  While the kernel is spinning in cpu_die(),
> the down'd cpu attempts to execute the handler for the IRQ in IRR ... and
> can't find one because we've set it to -1.  So we see the warning.
> A few additional debug points:
> 1.  I put a printk in fixup_irqs(), where we call the irq_retrigger on
> another cpu, that dumps the down'd CPU and IRQ #.  I see that printk
> *EVERY TIME* I see the do_IRQ warning.
> 2.  The do_IRQ warning *always* appears before I see the offline message ...
> [  148.656016] Broke affinity for irq 634
> [  148.660493] Broke affinity for irq 698
> [  148.665739] kvm: disabling virtualization on CPU58
> [  148.666732] PRARIT: 58.208 IRR entry ... irq_retrigger call.
> at this point we've left the stop_machine() code and we're now continuing to
> execute ... then we hit the cpu_die() ... which spins.
> [  148.671106] do_IRQ: 58.208 No irq handler for vector (irq -1)
> [  148.677544] smpboot: CPU 58 is now offline
> I think I have root caused this to the IRR being set in the down'd cpu.  It is
> admittedly a rare occurrence in the kernel.  I usually have to run about 1000
> ups and downs before hitting it; however, on my current test system it seems
> to hit much more frequently, almost 1 in 64 times.

If that's the case, then it means stop_machine() doesn't manage to
clear the IRR and ISR bits. But why not? Since this cpu is down, it's
not supposed to handle any further interrupts. IMHO we're supposed to
send EOIs repeatedly until all the APIC_IRR and APIC_ISR bits are
cleared. If an IRR bit is set, it means that an APIC_ISR bit (maybe
for another vector) is set with the highest interrupt priority.
Sending an EOI clears the highest-priority APIC_ISR bit, so the LAPIC
will then clear the next-highest-priority IRR bit and set the
corresponding ISR bit... We can repeat the process. It's like handling
interrupts in polled mode. That's the right thing to do IMHO.
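
To make the polled-mode idea above concrete, here is a minimal user-space
sketch in C. It is a toy model, not kernel code: struct toy_lapic,
toy_promote(), toy_eoi() and toy_drain() are hypothetical names, priority is
simplified to the raw vector number, and the sketch simply assumes that the
LAPIC promotes the next pending IRR bit to ISR after each EOI. That is the
behaviour assumed in this mail, not something verified on hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Toy stand-in for the two LAPIC bitmaps discussed above (hypothetical). */
struct toy_lapic {
	uint8_t irr[NR_VECTORS / 8];	/* pending, not yet in service */
	uint8_t isr[NR_VECTORS / 8];	/* in service, waiting for an EOI */
};

static int toy_test(const uint8_t *map, int v)
{
	return (map[v / 8] >> (v % 8)) & 1;
}

static void toy_set(uint8_t *map, int v)
{
	map[v / 8] |= (uint8_t)(1u << (v % 8));
}

static void toy_clear(uint8_t *map, int v)
{
	map[v / 8] &= (uint8_t)~(1u << (v % 8));
}

/* Highest set vector in a bitmap, or -1 if the bitmap is empty. */
static int toy_highest(const uint8_t *map)
{
	for (int v = NR_VECTORS - 1; v >= 0; v--)
		if (toy_test(map, v))
			return v;
	return -1;
}

/* Assumption: a pending vector moves to ISR once it outranks
 * everything currently in service. */
static void toy_promote(struct toy_lapic *l)
{
	int pending = toy_highest(l->irr);
	int in_service = toy_highest(l->isr);

	if (pending >= 0 && pending > in_service) {
		toy_clear(l->irr, pending);
		toy_set(l->isr, pending);
	}
}

/* An EOI retires the highest in-service vector, then the next
 * pending one (if any) may be promoted. */
static void toy_eoi(struct toy_lapic *l)
{
	int in_service = toy_highest(l->isr);

	if (in_service >= 0)
		toy_clear(l->isr, in_service);
	toy_promote(l);
}

/* Send EOIs until both bitmaps are empty; returns the number sent. */
static int toy_drain(struct toy_lapic *l)
{
	int eois = 0;

	toy_promote(l);
	while (toy_highest(l->irr) >= 0 || toy_highest(l->isr) >= 0) {
		toy_eoi(l);
		eois++;
		/* Each EOI retires one bit, so this bound cannot be exceeded. */
		assert(eois <= 2 * NR_VECTORS);
	}
	return eois;
}
```

With vector 208 pending in the IRR (the vector from the log above) and, say,
vector 220 in service, toy_drain() sends two EOIs and leaves both bitmaps
empty.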

The other unanswered question is: why isn't cpu_online_mask protected
by a spin lock? Being atomic isn't enough.

Are these all well-known issues? Are there well-known answers already?
