linux-kernel - Re: [PATCH] x86, Fix do_IRQ interrupt warning for cpu hotplug retriggered irqs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 25 Dec 2013 16:22:00 +0800
From:	rui wang <ruiv.wang@...il.com>
To:	Prarit Bhargava <prarit@...hat.com>
Cc:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Michel Lespinasse <walken@...gle.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Seiji Aguchi <seiji.aguchi@....com>,
	Yang Zhang <yang.z.zhang@...el.com>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	janet.morgan@...el.com, tony.luck@...el.com
Subject: Re: [PATCH] x86, Fix do_IRQ interrupt warning for cpu hotplug
 retriggered irqs

On 12/24/13, Prarit Bhargava <prarit@...hat.com> wrote:
>
>
> On 12/23/2013 11:41 PM, rui wang wrote:
>> On 12/23/13, Prarit Bhargava <prarit@...hat.com> wrote:
>> > I think I have root caused this to the IRR being set in the down'd cpu.
>> > It
>> > is
>> > admittedly a rare occurrence in the kernel.  I usually have to run
>> > about
>> > 1000 up
>> > and down's before hitting it, however, on my current test system it
>> > seems to
>> > hit
>> > much more frequently, almost 1 in 64 times.
>> >
>>>
>>
>> If that's the case, then it means stop_machine() doesn't manage to
>> clear the IRR and ISR bits. But why not? Since this cpu is down it's
>> not supposed to handle any further interrupts.
>
> ISR bits can be cleared by software (via EOI).  IRR bits CANNOT be cleared
> by
> software.  Once IRR bits are set the only thing that can clear them is the
> processor itself.
>
> IMHO we're supposed to
>> send EOIs repeatedly until all the APIC_IRR and APIC_ISR bits are
>> cleared. If an IRR bit is set, it means that there's (maybe another
>> vector) an APIC_ISR bit set with the highest interrupt priority
>
> I don't think that is quite right (admittedly I may be wrong on this and
> I'll
> have to dig out an Intel spec to double check).  As I mentioned above the
> ISR
> can be cleared but the IRR cannot be cleared.
>
> My understanding (also other's understanding on clearing of the IRR) is
> that
> only the processor can clear the IRR.  See arch/x86/kernel/apic/apic.c line
> 1352
> and the code below it.  Because of kexec boot, we "drain" the IRR by
> waiting.  I
> remember talking to the people who wrote that code about the IRR and the
> inability to affect a write to clear requested and pending IRQs.
>
>> Sending an EOI clears the highest priority APIC_ISR bit, so the LAPIC
>> will then clear the next highest priority IRR bit and set the
>> corresponding ISR bit...
>
> But the issue here is that the IRR has been set, but the processor is yet
> to
> execute the irq's handler.  That's what makes this error so difficult to
> hit.
> It is a race with the hardware to catch the processor in exactly this
> state.
>
> We can repeat the process. It's like handling
>> interrups in polled mode. That's the right thing to do IMHO.
>
> But the IRR register bits are still set and once the processor is out of
> interrupt context (ie, in the cpu_die() code) the processor will attempt to
> access the vector for that IRR bit and we'll still get a do_IRQ() warning.
> That
> is different than handling the ISR.
>
> [Aside: Sorry Rui, but it feels like I repeated myself several times about
> the
> IRR :).  I mean no disrespect towards you by that :) as I'm just answering
> each
> of your points and pointing out that the IRR is the problem here.]
>

I just did some experiments. Yes you are right. The IRR is hard to
clear. I thought the IRR could transfer into an ISR when there's no
higher priority irq pending. But it's not designed in that way.

>>
>> The other unanswered question is why isn't cpu_online_mask() protected
>> by a spin lock ? Being atomic isn't enough.
>
> Perhaps it is thought that the cpu_online_mask() can only be changed by the
> cpu
> hotplug code and that is done via stop_machine() and protected by
> cpu_maps_update_begin() and cpu_maps_update_done() in cpu_down()?  So an
> additional lock is not necessary?
>

Yes. I missed that part.

Thanks
Rui

>>
>> Are these all well-known issues? Are there well-known answers already?
>
> I thought the bug was well-known because of the number of do_IRQ warning
> reports
> I stumbled across while searching for a solution.  But as the code says in
> fixup_irqs()
>
>         /*
>          * We can remove mdelay() and then send spuriuous interrupts to
>          * new cpu targets for all the irqs that were handled previously by
>          * this cpu. While it works, I have seen spurious interrupt
> messages
>          * (nothing wrong but still...).
>
> so I don't think anyone has realized what the bug actually was.
> 	

Yes that comment was what triggered me to think that the issue wasn't
understood. You now have a clear enough explanation.

Thanks
Rui
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/