Message-ID: <CA+55aFykg3SAO16=NRiC+tP1gGj5hgbu+Y93ss4Qg30+qyZ=+w@mail.gmail.com>
Date: Tue, 31 Mar 2015 08:08:40 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Chris J Arges <chris.j.arges@...onical.com>
Cc: Rafael David Tinoco <inaddy@...ntu.com>,
Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
Jiang Liu <jiang.liu@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>,
Frederic Weisbecker <fweisbec@...il.com>,
Gema Gomez <gema.gomez-solano@...onical.com>,
"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: smp_call_function_single lockups
On Mon, Mar 30, 2015 at 8:15 PM, Chris J Arges
<chris.j.arges@...onical.com> wrote:
>
> I modified the posted patch with the following:
Actually, in addition to Ingo's patches (and the irq printout), which
you should try first, if none of that really gives any different
behavior, you can modify that ack_APIC_irq() debugging code a bit more:
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index bf32309..dc3e192 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -441,7 +441,7 @@ static inline void ack_APIC_irq(int vector)
>         if (vector >= 16) {
>                 unsigned v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
>                 v >>= vector & 0x1f;
> -               WARN_ON_ONCE(!(v & 1));
> +               WARN(!(v & 1), "ack_APIC_irq: vector = %0x\n", vector);
>         }
>         /*
>          * ack_APIC_irq() actually gets compiled as a single instruction
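For reference, the index arithmetic in the quoted hunk can be checked in user space. The sketch below assumes the xAPIC register layout, in which the 256 ISR bits live in eight 32-bit registers spaced 0x10 bytes apart starting at APIC_ISR (0x100). isr_reg_offset() and isr_bit() are hypothetical helper names made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Base offset of the ISR bank in the local APIC register space. */
#define APIC_ISR 0x100

/*
 * Offset of the ISR register that holds the in-service bit for `vector`.
 * Each 32-bit register covers 32 vectors and sits 0x10 bytes after the
 * previous one, so (vector / 32) * 16 == (vector & ~0x1f) >> 1.
 */
static uint32_t isr_reg_offset(unsigned int vector)
{
	assert(vector < 256);	/* only 256 vectors exist */
	return APIC_ISR + ((vector & ~0x1fu) >> 1);
}

/* Bit position of `vector` inside that register. */
static unsigned int isr_bit(unsigned int vector)
{
	return vector & 0x1f;
}
```

Under these assumptions, vector 0x31 maps to register offset 0x110, bit 17, and vector 2 (NMI) maps to offset 0x100, bit 2.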
So what I'd suggest doing is:

 - change the test of "vector >= 16" to just "vector >= 0".

   We still have "-1" as the "unknown vector" thing, but I think only
   the ack_bad_irq() thing calls it, and that should print out its own
   message if it ever triggers, so it isn't an issue.

   The reason for the ">= 16" was kind of bogus - the first 16 vectors
   are system vectors, but we definitely shouldn't ack the apic for such
   vectors anyway, so giving a warning for them is very much appropriate.
   In particular, vector 2 is NMI, and maybe we do ACK it incorrectly.

 - add a "return" if the warning triggers, and simply don't do the
   actual ACK cycle at all if the ISR bit is clear.

   IOW, make it do "if (WARN(..)) return;"
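The two suggested changes can be tried out in user space before touching the kernel. The following is only a sketch under stated assumptions: fake_isr, eoi_count, the apic_read() mock and the simplified WARN() macro are stand-ins invented for illustration, and the real function writes the EOI register instead of bumping a counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define APIC_ISR 0x100

/* Mock hardware state (assumption): eight 32-bit ISR registers and an EOI counter. */
static uint32_t fake_isr[8];
static int eoi_count;

/* Mock of apic_read(): only the ISR bank at 0x100..0x170 is modelled. */
static uint32_t apic_read(uint32_t reg)
{
	assert(reg >= APIC_ISR && reg < APIC_ISR + 0x80);
	return fake_isr[(reg - APIC_ISR) >> 4];
}

/* Simplified stand-in for the kernel's WARN(): print and yield 1 if cond is true. */
#define WARN(cond, ...) ((cond) ? (fprintf(stderr, __VA_ARGS__), 1) : 0)

static void ack_APIC_irq(int vector)
{
	if (vector >= 0) {	/* was "vector >= 16"; -1 still skips the check */
		unsigned v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
		v >>= vector & 0x1f;
		/* ISR bit clear: warn and skip the ACK cycle entirely */
		if (WARN(!(v & 1), "ack_APIC_irq: vector = %0x\n", vector))
			return;
	}
	eoi_count++;	/* stands in for the real EOI write */
}
```

With fake_isr[1] = 1u << 17, calling ack_APIC_irq(0x31) bumps eoi_count, while ack_APIC_irq(2) warns and returns without an EOI. A vector of -1 bypasses the ISR check and is always ACKed.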
Now, we might get the vector number wrong for some reason, and in that
case not ACK'ing at all might cause problems too, but it would be
interesting to see if it changes behavior wrt the lockup.

I don't have any other ideas at the moment, but hopefully the
suggested changes by me and Ingo will give some more data to go on and
clarify what might be going on.

                    Linus