Message-ID: <54E99A8F.1080803@numascale.com>
Date: Sun, 22 Feb 2015 16:59:59 +0800
From: Daniel J Blueman <daniel@...ascale.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>
CC: Rafael David Tinoco <inaddy@...ntu.com>,
Peter Anvin <hpa@...or.com>,
Jiang Liu <jiang.liu@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>,
Frederic Weisbecker <fweisbec@...il.com>,
Gema Gomez <gema.gomez-solano@...onical.com>,
Christopher Arges <chris.j.arges@...onical.com>,
the arch/x86 maintainers <x86@...nel.org>
Subject: Re: smp_call_function_single lockups
On Saturday, February 21, 2015 at 3:50:05 AM UTC+8, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> > On Fri, Feb 20, 2015 at 1:30 AM, Ingo Molnar <mingo@...nel.org> wrote:
> > >
> > > So if my memory serves me right, I think it was for
> > > local APICs, and even there mostly it was a performance
> > > issue: if an IO-APIC sent more than 2 IRQs per 'level'
> > > to a local APIC then the IO-APIC might be forced to
> > > resend those IRQs, leading to excessive message traffic
> > > on the relevant hardware bus.
> >
> > Hmm. I have a distinct memory of interrupts actually
> > being lost, but I really can't find anything to support
> > that memory, so it's probably some drug-induced confusion
> > of mine. I don't find *anything* about interrupt "levels"
> > any more in modern Intel documentation on the APIC, but
> > maybe I missed something. But it might all have been an
> > IO-APIC thing.
>
> So I just found an older discussion of it:
>
>
>   http://www.gossamer-threads.com/lists/linux/kernel/1554815?do=post_view_threaded#1554815
>
> while it's not a comprehensive description, it matches what
> I remember from it: with 3 vectors within a level of 16
> vectors we'd get excessive "retries" sent by the IO-APIC
> through the (then rather slow) APIC bus.
>
> ( It was possible for the same phenomenon to occur with
> IPIs as well, when a CPU sent an APIC message to another
> CPU, if the affected vectors were equal modulo 16 - but
> this was rare IIRC because most systems were dual CPU so
> only two IPIs could have occurred. )
>
> > Well, the attached patch for that seems pretty trivial.
> > And seems to work for me (my machine also defaults to
> > x2apic clustered mode), and allows the APIC code to start
> > doing a "send to specific cpu" thing one by one, since it
> > falls back to the send_IPI_mask() function if no
> > individual CPU IPI function exists.
> >
> > NOTE! There's a few cases in
> > arch/x86/kernel/apic/vector.c that also do that
> > "apic->send_IPI_mask(cpumask_of(i), .." thing, but they
> > aren't that important, so I didn't bother with them.
> >
> > NOTE2! I've tested this, and it seems to work, but maybe
> > there is something seriously wrong. I skipped the
> > "disable interrupts" part when doing the "send_IPI", for
> > example, because I think it's entirely unnecessary for
> > that case. But this has certainly *not* gotten any real
> > stress-testing.
>
> I'm not so sure about that aspect: I think disabling IRQs
> might be necessary with some APICs (if lower levels don't
> disable IRQs), to make sure the 'local APIC busy' bit isn't
> set:
>
> we typically do a wait_icr_idle() call before sending an
> IPI - and if IRQs are not off then the idleness of the APIC
> might be gone. (Because a hardirq that arrives after a
> wait_icr_idle() but before the actual IPI sending sent out
> an IPI and the queue is full.)

The Intel SDM [1] and AMD F15h BKDG [2] state that IPIs are queued, so
the wait_icr_idle() polling is only necessary on PPro and older, and
even there perhaps only to avoid delivery retries. The polling
needlessly ties up the IPI caller, so we bypass it in the IPI-to-self
path of the Numachip APIC driver.

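To make the cost of the polling concrete, here is a minimal user-space
sketch, not kernel code. The fake_apic structure, its "latency" knob and
all function names are assumptions made up for illustration; only the
position of the delivery-status (busy) bit, bit 12 of the ICR low word,
is taken from the xAPIC register layout.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_BUSY (1u << 12)  /* delivery-status bit in the xAPIC ICR low word */

/* Hypothetical stand-in for a memory-mapped local APIC. */
struct fake_apic {
    uint32_t icr_low;     /* last command written, plus the busy bit */
    unsigned busy_ticks;  /* ICR reads left before the busy bit clears */
    unsigned polls;       /* loop iterations the sender spent waiting */
    unsigned sent;        /* IPI commands written */
};

/* Each read models time passing; the busy bit clears after busy_ticks reads. */
static uint32_t fake_icr_read(struct fake_apic *a)
{
    assert(a);
    if (a->busy_ticks && --a->busy_ticks == 0)
        a->icr_low &= ~ICR_BUSY;
    return a->icr_low;
}

/* Writing a command marks the ICR busy for `latency` subsequent reads. */
static void fake_icr_write(struct fake_apic *a, uint32_t cmd, unsigned latency)
{
    assert(a);
    a->icr_low = cmd & ~ICR_BUSY;
    a->busy_ticks = latency;
    if (latency)
        a->icr_low |= ICR_BUSY;
    a->sent++;
}

/* Conservative path: spin until the previous IPI has left the ICR. */
static void send_ipi_polled(struct fake_apic *a, uint8_t vector, unsigned latency)
{
    while (fake_icr_read(a) & ICR_BUSY)
        a->polls++;
    fake_icr_write(a, vector, latency);
}

/* Path for APICs documented to queue IPIs: write without polling. */
static void send_ipi_unpolled(struct fake_apic *a, uint8_t vector, unsigned latency)
{
    fake_icr_write(a, vector, latency);
}
```

Sending two IPIs back to back through send_ipi_polled() makes the caller
spin on the second one, while send_ipi_unpolled() returns at once; the
latter is only safe where the hardware is documented to queue the
requests.
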
On Linus's earlier point: with the large core counts on Numascale
systems, I previously implemented a shortcut that lets single IPIs
bypass all the cpumask generation and walking. It's way down on my list,
but I can try to generalise it and post a patch series at some point, if
there's interest.

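A rough user-space sketch of the single-IPI idea follows. It is neither
the Numachip code nor Linus's patch: struct apic_ops, the toy 64-bit
cpumask_t and every function name here are hypothetical. It only shows
the dispatch shape described in the thread, where a per-CPU send hook is
used when the driver provides one and the mask-based send is the
fallback.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64

typedef uint64_t cpumask_t;  /* toy mask: one bit per CPU */

/* Hypothetical driver hooks; the per-CPU hook is optional. */
struct apic_ops {
    void (*send_ipi_mask)(cpumask_t mask, int vector);  /* always present */
    void (*send_ipi)(int cpu, int vector);              /* may be NULL */
};

static unsigned mask_walk_steps;     /* CPUs visited while walking masks */
static unsigned direct_sends;        /* IPIs sent via the per-CPU hook */
static unsigned delivered[NR_CPUS];  /* IPIs received per CPU */

/* Mask-based send: walks every possible CPU, even for a one-bit mask. */
static void toy_send_ipi_mask(cpumask_t mask, int vector)
{
    (void)vector;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        mask_walk_steps++;
        if (mask & ((cpumask_t)1 << cpu))
            delivered[cpu]++;
    }
}

/* Per-CPU send: no mask is built or walked. */
static void toy_send_ipi(int cpu, int vector)
{
    (void)vector;
    direct_sends++;
    delivered[cpu]++;
}

/* Use the per-CPU hook if the driver has one, else fall back to the mask. */
static void send_ipi_to_cpu(const struct apic_ops *ops, int cpu, int vector)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (ops->send_ipi)
        ops->send_ipi(cpu, vector);
    else
        ops->send_ipi_mask((cpumask_t)1 << cpu, vector);
}
```

With only send_ipi_mask set, a single send still pays for a walk over
all NR_CPUS bits; with send_ipi set, the walk disappears, which is where
the saving on large core counts would come from.
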
Dan
-- [1] Intel SDM 3, p10-30
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
If more than one interrupt is generated with the same vector number, the
local APIC can set the bit for the vector both in the IRR and the ISR.
This means that for the Pentium 4 and Intel Xeon processors, the IRR and
ISR can queue two interrupts for each interrupt vector: one in the IRR
and one in the ISR. Any additional interrupts issued for the same
interrupt vector are collapsed into the single bit in the IRR. For the
P6 family and Pentium processors, the IRR and ISR registers can queue no
more than two interrupts per interrupt vector and will reject other
interrupts that are received within the same vector.
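The Pentium 4 / Xeon behaviour in the excerpt above can be modelled
with two bitmaps. This is a toy sketch under loud assumptions: the
toy_lapic structure and all function names are invented, and interrupt
priorities, masking and the older reject-and-retry behaviour are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: one IRR bit and one ISR bit for each of 256 vectors. */
struct toy_lapic {
    uint32_t irr[8];     /* interrupt request register: pending */
    uint32_t isr[8];     /* in-service register: being handled */
    unsigned collapsed;  /* arrivals merged into an already-set IRR bit */
};

static bool toy_test_bit(const uint32_t *map, uint8_t v)
{
    return (map[v / 32] >> (v % 32)) & 1u;
}

static void toy_set_bit(uint32_t *map, uint8_t v)
{
    map[v / 32] |= 1u << (v % 32);
}

static void toy_clear_bit(uint32_t *map, uint8_t v)
{
    map[v / 32] &= ~(1u << (v % 32));
}

/* An interrupt message for `vector` arrives at the local APIC. */
static void lapic_accept(struct toy_lapic *l, uint8_t vector)
{
    if (toy_test_bit(l->irr, vector))
        l->collapsed++;  /* merges into the already-pending IRR bit */
    else
        toy_set_bit(l->irr, vector);
}

/* The CPU starts servicing `vector`: the pending bit moves from IRR to ISR. */
static bool lapic_dispatch(struct toy_lapic *l, uint8_t vector)
{
    if (!toy_test_bit(l->irr, vector) || toy_test_bit(l->isr, vector))
        return false;
    toy_clear_bit(l->irr, vector);
    toy_set_bit(l->isr, vector);
    return true;
}

/* End of interrupt: the in-service bit is cleared. */
static void lapic_eoi(struct toy_lapic *l, uint8_t vector)
{
    toy_clear_bit(l->isr, vector);
}

/* Number of interrupts currently queued for `vector` (0, 1 or 2). */
static unsigned lapic_queued(const struct toy_lapic *l, uint8_t vector)
{
    return toy_test_bit(l->irr, vector) + toy_test_bit(l->isr, vector);
}
```

Accepting a vector, dispatching it, and accepting it again leaves one
interrupt in the ISR and one in the IRR; a third arrival before the EOI
only bumps the collapsed counter, matching the "two per vector" limit in
the quoted text.
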
-- [2] AMD Fam15h BKDG p470
http://support.amd.com/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
DS: interrupt delivery status. Read-only. Reset: 0. In xAPIC mode this
bit is set to indicate that the interrupt has not yet been accepted by
the destination core(s). 0=Idle. 1=Send pending. Reserved in x2APIC
mode. Software may repeatedly write ICRL without polling the DS bit; all
requested IPIs will be delivered.