Message-ID: <m1hbhalo1o.fsf@fess.ebiederm.org>
Date: Mon, 27 Sep 2010 17:17:07 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Arthur Kepner <akepner@....com>, linux-kernel@...r.kernel.org,
x86@...nel.org
Subject: Re: [RFC/PATCHv2] x86/irq: round-robin distribution of irqs to cpus w/in node

Thomas Gleixner <tglx@...utronix.de> writes:

> On Mon, 27 Sep 2010, Arthur Kepner wrote:
>
>> On Mon, Sep 27, 2010 at 10:46:02PM +0200, Thomas Gleixner wrote:
>> > ...
>> > Sigh. Why is this a x86 specific problem ?
>> >
>>
>> It's obviously not. But we're particularly seeing it on x86
>> systems, so an x86-specific fix would address our problem.
>
> Even more sigh.

The fact that x86 has vectors probably doesn't help.

>> > If we setup an irq on a node then we should set the affinity to the
>> > target node in general.
>>
>> OK.
>>
>> > .... The round robin inside the node is really not
>> > a problem unless you hit:
>> >
>> > nr_irqs_per_node * nr_cpus_per_node > max_vectors_per_cpu
>> >
>>
>> No, I don't think that's true.
>>
>> The problem we're seeing is that one driver asks for a large
>> number of interrupts (on no CPU in particular). And because of the
>
> It does it for a node, dammit. Otherwise your patch would be
> absolutely useless.

We derive a node from where the device is plugged in. The driver
does not specify a node.

>> > > + if ((node != -1) && alloc_cpumask_var(&tmp_mask, GFP_ATOMIC)) {
>
>> way that the vectors are initially assigned to CPUs (in
>> __assign_irq_vector()), a particular CPU can have all its vectors
>> consumed.
>
> Stop selling me crap already.

The deep bug is that create_irq_nr() allocates a vector (which it does
because at the time there was no better way to mark an irq in use on
x86). In the case of msi-x we really don't know the node that an irq is
going to be used on until we get a request_irq(). We simply know which
node the device is on.

If you want to see what is going on, the call trace looks like:

  pci_enable_msix
    arch_setup_msi_irqs
      create_irq_nr

After pci_enable_msix is finished, the driver goes and makes all
of the irqs per-cpu irqs.

There are goofy things that happen when hardware asks for 1 irq per cpu.
But since msi can ask for up to 4096 irqs (assuming the hardware
supports it), I can totally see putting all 256 of those irqs on a single
cpu before you get to user space and let user space or something
reassign all of those irqs in a per-cpu way.
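
A toy model of that failure mode, as a sketch only. It assumes a
first-fit policy as a simplified stand-in for the initial assignment
(it is not the real __assign_irq_vector()), and it uses 256 vectors per
cpu as an illustrative figure taken from the paragraph above. All
function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative figure from the message; not the real count of usable
 * vectors on an x86 cpu. */
#define VECTORS_PER_CPU 256

/* First fit (simplified model): take the lowest-numbered cpu that still
 * has a free vector. Returns the chosen cpu, or -1 if all are full. */
int assign_first_fit(int *used, size_t ncpus)
{
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (used[cpu] < VECTORS_PER_CPU) {
            used[cpu]++;
            return (int)cpu;
        }
    }
    return -1;
}

/* Round robin: start the search at *next and move *next past the cpu
 * that was chosen, so consecutive requests spread across the node. */
int assign_round_robin(int *used, size_t ncpus, size_t *next)
{
    assert(ncpus > 0);
    for (size_t i = 0; i < ncpus; i++) {
        size_t cpu = (*next + i) % ncpus;
        if (used[cpu] < VECTORS_PER_CPU) {
            used[cpu]++;
            *next = (cpu + 1) % ncpus;
            return (int)cpu;
        }
    }
    return -1;
}
```

With four cpus in the node, 256 back-to-back first-fit requests all land
on cpu 0 and fill it, while the same 256 requests under round robin leave
64 vectors in use on each cpu.
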

My gut feel says that the real answer is to delay assigning a vector
to an irq until request_irq(), at which point we will know that someone
at least wants to use the irq.
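
Roughly what I mean, as a user-space sketch rather than kernel code.
Every name below (toy_irq, toy_create_irq, toy_request_irq) is
hypothetical, and the 256-vector limit is again only illustrative.
Creating the irq just marks the number as reserved; a vector is only
consumed on the target cpu when the irq is actually requested.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VECTORS_PER_CPU 256   /* illustrative limit, not the real one */
#define TOY_NO_VECTOR (-1)

/* Hypothetical per-irq state for the sketch. */
struct toy_irq {
    int reserved;  /* irq number has been handed out */
    int cpu;       /* cpu that owns the vector, or -1 if none yet */
    int vector;    /* vector index on that cpu, or TOY_NO_VECTOR */
};

/* Creation step: mark the irq in use without touching any cpu's vectors. */
void toy_create_irq(struct toy_irq *irq)
{
    irq->reserved = 1;
    irq->cpu = -1;
    irq->vector = TOY_NO_VECTOR;
}

/* Request step: only now pick a vector, on the cpu the caller wants.
 * Returns 0 on success, -1 if the cpu is invalid or out of vectors. */
int toy_request_irq(struct toy_irq *irq, int cpu, int *used, size_t ncpus)
{
    assert(irq->reserved);
    if (cpu < 0 || (size_t)cpu >= ncpus)
        return -1;
    if (irq->vector != TOY_NO_VECTOR)
        return 0;                     /* already bound to a vector */
    if (used[cpu] >= TOY_VECTORS_PER_CPU)
        return -1;
    irq->cpu = cpu;
    irq->vector = used[cpu]++;
    return 0;
}
```

In this model a driver can create far more irqs than one cpu has vectors
without exhausting any cpu, because nothing is consumed until
toy_request_irq() runs.
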
Eric