linux-kernel - Re: [RFC/PATCHv2] x86/irq: round-robin distribution of irqs to cpus w/in node

Message-ID: <m1hbhalo1o.fsf@fess.ebiederm.org>
Date:	Mon, 27 Sep 2010 17:17:07 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Arthur Kepner <akepner@....com>, linux-kernel@...r.kernel.org,
	x86@...nel.org
Subject: Re: [RFC/PATCHv2] x86/irq: round-robin distribution of irqs to cpus w/in node

Thomas Gleixner <tglx@...utronix.de> writes:

> On Mon, 27 Sep 2010, Arthur Kepner wrote:
>
>> On Mon, Sep 27, 2010 at 10:46:02PM +0200, Thomas Gleixner wrote:
>> > ...
>> > Sigh. Why is this a x86 specific problem ?
>> >
>> 
>> It's obviously not. But we're particularly seeing it on x86 
>> systems, so an x86-specific fix would address our problem.
>
> Even more sigh.

The fact that x86 has vectors probably doesn't help.

>> > If we setup an irq on a node then we should set the affinity to the
>> > target node in general. 
>> 
>> OK.
>> 
>> > .... The round robin inside the node is really not
>> > a problem unless you hit:
>> > 
>> >    nr_irqs_per_node * nr_cpus_per_node > max_vectors_per_cpu
>> > 
>> 
>> No, I don't think that's true. 
>> 
>> The problem we're seeing is that one driver asks for a large 
>> number of interrupts (on no CPU in particular). And because of the 
>
> It does it for a node, dammit. Otherwise your patch would be
> absolutely useless.

We derive a node from where the device is plugged in.  The driver
does not specify a node.

>> > > +               if ((node != -1) && alloc_cpumask_var(&tmp_mask, GFP_ATOMIC)) {
>
>> way that the vectors are initially assigned to CPUs (in 
>> __assign_irq_vector()), a particular CPU can have all its vectors 
>> consumed. 
>
> Stop selling me crap already.

The deep bug is that create_irq_nr() allocates a vector (which it does
because, at the time, there was no better way to mark an irq as in use
on x86).  In the case of MSI-X we really don't know which node an irq is
going to be used on until request_irq() is called.  We simply know which
node the device is on.
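
To make the eager-allocation problem concrete, here is a toy userspace
model.  It is not kernel code and every name in it is hypothetical.  It
assumes one node with 4 cpus and 256 vectors per cpu, and it uses a
first-fit search in cpu order as a stand-in for whatever
__assign_irq_vector() actually does.  Because the vector is taken at
creation time, before anyone knows where the irq will be used, the first
cpu in the node soaks up everything:

```c
#include <assert.h>

/* Toy model, not kernel code: one NUMA node with a few cpus, each with
 * a fixed pool of vectors.  All names here are hypothetical. */
#define MODEL_CPUS      4
#define MODEL_VECTORS 256   /* per-cpu figure from this thread; real x86 reserves some */

static int used[MODEL_CPUS];  /* vectors consumed on each cpu */

/* Eager, first-fit allocation: the vector is taken as soon as the irq
 * is created, on the first cpu in the node that still has room.
 * Returns the chosen cpu, or -1 if every cpu in the node is full. */
static int eager_create_irq(void)
{
    for (int cpu = 0; cpu < MODEL_CPUS; cpu++) {
        if (used[cpu] < MODEL_VECTORS) {
            used[cpu]++;
            /* sanity check on the model's bookkeeping */
            assert(used[cpu] <= MODEL_VECTORS);
            return cpu;
        }
    }
    return -1;
}
```

Creating 300 irqs this way leaves cpu 0 with 256 vectors in use and
cpu 1 with 44, while cpus 2 and 3 stay empty.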

If you want to see what is going on, the call trace looks like this:
pci_enable_msix 
  arch_setup_msi_irqs
    create_irq_nr

After pci_enable_msix() returns, the driver goes and turns all of those
irqs into per-cpu irqs.
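
A rough way to see how much of that initial placement is wasted: the
hypothetical helpers below assume the irqs were placed first-fit on an
otherwise idle node (so irq i lands on cpu i / per_cpu) and that the
driver then wants irq i on cpu i % ncpus.  They count how many irqs end
up having to move.  Both placement rules are simplifying assumptions,
not what the kernel or any particular driver is known to do.

```c
#include <assert.h>

/* Hypothetical: cpu chosen for irq i by first-fit on an idle node where
 * each cpu has `per_cpu` free vectors. */
static int first_fit_cpu(int i, int per_cpu)
{
    assert(per_cpu > 0);
    return i / per_cpu;
}

/* Hypothetical: cpu a driver wanting one irq per cpu assigns to irq i. */
static int per_cpu_target(int i, int ncpus)
{
    assert(ncpus > 0);
    return i % ncpus;
}

/* Count irqs whose vector must be moved once the driver spreads them. */
static int irqs_to_move(int nirqs, int ncpus, int per_cpu)
{
    int moves = 0;

    for (int i = 0; i < nirqs; i++)
        if (first_fit_cpu(i, per_cpu) != per_cpu_target(i, ncpus))
            moves++;
    return moves;
}
```

With 8 irqs on a 4-cpu node and 256 free vectors per cpu, 6 of the 8
have to move.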

There are goofy things that happen when hardware asks for 1 irq per cpu.
But since MSI can ask for up to 4096 irqs (assuming the hardware
supports it), I can totally see the first 256 of those irqs landing on a
single cpu before you go to user space and let user space, or something
else, reassign all of those irqs in a per-cpu way.
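
The arithmetic behind this, and the round-robin-within-node idea from
the subject line, can be sketched with one more toy function.  It
assumes, beyond anything stated in this thread, that every cpu in the
node starts with the same number of free vectors and that the
round-robin cursor simply advances past the cpu it last used.  On real
x86 fewer than 256 vectors per cpu are free, since the low vectors are
reserved for exceptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MODEL_CPUS 64

/* Hypothetical model: allocate `nirqs` vectors across `ncpus` cpus with
 * `per_cpu` free vectors each, using first-fit or round-robin.
 * Returns the most vectors consumed on any single cpu, or -1 if the
 * node runs out of vectors. */
static int peak_load(int nirqs, int ncpus, int per_cpu, bool round_robin)
{
    int used[MAX_MODEL_CPUS] = {0};
    int next = 0;   /* round-robin cursor */
    int peak = 0;

    assert(ncpus > 0 && ncpus <= MAX_MODEL_CPUS);
    for (int i = 0; i < nirqs; i++) {
        int cpu = -1;

        /* search order: from the cursor (round-robin) or from cpu 0 */
        for (int k = 0; k < ncpus; k++) {
            int c = round_robin ? (next + k) % ncpus : k;
            if (used[c] < per_cpu) {
                cpu = c;
                break;
            }
        }
        if (cpu < 0)
            return -1;          /* every cpu in the node is full */
        used[cpu]++;
        if (round_robin)
            next = (cpu + 1) % ncpus;
        if (used[cpu] > peak)
            peak = used[cpu];
    }
    return peak;
}
```

For 256 irqs on a 4-cpu node with 256 free vectors each, first-fit
gives a peak of 256 vectors on one cpu while round-robin gives 64.
Asking for 1025 fails either way: round-robin spreads the load but does
not add capacity.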

My gut feel says that the real answer is to delay assigning a vector
to an irq until request_irq().  At which point we will know that someone
at least wants to use the irq.
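
A minimal sketch of that lazy scheme, again as a userspace toy with
hypothetical names (lazy_create_irq, lazy_request_irq) and an arbitrary
64-irq table: creation only marks the irq number as in use, and a
vector is consumed on the target cpu only when the irq is requested.
This illustrates the idea, not how the kernel would implement it.

```c
#include <assert.h>
#include <stdbool.h>

#define LZ_IRQS     64
#define LZ_CPUS      4
#define LZ_VECTORS 256

static bool irq_in_use[LZ_IRQS];   /* irq number reserved, no vector yet */
static int  irq_cpu[LZ_IRQS];      /* cpu holding the vector, -1 if none */
static int  vec_used[LZ_CPUS];     /* vectors consumed per cpu */

/* Creation only reserves the irq number; no vector is consumed.
 * Returns the irq number, or -1 if the table is full. */
static int lazy_create_irq(void)
{
    for (int irq = 0; irq < LZ_IRQS; irq++) {
        if (!irq_in_use[irq]) {
            irq_in_use[irq] = true;
            irq_cpu[irq] = -1;
            return irq;
        }
    }
    return -1;
}

/* The vector is assigned only when the irq is actually requested, at
 * which point the target cpu is known.  Returns 0 on success, -1 if the
 * target cpu is out of vectors. */
static int lazy_request_irq(int irq, int cpu)
{
    assert(irq >= 0 && irq < LZ_IRQS && irq_in_use[irq]);
    assert(cpu >= 0 && cpu < LZ_CPUS);
    if (irq_cpu[irq] >= 0)
        return 0;               /* already has a vector */
    if (vec_used[cpu] >= LZ_VECTORS)
        return -1;
    vec_used[cpu]++;
    irq_cpu[irq] = cpu;
    return 0;
}
```

In this model, irqs that are created but never requested cost no
vectors at all.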

Eric
