Message-ID: <m14p5i2iwx.fsf@frodo.ebiederm.org>
Date: Mon, 18 Aug 2008 12:59:58 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: James Bottomley <James.Bottomley@...senPartnership.com>
Cc: Yinghai Lu <yhlu.kernel@...il.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
"H. Peter Anvin" <hpa@...or.com>,
Jesse Barnes <jbarnes@...tuousgeek.org>,
Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Andrew Vasquez <andrew.vasquez@...gic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit

James Bottomley <James.Bottomley@...senPartnership.com> writes:
> On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> <James.Bottomley@...senpartnership.com> wrote:
>> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> > particularly if you encode things the system doesn't really need to know
>> >> > in them.
>> >>
>> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS
>> >> and count down for msi/msi-x?
>> >
>> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> > matrixed way we display this in show_interrupts() with the CPU along the
>> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> > affinity in the irq number again. So it makes more sense to assign the
>> > vectors based on both the irq number and the CPU affinity so that if the
>> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> > on.
>>
>> msi-x entry index, cpu_vector, irq number...
>>
>> you want different cpus to have the same vector?
>
> Obviously I'm not communicating very well. Your apparent assumption is
> that irq number == vector.

Careful.  There are two entities termed "vector" in this conversation:
the MSI-X vector, where the per-device MSI-X table can hold up to 4096
entries, and the idt vector, of which each cpu has 256.
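
For concreteness, a sketch of where the per-device count comes from
(the helper name is mine; only the config-space reads are real):

#include <linux/pci.h>

/*
 * Illustrative only: the MSI-X table size is per-device state read
 * from the Message Control word, while the idt is a fixed 256-slot
 * per-cpu table -- the two "vector" spaces are unrelated.
 */
static int msix_table_entries(struct pci_dev *dev)
{
	u16 control;
	int pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);

	if (!pos)
		return 0;
	pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &control);
	return (control & PCI_MSIX_FLAGS_QSIZE) + 1;	/* field encodes N-1 */
}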
> What I'm saying is that's not what we've
> done for individually vectored CPU interrupts in other architectures.
> In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
> number identify the vector. For non-numa systems, this is effectively
> what you're interested in doing anyway. For numa systems, it just
> becomes a sparse matrix.

I believe assign_irq_vector on x86_64 (and soon on x86_32) already does
this.

The number that was being changed was the irq number for the msi-x
``vectors'': from some random free irq number to roughly
bus(8 bits):device+function(8 bits):msix-vector(12 bits), so that we
could have a stable irq number for msi irqs.
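
A minimal sketch of that encoding (the helper name and exact bit
positions are illustrative, not taken from the patch):

/*
 * Illustrative: pack bus(8):devfn(8):msix-vector(12) into a stable,
 * reproducible irq number.  28 bits total, so it fits in 32 bits.
 */
static inline unsigned int msix_stable_irq(unsigned int bus,
					   unsigned int devfn,
					   unsigned int msix_index)
{
	return (bus & 0xff) << 20 | (devfn & 0xff) << 12 |
	       (msix_index & 0xfff);
}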

Once pci domains are considered it is hard to claim we have enough
bits; in the general case I expect we need at least one pci domain per
NUMA node.

The big motivation for killing NR_IRQS-sized arrays comes from two
directions: msi-x, which allows up to 4096 irqs per device just as nic
vendors are starting to produce cards with 256 queues, and large SGI
systems that do no I/O yet want to be supported by the same kernel
build as smaller systems.  A kernel built to handle 4096*32 irqs,
which is more or less reasonable for an I/O-heavy system, carries a
ridiculously sized array on smaller machines.
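
Back-of-the-envelope, with an assumed per-descriptor size
(sizeof(struct irq_desc) varies with the config):

#include <stdio.h>

/* 4096*32 irqs at an assumed ~256 bytes per irq_desc. */
int main(void)
{
	unsigned long entries = 4096UL * 32;	/* 131072 */
	unsigned long bytes = entries * 256;	/* assumed desc size */

	printf("%lu MiB\n", bytes >> 20);	/* prints 32 */
	return 0;
}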

So a static irq_desc array is out.  And since, with the combination of
msi-x and hotplug, we cannot tell how many irq sources (and thus irq
numbers) the machine is going to have, we cannot reasonably size even
a dynamic array at boot time.  Further, we want to allocate the
irq_desc entries in node-local memory on NUMA machines for better
performance.  Which means we need to dynamically allocate irq_desc
entries and have some lookup mechanism from irq number to irq_desc
entry.
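
A sketch of that direction (the names are mine; locking and error
handling are trimmed):

#include <linux/irq.h>
#include <linux/radix-tree.h>
#include <linux/slab.h>

/*
 * Sketch only: node-local irq_desc allocation plus an
 * irq-number -> descriptor lookup, instead of a static array.
 */
static RADIX_TREE(sparse_irq_tree, GFP_ATOMIC);

struct irq_desc *sparse_irq_to_desc(unsigned int irq)
{
	return radix_tree_lookup(&sparse_irq_tree, irq);
}

struct irq_desc *sparse_irq_alloc_desc(unsigned int irq, int node)
{
	struct irq_desc *desc = kzalloc_node(sizeof(*desc), GFP_KERNEL, node);

	if (desc && radix_tree_insert(&sparse_irq_tree, irq, desc)) {
		kfree(desc);
		desc = NULL;
	}
	return desc;
}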

Once we have all of that, it becomes possible to look at assigning a
static irq number to each pci (bus:device:function:msi-x vector) tuple
so that the system is more reproducible.
Eric