Message-Id: <1219093158.3261.70.camel@localhost.localdomain>
Date:	Mon, 18 Aug 2008 15:59:18 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Yinghai Lu <yhlu.kernel@...il.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@...or.com>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Andrew Vasquez <andrew.vasquez@...gic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Mon, 2008-08-18 at 12:59 -0700, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@...senPartnership.com> writes:
> 
> > On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
> >> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
> >> <James.Bottomley@...senpartnership.com> wrote:
> >> >> > What I still don't quite get is the benefit of large IRQ spaces ...
> >> >> > particularly if you encode things the system doesn't really need to know
> >> >> > in them.
> >> >>
> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS
> >> >> and count down for msi/msi-x?
> >> >
> >> > No, what I mean is that MSIs can trip directly to CPUs, so this is an
> >> > affinity thing (the MSI is directly bound to that CPU now).  In the
> >> > matrixed way we display this in show_interrupts(), with the CPU along
> >> > the top and the IRQ down the side, it doesn't make sense to me to
> >> > encode IRQ affinity in the irq number again.  So it makes more sense
> >> > to assign the vectors based on both the irq number and the CPU
> >> > affinity, so that if the PCI MSI for qla is assigned to CPU4 you can
> >> > reassign it to CPU5 and so on.
> >> 
> >> There are the msi-x entry index, the cpu vector, and the irq number...
> >> 
> >> Do you want different cpus to have the same vector?
> >
> > Obviously I'm not communicating very well.  Your apparent assumption is
> > that irq number == vector.  
> 
> Careful.  There are two entities termed vector in this conversation.
> There is the MSI-X vector which can hold up to 4096 entries per device.
> There is the idt vector which has 256 entries per cpu.

Yes ... I was trying to discriminate, but not very successfully.

Where I'm coming from is that parisc CPUs have 64 CPU vectors (what
you'd call the idt in the local apic) and the bus hardware has a
variable number per bridge (these aren't MSI, they're bridge triggered:
the way this works is that bridge<->CPU is done as MSI).

Anyway, the argument about vector or matrix representation goes to the
core of what irq number represents below.

> > What I'm saying is that's not what we've
> > done for individually vectored CPU interrupts in other architectures.
> > In those we did (cpu no, irq) == vector, i.e. the affinity and the irq
> > number together identify the vector.  For non-numa systems, this is
> > effectively what you're interested in doing anyway.  For numa systems,
> > it just becomes a sparse matrix.
> 
> I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
> 
> The number that was being changed was the irq number for the msi-x
> ``vectors'': from some random free irq number to roughly
> bus(8 bits):device+function(8 bits):msix-vector(12 bits), so that we
> could have a stable irq number for msi irqs.
> 
> Once pci domain is considered it is hard to claim we have enough bits.
> In the general case I expect we need at least enough pci domains to
> have one per NUMA node.
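
For concreteness, here is a toy sketch of that encoding.  Only the
8 + 8 + 12 bit split comes from Eric's paragraph above; the helper name
and exact bit positions are invented for illustration.  Note the result
already consumes 28 bits, with no room left for a pci domain:

#include <stdio.h>

/*
 * Illustrative only: pack (bus, device+function, msi-x entry) into a
 * would-be stable irq number using the 8 + 8 + 12 bit split above.
 * Bits 27..20: bus, bits 19..12: devfn, bits 11..0: msi-x entry.
 */
static unsigned int msix_stable_irq(unsigned int bus, unsigned int devfn,
				    unsigned int msix_entry)
{
	return ((bus & 0xff) << 20) | ((devfn & 0xff) << 12) |
	       (msix_entry & 0xfff);
}

int main(void)
{
	/* e.g. bus 0x05, devfn 0x20, msi-x entry 3 -> irq 0x00520003 */
	printf("irq = 0x%08x\n", msix_stable_irq(0x05, 0x20, 3));
	return 0;
}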

Right ... I think the idea of trying to give the irq some meaningful
number is a lost cause.  Since an IRQ is just a handle, I'd actually use
something like dev->bus_id plus the individual card function rather than
a number (i.e. no longer exposing the raw number to user space).  Then
what the irq number means and how it's used becomes an internal matter.
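
As a purely hypothetical illustration of such an interface (none of
these names exist in any kernel tree; this only sketches the "irq as an
opaque handle keyed by device identity" idea):

#include <stdio.h>

/*
 * Hypothetical: interrupts are named by bus_id plus a per-device
 * source index; the numeric cookie stays internal and is never
 * exposed to user space.
 */
struct irq_handle {
	char bus_id[20];	/* e.g. "0000:05:04.0" */
	unsigned int source;	/* per-device source, e.g. an msi-x entry */
	unsigned int cookie;	/* internal number, an implementation detail */
};

static unsigned int next_cookie;

static struct irq_handle irq_lookup(const char *bus_id, unsigned int source)
{
	struct irq_handle h;

	snprintf(h.bus_id, sizeof(h.bus_id), "%s", bus_id);
	h.source = source;
	h.cookie = next_cookie++;	/* allocation is an internal matter */
	return h;
}

int main(void)
{
	struct irq_handle h = irq_lookup("0000:05:04.0", 3);

	printf("%s/%u -> internal #%u\n", h.bus_id, h.source, h.cookie);
	return 0;
}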

The only real concern I had was not wanting the per-cpu idt vectors to
be mangled into the irq number because they're essentially columns in
the sparse matrix with CPU across the top.
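
A minimal model of that matrix, with the per-cpu idt columns kept
separate from the irq number (256 vectors per cpu as Eric noted above;
the cpu count, array name, and allocation policy here are all invented
for the sketch):

#include <stdio.h>

#define NR_CPUS		4	/* illustrative machine size */
#define NR_VECTORS	256	/* idt entries per cpu, per the thread above */

/*
 * Sketch of the sparse matrix: one column of vectors per CPU, with the
 * irq number stored at (cpu, vector).  0 means the slot is free.  The
 * same irq may land on different vectors on different cpus, which is
 * exactly why the vector need not be encoded in the irq number.
 */
static unsigned int vector_irq[NR_CPUS][NR_VECTORS];

static int assign_vector(unsigned int cpu, unsigned int irq)
{
	/* low vectors are reserved for exceptions on real hardware */
	for (int vec = 32; vec < NR_VECTORS; vec++) {
		if (!vector_irq[cpu][vec]) {
			vector_irq[cpu][vec] = irq;
			return vec;
		}
	}
	return -1;	/* this cpu's column is full */
}

int main(void)
{
	/* occupy a slot on cpu1 so the same irq lands on different vectors */
	assign_vector(1, 7);
	printf("irq 42 on cpu0 -> vector %d\n", assign_vector(0, 42));
	printf("irq 42 on cpu1 -> vector %d\n", assign_vector(1, 42));
	return 0;
}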

> The big motivation for killing NR_IRQS-sized arrays comes from two
> directions: msi-x, which allows up to 4096 irqs per device just as nic
> vendors are starting to produce cards with 256 queues, and large SGI
> systems that don't do I/O and want to be supported with the same kernel
> build as smaller systems.  A kernel built to handle 4096*32 irqs, which
> is more or less reasonable if the system is I/O heavy, implies a
> ridiculously sized array on smaller machines.
> 
> So a static irq_desc array is out.  And since, with the combination of
> msi-x and hotplug, we cannot tell how many irq sources (and thus irq
> numbers) the machine is going to have, we cannot reasonably even size a
> dynamic array at boot time.  Further, we also want to allocate the
> irq_desc entries in node-local memory on NUMA machines for better
> performance.  Which means we need to dynamically allocate irq_desc
> entries and have some lookup mechanism from irq# to irq_desc entry.
> 
> So once we have all of that, it becomes possible to look at assigning a
> static irq number to each pci (bus:device:function:msi-x vector) tuple
> so the system is more reproducible.
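
A toy userspace sketch of that direction: no static NR_IRQS-sized
array; descriptors are allocated on first use (the real kernel would
want something like kmalloc_node() for node-local placement), and the
hash table below merely stands in for whatever irq# -> irq_desc lookup
mechanism is actually chosen:

#include <stdio.h>
#include <stdlib.h>

struct irq_desc {
	unsigned int irq;
	int node;			/* NUMA node the memory should live on */
	struct irq_desc *next;		/* hash chain */
};

#define HASH_SIZE 64
static struct irq_desc *irq_hash[HASH_SIZE];

/* lookup mechanism from irq# to irq_desc entry */
static struct irq_desc *irq_to_desc(unsigned int irq)
{
	for (struct irq_desc *d = irq_hash[irq % HASH_SIZE]; d; d = d->next)
		if (d->irq == irq)
			return d;
	return NULL;
}

/* allocate on demand instead of sizing an array at boot */
static struct irq_desc *irq_desc_alloc(unsigned int irq, int node)
{
	struct irq_desc *d = irq_to_desc(irq);

	if (d)
		return d;
	d = calloc(1, sizeof(*d));	/* kernel: kmalloc_node(size, flags, node) */
	if (!d)
		return NULL;
	d->irq = irq;
	d->node = node;
	d->next = irq_hash[irq % HASH_SIZE];
	irq_hash[irq % HASH_SIZE] = d;
	return d;
}

int main(void)
{
	/* reuse the stable number from the packing sketch above */
	irq_desc_alloc(0x00520003, 1);
	printf("desc for irq 0x00520003: %p\n", (void *)irq_to_desc(0x00520003));
	return 0;
}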

Yes, I agree with the above except for the static irq number ...
although once that's hidden from the user, I suppose I really don't
care any more.

James



