linux-kernel - Re: [PATCH] pci: change msi-x vector to 32bit

Message-ID: <86802c440808161334q75a7d019ofade0b6cabf3f74d@mail.gmail.com>
Date:	Sat, 16 Aug 2008 13:34:58 -0700
From:	"Yinghai Lu" <yhlu.kernel@...il.com>
To:	"James Bottomley" <James.Bottomley@...senpartnership.com>
Cc:	"Alan Cox" <alan@...rguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Jesse Barnes" <jbarnes@...tuousgeek.org>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	"Andrew Vasquez" <andrew.vasquez@...gic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit

On Sat, Aug 16, 2008 at 1:25 PM, James Bottomley
<James.Bottomley@...senpartnership.com> wrote:
> On Sat, 2008-08-16 at 11:56 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 9:13 AM, James Bottomley
>> <James.Bottomley@...senpartnership.com> wrote:
>> > On Sat, 2008-08-16 at 16:39 +0100, Alan Cox wrote:
>> >> > Where exactly is this code in the kernel?  Most arches assume the irq is
>> >> > an index to a compact table bounded by NR_IRQS, so something like this
>> >> > would violate that assumption.
>> >>
>> >> Yes, which is no bad thing for some platforms. There are some driver
>> >> assumptions like that but those have also been stomped.
>> >
>> > I'm not saying we couldn't do this, or even that we shouldn't; I'm just
>> > asking why would we want to?
>> >
>> > All arches currently seem to have show_interrupts() which loop over
>> > 0..NR_IRQS where the interrupt is printed as %d.  In this encoded scheme
>> > they would show up with rather nastily large numbers that have no
>> > visible meaning unless we switch to hex for displaying them.
>> >
>> > What I'm really saying is that irq as the interrupt number is really the
>> > *user's* handle for the interrupt not the machine's, so it needs to be
>> > something the user is comfortable with.  We could overcome this
>> > objection by encoding the number to something meaningful for the
>> > user ... I'm just asking if there's any benefit to doing this?
>> >
>> the code is tip/irq/sparseirq or tip/master
>
> OK, that's either a quilt or a specifier for a git head ...
> unfortunately linux-next doesn't give you those, so I'd need either a
> commit id or a pointer to the base tree or quilt for that to make sense.
>
>> story:
>> 1. for x86_64: first we have NR_IRQS = NR_CPUS * NR_VECTORS, because
>> it already supports per_cpu vector
>
> Hmm ... the first thing that springs to mind is are you sure?  We have
> architectures (like voyager and parisc) that always had these per cpu
> vector type interrupts.  On each of them we actually factored the CPU
> affinity out of the irq number for sound reasons (although the per CPU
> vectors still exist):  The user understands better that irq line 50 is
> currently going to CPU1 and that they could change it to CPU2 (or just
> use irqbalance).  Combining the affinity into the irq number looks like
> a bad idea because users won't be able to parse it correctly.
>
>> 2. SGI wants MAX_SMP support: NR_CPUS=4096, so everything is broken.
>> 3. Mike spent some time converting every [NR_CPUS] array to a per_cpu
>> definition where possible.
>> 4. Mike or someone else reduced NR_IRQS to 224, because NR_IRQS=256*4096
>> would make kstat_irqs[NR_CPUS][NR_VECTORS*NR_VECTORS] too big; with 224
>> it could be compiled.
>> 5. IBM guys reported that one of their servers is broken: that system has
>> GSI > 256, so some irqs cannot work.
>> 6. Yinghai tried a patch changing NR_IRQS to 32*NR_CPUS, but SGI said it
>> still broke their system.  --- for 2.6.27
>> 7. Eric provided a patch: NR_IRQS = min(32*NR_CPUS, NR_VECTORS *
>> MAX_IO_APICS) --- for 2.6.27
>> 8. For 2.6.28 and later, Yinghai added dyn_array code and probes nr_irqs,
>> so NR_IRQS-related arrays are dynamically allocated after nr_irqs is probed.
>> 9. Eric said using dyn_array still wastes RAM, because a lot of
>> irq_desc entries are not used. When MSI-X is involved, some cards could
>> use 256 vectors, or 4096 in theory.
>> 10. Eric said he had a dynamic irq_desc 90% done, but didn't have
>> time to work out the remaining 10%.
>> 11. Yinghai added sparse_irq support. Those arrays will grow by
>> 32 at a time, and entries will be claimed one by one.
>> 12. according to Eric, we could have irq spread out [0, -1U), irq =
>> bus/dev/fn + entry_of_msix
>> 13. with sparseirq, /proc/interrupts will have irq_number in hex.
>>
>> but msix currently caches the irq number, and it only uses 16 bits to
>> store unsigned int irq, so cards later call request_irq with a
>> truncated irq number... and the card falls back to MSI or INTa
> OK, sorry, I get that there's a bug in the msix_entry ... if it's going
> to assign an irq to it, it should at least be the same type as irq.

good. for 2.6.27?
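
To make the truncation concrete, here is a minimal sketch in plain userspace C. It is not kernel code: the two structs are hypothetical stand-ins for a cached entry with a 16-bit versus a 32-bit irq field, and the bit layout in encode_irq() is invented purely for illustration of the "irq = bus/dev/fn + entry_of_msix" idea; the thread does not specify an actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the cached entry: 16-bit vs 32-bit irq field. */
struct entry_narrow { uint16_t vector; uint16_t entry; };
struct entry_wide   { uint32_t vector; uint16_t entry; };

/*
 * Illustrative encoding only (an assumption, not the kernel's layout):
 * bus in bits 20-27, dev in bits 15-19, fn in bits 12-14,
 * MSI-X entry index in bits 0-11.
 */
static unsigned int encode_irq(unsigned int bus, unsigned int dev,
                               unsigned int fn, unsigned int entry)
{
    assert(bus < 256 && dev < 32 && fn < 8 && entry < 4096);
    return (bus << 20) | (dev << 15) | (fn << 12) | entry;
}

/* What a driver would read back if the irq is cached in 16 bits. */
static unsigned int cached_irq_narrow(unsigned int irq, unsigned int entry)
{
    struct entry_narrow e = { .vector = (uint16_t)irq,
                              .entry = (uint16_t)entry };
    return e.vector;  /* upper bits of irq are lost */
}

/* What a driver would read back if the irq is cached in 32 bits. */
static unsigned int cached_irq_wide(unsigned int irq, unsigned int entry)
{
    struct entry_wide e = { .vector = irq, .entry = (uint16_t)entry };
    return e.vector;  /* full irq number survives */
}
```

With this made-up layout, bus 2, dev 1, fn 0, entry 5 encodes to 0x208005. The narrow field hands back 0x8005, so a later request_irq would be called with the wrong number, while the wide field returns 0x208005 unchanged.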

>
> What I still don't quite get is the benefit of large IRQ spaces ...
> particularly if you encode things the system doesn't really need to know
> in them.

then set nr_irqs = nr_cpu_ids * NR_VECTORS
and count down for msi/msi-x?
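
A rough sketch of what "count down" could look like, again as standalone C rather than kernel code. The struct irq_space, its fields, and both helper functions are hypothetical names introduced here; the only things taken from the thread are nr_irqs = nr_cpu_ids * NR_VECTORS and the idea of handing out MSI/MSI-X numbers from the top. The assumption that the low numbers stay reserved for GSIs is mine.

```c
#include <assert.h>

#define NR_VECTORS 256  /* vectors per CPU, assumed here for x86 */

/* Hypothetical bookkeeping for a compact irq number space. */
struct irq_space {
    unsigned int nr_irqs;   /* nr_cpu_ids * NR_VECTORS */
    unsigned int next_msi;  /* one above the next number to hand out */
    unsigned int gsi_top;   /* numbers below this stay reserved for GSIs */
};

static void irq_space_init(struct irq_space *s, unsigned int nr_cpu_ids,
                           unsigned int nr_gsi)
{
    s->nr_irqs = nr_cpu_ids * NR_VECTORS;
    assert(nr_gsi <= s->nr_irqs);
    s->next_msi = s->nr_irqs;
    s->gsi_top = nr_gsi;
}

/* Hand out MSI/MSI-X irq numbers from the top down; -1 when exhausted. */
static int alloc_msi_irq(struct irq_space *s)
{
    if (s->next_msi <= s->gsi_top)
        return -1;
    return (int)--s->next_msi;
}
```

With 4 CPUs and 24 GSIs this gives nr_irqs = 1024, and the first two MSI allocations return 1023 and 1022, so the numbers stay small enough to print as decimal and carry no encoded bus/dev/fn information.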

YH