linux-kernel - Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <878rd6lwlh.ffs@tglx>
Date:   Mon, 29 May 2023 22:19:06 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Huacai Chen <chenhuacai@...il.com>
Cc:     Marc Zyngier <maz@...nel.org>,
        Huacai Chen <chenhuacai@...ngson.cn>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        linux-kernel@...r.kernel.org, loongson-kernel@...ts.loongnix.cn,
        Xuefeng Li <lixuefeng@...ngson.cn>,
        Jiaxun Yang <jiaxun.yang@...goat.com>
Subject: Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of
 msi_domain_prepare_irqs()

Huacai!

On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>> By default you allow up to 256 interrupts to be allocated, right? So to
>> prevent vector exhaustion, the admin needs to reboot the machine and set
>> a command line parameter to limit this, right? As that parameter is not
>> documented the admin is going to dice a number. That's impractical and
>> just a horrible bandaid.
>
> OK, I think I should update the documents in the new version.

Updating documentation neither makes it more practical (it still
requires a reboot) nor does it justify the abuse of the msi_prepare()
callback.

The only reason why this hack "works" is that there is a historical
mechanism which tells the PCI/MSI core that the number of requested
vectors cannot be allocated, but that there would be $N vectors
possible. But even that return value has no guarantee.

This mechanism is ill defined and really should go away.

Adding yet another way to limit this via msi_prepare() is just
proliferating this ill defined mechanism and I have zero interest in
that.

Let's take a step back and look at the larger picture:

 1) A PCI/MSI irqdomain is attached to a PCI bus  

 2) The number of PCI devices on that PCI bus is usually known at boot
    time _before_ the first device driver is probed.

    That's not entirely true for PCI hotplug devices, but that's hardly
    relevant for an architecture which got designed less than 10 years
    ago and the architects decided that 256 MSI vectors are good enough
    for up to 256 CPUs. The concept of per CPU queues was already known
    at that time, no?

So the irqdomain can tell the PCI/MSI core the maximum number of vectors
available for a particular bus, right?

The default, i.e if the irqdomain does not expose that information,
would be "unlimited", i.e. ULONG_MAX.

Now take that number and divide it by the number of devices on the bus
and you get at least a sensible limit which does not immediately cause
vector exhaustion.

That limit might be suboptimal if there are lots of other devices on
that bus which just require one or two vectors, but that's something
which can be optimized via a generic command line option or even a sysfs
mechanism.

Thanks,

        tglx