lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAhV-H6EPkGJchA4pg=zctmmt=9LboaFqKhFgQxZKNxJxQVT7g@mail.gmail.com>
Date:   Mon, 29 May 2023 14:52:29 +0800
From:   Huacai Chen <chenhuacai@...il.com>
To:     Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Huacai Chen <chenhuacai@...ngson.cn>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Ahmed S . Darwish" <darwi@...utronix.de>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Kevin Tian <kevin.tian@...el.com>, linux-pci@...r.kernel.org,
        Jianmin Lv <lvjianmin@...ngson.cn>,
        Jiaxun Yang <jiaxun.yang@...goat.com>,
        loongson-kernel@...ts.loongnix.cn,
        Juxin Gao <gaojuxin@...ngson.cn>,
        Marc Zyngier <maz@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] pci: irq: Add an early parameter to limit pci irq numbers

Hi, Manivannan,

On Mon, May 29, 2023 at 1:39 PM Manivannan Sadhasivam
<manivannan.sadhasivam@...aro.org> wrote:
>
> On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> > Hi, Manivannan,
> >
> > On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@...aro.org> wrote:
> > >
> > > On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > > > Hi, Bjorn,
> > > >
> > > > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > > >
> > > > > [+cc Marc, LKML]
> > > > >
> > > > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > > > to control the limit.
> > > > > >
> > > > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > > > NR_IRQS for other platforms.
> > > > >
> > > > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > > > me.  I assume arch code should set things up so only valid IRQ numbers
> > > > > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > > > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > > > command-line parameter that users have to discover and supply.
> > > > The problem we meet: LoongArch machines can have as many as 256
> > > > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > > > machine, 192 irqs can be easily exhausted if there are several NICs
> > > > (NIC usually allocates msi irqs depending on the number of online
> > > > cpus). So we want to limit the msi allocation.
> > > >
> > >
> > > If the MSI allocation fails with multiple vectors, then the NIC driver should
> > > revert to a single MSI vector. Is that happening in your case?
> > Thank you for pointing this out. Yes, I know  most existing drivers
> > will fallback to use single msi or legacy irqs when failed. However,
> > as I
> > replied in another thread (the new solution of this problem [1]), we
> > want to do some proactive throttling rather than consume msi vectors
> > aggressively. For example, if we have two NICs, we want both of them
> > to get 32 msi vectors; not one exhaust all available vectors, and the
> > other fallback to use single msi or legacy irq.
> >
> > I hope I have explained clearly, thanks.
> >
>
> The problem you are facing is not specific to Loongsoon but rather generic. And
> the solution we have currently is what you were also aware of it seems. So if
> you want to propose an alternative solution, it should be generic and also a
> good justification needs to be provided to the maintainers i.e., comparing two
> solutions and why yours is better.
Yes, I think we are facing a generic problem, but it is more obvious
on platforms which provide less MSI vectors. And my solution seems
generic enough. :)

At least in my example, "proactive throttling" is better than
"aggressive consuming", because two (or more) NICs have more balanced
throughput.

>
> But IMO what you are proposing seems like usecase driven and may not work all
> the time due to architecture limitation. This again proves that the existing
> solution is sufficient enough.
Yes, it's a usecase driven solution, so I provide a cmdline parameter
to let the user decide.

Huacai
>
> - Mani
>
> > [1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@loongson.cn/T/#t
> >
> > Huacai
> > >
> > > - Mani
> > >
> > > > This is not a LoongArch-specific problem, because I think other
> > > > platforms can also meet if they have many NICs. But of course,
> > > > LoongArch can meet it more easily because the available msi vectors
> > > > are very few. So, adding a cmdline parameter is somewhat reasonable.
> > > >
> > > > After some investigation, I think it may be possible to modify
> > > > drivers/irqchip/irq-loongson-pch-msi.c and override
> > > > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > > > doing that need to remove the "static" before
> > > > __msi_domain_alloc_irqs(), which means revert
> > > > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > > > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > > > acceptable.
> > > >
> > > > If such a revert is not acceptable, it seems that we can only use the
> > > > method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> > > > a little better.
> > > >
> > > > Huacai
> > > >
> > > > >
> > > > > > Signed-off-by: Juxin Gao <gaojuxin@...ngson.cn>
> > > > > > Signed-off-by: Huacai Chen <chenhuacai@...ngson.cn>
> > > > > > ---
> > > > > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > > > index ef1d8857a51b..6617381e50e7 100644
> > > > > > --- a/drivers/pci/msi/msi.c
> > > > > > +++ b/drivers/pci/msi/msi.c
> > > > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > > > >       return ret;
> > > > > >  }
> > > > > >
> > > > > > +#ifdef CONFIG_LOONGARCH
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > > > +#else
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > > > +#endif
> > > > > > +
> > > > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +static int __init pci_irq_limit(char *str)
> > > > > > +{
> > > > > > +     get_option(&str, &pci_irq_limits);
> > > > > > +
> > > > > > +     if (pci_irq_limits == 0)
> > > > > > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > > > +
> > > > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > > > >                          struct irq_affinity *affd)
> > > > > >  {
> > > > > >       int nvec;
> > > > > >       int rc;
> > > > > >
> > > > > > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > > +
> > > > > >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > > > >               return -EINVAL;
> > > > > >
> > > > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > > > >                           int maxvec, struct irq_affinity *affd, int flags)
> > > > > >  {
> > > > > > -     int hwsize, rc, nvec = maxvec;
> > > > > > +     int hwsize, rc, nvec;
> > > > > > +
> > > > > > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > >
> > > > > >       if (maxvec < minvec)
> > > > > >               return -ERANGE;
> > > > > > --
> > > > > > 2.39.1
> > > > > >
> > >
> > > --
> > > மணிவண்ணன் சதாசிவம்
>
> --
> மணிவண்ணன் சதாசிவம்

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ