Message-ID: <20241122175449.GA2433467@bhelgaas>
Date: Fri, 22 Nov 2024 11:54:49 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Joseph Jang <jjang@...dia.com>
Cc: shuah@...nel.org, tglx@...utronix.de, mochs@...dia.com,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-tegra@...r.kernel.org
Subject: Re: [PATCH] selftest: drivers: Add support to check duplicate hwirq

On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
> On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> > > Validate there are no duplicate hwirq from the irq debug
> > > file system /sys/kernel/debug/irq/irqs/* per chip name.
> > >
> > > One example log show 2 duplicated hwirq in the irq debug
> > > file system.
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/163
> > > handler: handle_fasteoi_irq
> > > device: 0019:00:00.0
> > > <SNIP>
> > > node: 1
> > > affinity: 72-143
> > > effectiv: 76
> > > domain: irqchip@...000100022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/174
> > > handler: handle_fasteoi_irq
> > > device: 0039:00:00.0
> > > <SNIP>
> > > node: 3
> > > affinity: 216-287
> > > effectiv: 221
> > > domain: irqchip@...000300022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > The irq-check.sh can help to collect hwirq and chip name from
> > > /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> > > hwirq per chip name.
> > >
> > > Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> > > [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> >
> > I don't know enough about this issue to understand the details. It
> > seems like you look for duplicate hwirqs in chips with the same name,
> > e.g., "ITS-MSI" in this case? That name seems too generic to me
> > (might there be several instances of "ITS-MSI" in a system?)
>
> As I know, each PCIe device typically has only one ITS-MSI controller.
> Having multiple ITS-MSI instances for the same device would lead to
> confusion and potential conflicts in interrupt routing.
>
> > Also, the name may come from chip->irq_print_chip(), so it apparently
> > relies on irqchip drivers to make the names unique if there are
> > multiple instances?
> >
> > I would have expected looking for duplicates inside something more
> > specific, like "irqchip@...000300022040000-3". But again, I don't
> > know enough about the problem to speak confidently here.
>
> In our case, If we look for duplicates by different irq domains like
> "irqchip@...000100022040000-3" and "irqchip@...000300022040000-3" as
> following example.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@...000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@...000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> We could not detect the duplicated hwirq number (0xc8000000) in this
> case.

Again, this is really out of my area, but based on
Documentation/core-api/irq/irq-domain.rst, I assumed the point of
hwirq was that hwirq numbers were local to an interrupt controller,
i.e., to an irq_domain.

If that's the case, it should not be a problem if hwirq number
0xc8000000 is used in two separate irq_domains.
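
To make the difference concrete, here is a rough sketch of the two
ways of grouping. It is Python rather than the irq-check.sh from the
patch, and it is only an illustration under assumptions: the function
names parse_irq_debug() and find_duplicates() are made up, the input
is the abbreviated text quoted above (domain strings kept exactly as
they appear there), and each file is treated as having a single
domain/hwirq/chip level, which may not hold for real debugfs output.

```python
from collections import defaultdict

# Abbreviated copies of the two debugfs entries quoted in this thread.
IRQ_163 = """handler: handle_fasteoi_irq
device: 0019:00:00.0
domain: irqchip@...000100022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
"""

IRQ_174 = """handler: handle_fasteoi_irq
device: 0039:00:00.0
domain: irqchip@...000300022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
"""


def parse_irq_debug(text):
    """Hypothetical helper: turn 'key: value' lines into a dict.

    Only the first occurrence of each key is kept (assumption: one
    domain level per file, as in the abbreviated examples).
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def find_duplicates(irqs, scope):
    """Return {(scope value, hwirq): [irq, ...]} for hwirqs seen more than once.

    'scope' is the field that defines the namespace for hwirq numbers,
    e.g. "chip" (what the patch does) or "domain".
    """
    groups = defaultdict(list)
    for irq, fields in irqs.items():
        if scope in fields and "hwirq" in fields:
            groups[(fields[scope], fields["hwirq"])].append(irq)
    return {k: v for k, v in groups.items() if len(v) > 1}


IRQS = {163: parse_irq_debug(IRQ_163), 174: parse_irq_debug(IRQ_174)}

print("per chip:  ", find_duplicates(IRQS, "chip"))
print("per domain:", find_duplicates(IRQS, "domain"))
```

Grouped by chip name, irqs 163 and 174 are reported as sharing hwirq
0xc8000000. Grouped by domain, nothing is reported, which is what I
would expect if hwirq numbers are only meaningful within one
irq_domain.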

Bjorn