Message-ID: <dec53919-034b-4f0e-b415-1bc1de9b0916@nvidia.com>
Date: Mon, 11 Nov 2024 15:21:36 +0800
From: Joseph Jang <jjang@...dia.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: shuah@...nel.org, tglx@...utronix.de, mochs@...dia.com,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-tegra@...r.kernel.org
Subject: Re: [PATCH] selftest: drivers: Add support to check duplicate hwirq
On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
>> Validate that there are no duplicate hwirqs per chip name in the irq
>> debug file system /sys/kernel/debug/irq/irqs/*.
>>
>> The example below shows two IRQs that report the same hwirq in the
>> irq debug file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler: handle_fasteoi_irq
>> device: 0019:00:00.0
>> <SNIP>
>> node: 1
>> affinity: 72-143
>> effectiv: 76
>> domain: irqchip@...000100022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler: handle_fasteoi_irq
>> device: 0039:00:00.0
>> <SNIP>
>> node: 3
>> affinity: 216-287
>> effectiv: 221
>> domain: irqchip@...000300022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print an error when it finds duplicate
>> hwirqs per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> I don't know enough about this issue to understand the details. It
> seems like you look for duplicate hwirqs in chips with the same name,
> e.g., "ITS-MSI" in this case? That name seems too generic to me
> (might there be several instances of "ITS-MSI" in a system?)
>
As far as I know, each PCIe device typically has only one ITS-MSI
controller. Having multiple ITS-MSI instances for the same device would
lead to confusion and potential conflicts in interrupt routing.

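To check how many instances share a chip name on a given system, one could count the distinct irq domains seen under each chip name. This is a rough sketch, not part of the patch: `domains_per_chip` is a hypothetical helper, and it assumes each file under /sys/kernel/debug/irq/irqs/ contains the "chip:" and "domain:" lines shown in the examples in this thread. The directory is a parameter so the helper can also be tried on copied files.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: for each irq chip name, print
# how many distinct irq domains use it. A count above 1 means several
# instances (e.g. one ITS per node) report the same chip name.
# $1: directory of irq debugfs files (defaults to the live one)
domains_per_chip() {
	local dir=${1:-/sys/kernel/debug/irq/irqs}
	local f chip domain

	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		# Value after the first "chip:" / "domain:" line, like grep -m 1
		chip=$(awk '$1 == "chip:" { print $2; exit }' "$f")
		domain=$(awk '$1 == "domain:" { print $2; exit }' "$f")
		[ -n "$chip" ] && [ -n "$domain" ] || continue
		echo "$chip $domain"
	done | sort -u | awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

Run as root without arguments, it prints one "<chip> <count>" line per chip name; a count above 1 for ITS-MSI would mean several ITS instances share that name.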
> Also, the name may come from chip->irq_print_chip(), so it apparently
> relies on irqchip drivers to make the names unique if there are
> multiple instances?
>
> I would have expected looking for duplicates inside something more
> specific, like "irqchip@...000300022040000-3". But again, I don't
> know enough about the problem to speak confidently here.
>
In our case, if we looked for duplicates per irq domain, the two IRQs
below would be checked separately, because they belong to different
domains ("irqchip@...000100022040000-3" and "irqchip@...000300022040000-3"):

  $ sudo cat /sys/kernel/debug/irq/irqs/163
  handler: handle_fasteoi_irq
  device: 0019:00:00.0
  <SNIP>
  node: 1
  affinity: 72-143
  effectiv: 76
  domain: irqchip@...000100022040000-3
  hwirq: 0xc8000000
  chip: ITS-MSI
  flags: 0x20

  $ sudo cat /sys/kernel/debug/irq/irqs/174
  handler: handle_fasteoi_irq
  device: 0039:00:00.0
  <SNIP>
  node: 3
  affinity: 216-287
  effectiv: 221
  domain: irqchip@...000300022040000-3
  hwirq: 0xc8000000
  chip: ITS-MSI
  flags: 0x20

A per-domain check would therefore not detect the duplicate hwirq
0xc8000000 in this case.

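The difference between the two groupings can be sketched with a hypothetical helper, `dup_hwirqs` (not part of the patch), which reuses the `sort | uniq -cd` approach of irq-check.sh but lets the caller choose the grouping field ("chip" or "domain"). It assumes the debugfs file layout shown above. For the two IRQs above, grouping by chip should report 0xc8000000, while grouping by domain reports nothing, since each domain contains that hwirq only once.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: print "<count> <key> <hwirq>"
# for every (key, hwirq) pair that appears in more than one irq debugfs file.
# $1: field to group by, e.g. "chip" or "domain"
# $2: directory of irq debugfs files (defaults to the live one)
dup_hwirqs() {
	local key=$1
	local dir=${2:-/sys/kernel/debug/irq/irqs}
	local f k hwirq

	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		# First occurrence of each field, like grep -m 1 in irq-check.sh
		k=$(awk -v want="$key:" '$1 == want { print $2; exit }' "$f")
		hwirq=$(awk '$1 == "hwirq:" { print $2; exit }' "$f")
		[ -n "$k" ] && [ -n "$hwirq" ] || continue
		echo "$k $hwirq"
	done | sort | uniq -cd
}
```

For example, running `dup_hwirqs chip` and `dup_hwirqs domain` as root against the live debugfs directory compares the two approaches on the same system.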
> Cosmetic nits:
>
> - Tweak subject to match history (use "git log --oneline
> tools/testing/selftests/drivers/" to see it), e.g.,
>
> selftests: irq: Add check for duplicate hwirq
>
> - Rewrap commit log to fill 75 columns. No point in using shorter
> lines.
>
> - Indent the "$ sudo cat ..." block by a couple of spaces since it's
> effectively a quotation, not part of the main text body.
>
> - Possibly include sample output of irq-check.sh (also indented as a
> quote) when run on the system where you manually found the
> duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
>
> - Reword "The irq-check.sh can help ..." to something like this:
>
> Add an irq-check.sh test to report errors when there are
> duplicate hwirqs per chip name.
>
> - Since the kernel patch has already been merged, cite it like this
> instead of using the https://lore URL:
>
> db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
>
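For the citation style suggested above, git can print the abbreviated hash and subject directly. A minimal sketch, assuming it is run inside a kernel tree that contains the commit; `cite_commit` is only an illustrative wrapper name, and 12 hex digits is the abbreviation length commonly used for kernel commit references.

```shell
#!/bin/bash
# Illustrative wrapper: print a commit as <12-char hash> ("<subject>"),
# the form used when citing kernel commits in changelogs.
# $1: any commit-ish (hash, tag, HEAD, ...)
cite_commit() {
	git log -1 --abbrev=12 --format='%h ("%s")' "$1"
}
```

In a tree that has the fix, `cite_commit db744ddd59be` should print the reference in the form shown above.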
If you agree with using the irq chip name ("ITS-MSI") to scan for
duplicate hwirqs, I can send a v2 patch that addresses the suggestions
above.

Thank you,
Joseph.
>> Signed-off-by: Joseph Jang <jjang@...dia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@...dia.com>
>> ---
>> tools/testing/selftests/drivers/irq/Makefile | 5 +++
>> tools/testing/selftests/drivers/irq/config | 2 +
>> .../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
>> 3 files changed, 46 insertions(+)
>> create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>> create mode 100644 tools/testing/selftests/drivers/irq/config
>> create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>>
>> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
>> new file mode 100644
>> index 000000000000..d6998017c861
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/Makefile
>> @@ -0,0 +1,5 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +TEST_PROGS := irq-check.sh
>> +
>> +include ../../lib.mk
>> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
>> new file mode 100644
>> index 000000000000..a53d3b713728
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/config
>> @@ -0,0 +1,2 @@
>> +CONFIG_GENERIC_IRQ_DEBUGFS=y
>> +CONFIG_GENERIC_IRQ_INJECTION=y
>> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
>> new file mode 100755
>> index 000000000000..e784777043a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
>> @@ -0,0 +1,39 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This script needs root permission
>> +uid=$(id -u)
>> +if [ $uid -ne 0 ]; then
>> + echo "SKIP: Must be run as root"
>> + exit 4
>> +fi
>> +
>> +# Ensure debugfs is mounted
>> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
>> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
>> + echo "SKIP: irq debugfs not found"
>> + exit 4
>> +fi
>> +
>> +# Traverse the irq debug file system directory to collect chip_name and hwirq
>> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
>> + # Read chip name and hwirq from the irq_file
>> + chip_name=$(grep -m 1 'chip:' "$irq_file" | awk '{print $2}')
>> + hwirq=$(grep -m 1 'hwirq:' "$irq_file" | awk '{print $2}')
>> +
>> + if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
>> + continue
>> + fi
>> +
>> + echo "$chip_name $hwirq"
>> +done)
>> +
>> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
>> +
>> +if [ -n "$dup_hwirq_list" ]; then
>> + echo "ERROR: Found duplicate hwirq"
>> + echo "$dup_hwirq_list"
>> + exit 1
>> +fi
>> +
>> +exit 0
>> --
>> 2.34.1
>>
>