lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 2 Feb 2022 22:13:31 +0000
From:   Brent Spillner <spillner@....org>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        x86@...nel.org, hpa@...or.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86/PCI: Improve log message when IRQ cannot be identified

On Wed, Feb 2, 2022 at 6:13 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> IIUC pirq_enable_irq() is only used for non-ACPI, non-DT, non-Xen,
> non-Intel MID systems, so this is a real legacy path.
>
> I don't think it's really worth cluttering an error case in a path
> that should be rarely used in the first place.

I figured if people are getting this message, then they either have
broken hardware or are debugging something, and if the message is
trying to be useful then it can't hurt to mention other things that
might help. If the pirq path has become buggy or unsupportable in
modern kernels then it probably ought to be removed altogether; if it
still works, however rarely it might be needed these days, then it's
perhaps worth mentioning to those who might occasionally have a use
for it.

> Are you seeing a problem where you're getting the wrong error message
> today?  Can we just fix that problem instead so no kernel parameter is
> needed in the first place?

I was trying to isolate intermittent ACPI errors and tried booting
with acpi=noirq, as seemingly the closest modern equivalent to the
acpi=ht that had solved, or at least half-split, similar issues for me
in the past. With noirq, a different PCIe device stopped working (MPT
Fusion driver not picking up any responses to doorbell interrupts),
and while reviewing dmesg I noticed that the PCI error messages were
suggesting a kernel option that wasn't appropriate for my x86_64
architecture. In an ideal world with no hardware or driver problems
these log messages should never even happen, but in the real world of
troubleshooting and debugging I think they can be useful, and if
they're going to be reported they might as well be correct.

Obviously, this is a very minor bug, affecting only logs rather than
behavior, and I'm sure there are more pressing things to worry about.
On the other hand, it also seems like a very easy and low-risk fix
that leaves the kernel in a slightly better state for future users and
developers. At any rate, the current state of the PCI code (a)
generates a message that recommends specific kernel parameters, (b)
does so even on builds for which those parameters are inappropriate,
(c) doesn't say anything to encourage bug reports, and (d) doesn't
warn about the risks of noirq, which could cause other problems to be
misattributed. So even though the alternate messages I drafted may not
be perfect, and might need to be tuned in the future based on patterns
in whatever bug reports come in, I'm still confident that they're an
improvement (and I'm open to further suggestions).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ