lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 29 Jun 2023 17:11:03 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Radu Rendec <rrendec@...hat.com>
Cc:     Jingoo Han <jingoohan1@...il.com>,
        Gustavo Pimentel <gustavo.pimentel@...opsys.com>,
        Lorenzo Pieralisi <lpieralisi@...nel.org>,
        Krzysztof Wilczynski <kw@...ux.com>,
        Rob Herring <robh@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Marc Zyngier <maz@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Pali Rohár <pali@...nel.org>
Subject: Re: [PATCH 1/1] PCI: dwc: Use regular interrupt instead of chained

[+cc Pali]

On Thu, Jun 29, 2023 at 05:27:54PM -0400, Radu Rendec wrote:
> On Thu, 2023-06-29 at 15:58 -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 29, 2023 at 04:42:07PM -0400, Radu Rendec wrote:
> > > On Thu, 2023-06-29 at 14:57 -0500, Bjorn Helgaas wrote:
> > > > On Thu, Jun 29, 2023 at 02:30:19PM -0400, Radu Rendec wrote:
> > > > > The DesignWare PCIe host driver uses a chained interrupt to demultiplex
> > > > > the downstream MSI interrupts. On Qualcomm SA8540P Ride, enabling both
> > > > > pcie2a and pcie3a at the same time can create an interrupt storm where
> > > > > the parent interrupt fires continuously, even though reading the PCIe
> > > > > host registers doesn't identify any child MSI interrupt source. This
> > > > > effectively locks up CPU0, which spends all the time servicing these
> > > > > interrupts.
> > > > > 
> > > > > This is a clear example of how bypassing the interrupt core by using
> > > > > chained interrupts can be very dangerous if the hardware misbehaves.
> > > > > 
> > > > > Convert the driver to use a regular interrupt for the demultiplex
> > > > > handler. This allows the interrupt storm detector to detect the faulty
> > > > > interrupt and disable it, allowing the system to run normally.

> ...
> > But you do imply that there's some DesignWare hardware issue involved,
> > too, so I guess it's possible the other drivers don't have an issue
> > and/or actually require the chained IRQs.  That's why I asked how we
> > should decide.
> 
> That makes sense. I don't know if it's a DesignWare hardware issue or
> something else down the PCIe bus. Other folks in my team are
> investigating the root cause. This thread has the details:
> https://lore.kernel.org/all/pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse/
> 
> I want to point out that this patch doesn't *fix* the root problem (the
> interrupt storm). It just makes the kernel handle it (much) better if
> it occurs.
> 
> I can't see why any driver would _require_ chained IRQs. As I see it,
> chained interrupts are just a "shortcut" that circumvents the IRQ core
> for (debatable) performance reasons. Other than that, they should work
> the same as regular interrupts.
> 
> There are other benefits associated with using regular interrupts. One
> of them is the ability to control the SMP affinity of the parent
> interrupt. But that's a different topic.

Thanks for mentioning IRQ affinity because that reminds me of a
long-standing related issue.  Pali posted a similar patch for mvebu
[1], but it's been stalled for a long time because apparently there's
some potential breakage of the userspace ABI [2].

Unfortunately I'm not an IRQ expert and haven't followed all the
arguments there.  BUT it does seem like Marc's objection to Pali's
patch would also apply to your patch, so maybe this is an opportune
time to re-evaluate the situation.

If converting from chained to normal handlers can be done safely, I
would definitely be in favor if doing it across all of drivers/pci/ at
once so they're all consistent.  Otherwise that code just gets copied
to new drivers, so the issue persists and spreads.

Bjorn

[1] https://lore.kernel.org/r/20220524122817.7199-1-pali@kernel.org
[2] https://lore.kernel.org/r/874k0bf7f7.wl-maz@kernel.org

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ