Message-ID: <86r0bt39zm.wl-maz@kernel.org>
Date: Tue, 16 Jul 2024 11:30:05 +0100
From: Marc Zyngier <maz@...nel.org>
To: Johan Hovold <johan@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-arm-kernel@...ts.infradead.org,
	linux-pci@...r.kernel.org,
	anna-maria@...utronix.de,
	shawnguo@...nel.org,
	s.hauer@...gutronix.de,
	festevam@...il.com,
	bhelgaas@...gle.com,
	rdunlap@...radead.org,
	vidyas@...dia.com,
	ilpo.jarvinen@...ux.intel.com,
	apatel@...tanamicro.com,
	kevin.tian@...el.com,
	nipun.gupta@....com,
	den@...inux.co.jp,
	andrew@...n.ch,
	gregory.clement@...tlin.com,
	sebastian.hesselbarth@...il.com,
	gregkh@...uxfoundation.org,
	rafael@...nel.org,
	alex.williamson@...hat.com,
	will@...nel.org,
	lorenzo.pieralisi@....com,
	jgg@...lanox.com,
	ammarfaizi2@...weeb.org,
	robin.murphy@....com,
	lpieralisi@...nel.org,
	nm@...com,
	kristo@...nel.org,
	vkoul@...nel.org,
	okaya@...nel.org,
	agross@...nel.org,
	andersson@...nel.org,
	mark.rutland@....com,
	shameerali.kolothum.thodi@...wei.com,
	yuzenghui@...wei.com,
	shivamurthy.shastri@...utronix.de
Subject: Re: [patch V4 00/21] genirq, irqchip: Convert ARM MSI handling to per device MSI domains

On Mon, 15 Jul 2024 15:10:01 +0100,
Johan Hovold <johan@...nel.org> wrote:
> 
> On Mon, Jul 15, 2024 at 01:58:13PM +0100, Marc Zyngier wrote:
> > On Mon, 15 Jul 2024 12:18:47 +0100,
> > Johan Hovold <johan@...nel.org> wrote:
> > > On Sun, Jun 23, 2024 at 05:18:31PM +0200, Thomas Gleixner wrote:
> > > > This is version 4 of the series to convert ARM MSI handling over to
> > > > per device MSI domains.
> 
> > > This series only showed up in linux-next last Friday and broke interrupt
> > > handling on Qualcomm platforms like sc8280xp (e.g. Lenovo ThinkPad X13s)
> > > and x1e80100 that use the GIC ITS for PCIe MSIs.
> > > 
> > > I've applied the series (21 commits from linux-next) on top of 6.10 and
> > > can confirm that the breakage is caused by commits:
> > > 
> > > 	3d1c927c08fc ("irqchip/gic-v3-its: Switch platform MSI to MSI parent")
> > > 	233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]")
> > > 
> > > Applying the series up until the change before 3d1c927c08fc unbreaks the
> > > wifi on one machine:
> > > 
> > > 	ath11k_pci 0006:01:00.0: failed to enable msi: -22
> > > 	ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -22
> > >
> > > and backing up until the commit before 233db05bc37f makes the NVMe come
> > > up again during boot on another.
> > > 
> > > I have not tried to debug this further.
> > 
> > I need a few things from you though, because you're not giving much to
> > help you (and I'm travelling, which doesn't help).
> 
> Yeah, this was just an early heads up.
> 
> > Can you at least investigate what in ath11k_pci_alloc_msi() causes the
> > wifi driver to be upset? Does it normally use a single MSI vector or
> > MSI-X? How about your NVMe device?
> 
> It uses multiple vectors, but now it falls back to trying to allocate a
> single one and even that fails with -ENOSPC:
> 
> 	ath11k_pci 0006:01:00.0: ath11k_pci_alloc_msi - requesting one vector failed: -28
> 
> Similarly for the NVMe: it normally uses multiple vectors, but now only
> the AER interrupts appear to be allocated for each controller, and there
> is a GICv3 interrupt for the NVMe:
> 
> 208:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0006:00:00.0   0 Edge      PCIe PME, aerdrv
> 212:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0004:00:00.0   0 Edge      PCIe PME, aerdrv
> 214:        161          0          0          0          0          0          0          0     GICv3 562 Level     nvme0q0, nvme0q1
> 215:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0002:00:00.0   0 Edge      PCIe PME, aerdrv
>

That's an indication of the driver having failed its MSI allocation
and gone back to INTx signalling.
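
A minimal sketch of how that reading can be automated, assuming the
/proc/interrupts layout quoted in this thread (IRQ number, per-CPU counts,
irqchip name, hwirq, trigger, action names). The function name
classify_irq_line and the "wired" label are hypothetical, and the chip-name
prefixes are only the ones visible in the output above; other platforms
name their MSI irqchips differently.

```python
def classify_irq_line(line):
    """Return (irq, chip, kind) for one /proc/interrupts data line.

    Assumption: the chip name is the first field after the per-CPU
    counters, as in the output quoted in this thread.
    """
    fields = line.split()
    irq = fields[0].rstrip(":")
    # Skip the per-CPU counters: the leading all-digit fields.
    i = 1
    while i < len(fields) and fields[i].isdigit():
        i += 1
    chip = fields[i]
    # Check the MSI-X prefix first, since it also starts with the MSI prefix.
    if chip.startswith("ITS-PCI-MSIX"):
        kind = "MSI-X"
    elif chip.startswith("ITS-PCI-MSI"):
        kind = "MSI"
    else:
        # Anything else (e.g. a "GICv3 ... Level" line) is treated as a
        # wired interrupt, which is what INTx signalling ends up as here.
        kind = "wired"
    return irq, chip, kind


if __name__ == "__main__":
    sample = ("214:        161          0          0          0"
              "     GICv3 562 Level     nvme0q0, nvme0q1")
    print(classify_irq_line(sample))
```

Applied to the lines quoted in this thread, IRQ 214 (GICv3 562 Level) is
classified as wired, while IRQ 206 (ITS-PCI-MSIX-0002:01:00.0) is
classified as MSI-X.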

> Next boot, after disabling PCIe controller async probing, it's an MSI-X?!:
> 
> 201:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0006:00:00.0   0 Edge      PCIe PME, aerdrv
> 203:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0004:00:00.0   0 Edge      PCIe PME, aerdrv
> 205:          0          0          0          0          0          0          0          0  ITS-PCI-MSI-0002:00:00.0   0 Edge      PCIe PME, aerdrv
> 206:          0          0          0          0          0          0          0          0  ITS-PCI-MSIX-0002:01:00.0   0 Edge      nvme0q0
>

So is this issue actually tied to the async probing? Does it always
work if you disable it?

> This time ath11k vector allocation succeeded, but the driver times out
> eventually:
> 
> [    8.984619] ath11k_pci 0006:01:00.0: MSI vectors: 32
> [   29.690841] ath11k_pci 0006:01:00.0: failed to power up mhi: -110
> [   29.697136] ath11k_pci 0006:01:00.0: failed to start mhi: -110
> [   29.703153] ath11k_pci 0006:01:00.0: failed to power up :-110
> [   29.732144] ath11k_pci 0006:01:00.0: failed to create soc core: -110
> [   29.738694] ath11k_pci 0006:01:00.0: failed to init core: -110
> [   32.841758] ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -110
> 
> > It would also help if you could define the DEBUG symbol at the very
> > top of irq-gic-v3-its.c and report the debug information that the ITS
> > driver dumps.
> 
> See below (with synchronous probing of the pcie controllers).

I don't see much going wrong there, and the ITS driver correctly
dishes out interrupts. I'll take the current -next for a ride on my
own HW and see what happens.
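
For reference when reading the logs quoted in this thread, the negative
return values are Linux errno codes. A small sketch that decodes the three
that appear here (-22, -28, -110); the table is hard-coded from the generic
Linux errno numbering rather than taken from the running system, and the
helper name describe_errno is hypothetical.

```python
# Generic Linux errno values for the codes seen in this thread.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    28: ("ENOSPC", "No space left on device"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def describe_errno(code):
    """Map a kernel-style return value (e.g. -28) to 'NAME (description)'."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


if __name__ == "__main__":
    for code in (-22, -28, -110):
        print(code, describe_errno(code))
```

So the initial "failed to enable msi: -22" is EINVAL, the single-vector
fallback failing with -28 is ENOSPC, and the later mhi power-up failure
with -110 is a timeout.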

	M.

-- 
Without deviation from the norm, progress is not possible.
