Message-ID: <20260116171048.GP961588@nvidia.com>
Date: Fri, 16 Jan 2026 13:10:48 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolas Cavallari <Nicolas.Cavallari@...en-communications.fr>
Cc: iommu@...ts.linux.dev, linux-pci@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>,
"Rob Herring (Arm)" <robh@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Joerg Roedel <jroedel@...e.de>, regressions@...ts.linux.dev
Subject: Re: [REGRESSION] Re: imx8 PCI regression since "iommu: Get DT/ACPI
parsing into the proper probe path"
On Fri, Jan 16, 2026 at 05:52:36PM +0100, Nicolas Cavallari wrote:
> I debugged it further, it seems to be mostly a PCI issue since the system
> does not actually have an IOMMU.
>
> When examining changes in the PCI configuration (lspci -vvvv), the main
> difference is that, with the patch, Access Control Services are enabled on
> the PCI switch.
>
> Capabilities: [220 v1] Access Control Services
> ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+
> UpstreamFwd+ EgressCtrl+ DirectTrans+
> - ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir-
> UpstreamFwd- EgressCtrl- DirectTrans-
> + ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+
> UpstreamFwd+ EgressCtrl- DirectTrans-
>
> If I manually patch the config space in sysfs and re-disable ACS on the port
> connected to the LAN7430, I cannot reproduce the problem. In fact,
> disabling only ReqRedir is enough to work around the issue.
My guess would be your system has some kind of address alias going on?
Assuming you are not facing an errata, ACS generally changes the
routing of TLPs so if you have a DMA address that could go to two
different places then messing with ACS will give you different
behaviors.
Specifically, when you turn on all those ACS settings you cannot do P2P
traffic anymore. If your system expects this for some reason then you
must use the kernel command line option to disable ACS.
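For reference, here is a minimal sketch of how the ACSCtl flags quoted above map onto the 16-bit ACS Control register. The bit masks follow the PCI_ACS_* definitions in include/uapi/linux/pci_regs.h; the helper names are hypothetical and chosen only for illustration, and the 0x220 capability offset is taken from the lspci output above.

```python
# Sketch only: helper names are illustrative, not part of any kernel or pciutils API.
# Bit masks mirror the PCI_ACS_* constants in include/uapi/linux/pci_regs.h,
# listed in the order lspci prints them.
ACS_BITS = [
    ("SrcValid", 0x0001),     # PCI_ACS_SV: Source Validation
    ("TransBlk", 0x0002),     # PCI_ACS_TB: Translation Blocking
    ("ReqRedir", 0x0004),     # PCI_ACS_RR: P2P Request Redirect
    ("CmpltRedir", 0x0008),   # PCI_ACS_CR: P2P Completion Redirect
    ("UpstreamFwd", 0x0010),  # PCI_ACS_UF: Upstream Forwarding
    ("EgressCtrl", 0x0020),   # PCI_ACS_EC: P2P Egress Control
    ("DirectTrans", 0x0040),  # PCI_ACS_DT: Direct Translated P2P
]
PCI_ACS_CTRL = 0x06  # offset of the control register inside the ACS capability


def encode_acs_ctrl(flags: str) -> int:
    """Turn an lspci-style flag string ("SrcValid+ TransBlk- ...") into a register value."""
    masks = dict(ACS_BITS)
    value = 0
    for token in flags.split():
        name, sign = token[:-1], token[-1]
        if sign == "+":
            value |= masks[name]
    return value


def decode_acs_ctrl(value: int) -> str:
    """Render a register value the way lspci prints ACSCtl."""
    return " ".join(f"{name}{'+' if value & mask else '-'}" for name, mask in ACS_BITS)


def clear_req_redir(value: int) -> int:
    """Return the value with only P2P Request Redirect cleared."""
    return value & ~0x0004


if __name__ == "__main__":
    enabled = encode_acs_ctrl(
        "SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-"
    )
    print(hex(enabled))                            # 0x1d
    print(hex(clear_req_redir(enabled)))           # 0x19
    print(decode_acs_ctrl(clear_req_redir(enabled)))
    print(hex(0x220 + PCI_ACS_CTRL))               # 0x226, register offset in config space
```

With the capability at 0x220, the control register would sit at config offset 0x226, and the ReqRedir-only workaround corresponds to writing 0x19 instead of 0x1d there. On the command line, `pci=disable_acs_redir=` (documented in kernel-parameters.txt) forces the redirect bits off for the listed devices; whether that is the option meant above is an assumption.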
If you are just doing normal netdev stuff then it is doubtful that you
are doing P2P at all, so I might guess a bug in the Microchip Ethernet
driver doing a wild DMA? Stricter ACS settings cause it to AER and the
device cannot recover?
It will be hard to get to the bottom of the defect without a PCI trace.
I don't know why your bisection landed on bcb8 - the intention was
that pci_enable_acs() is always called, and I didn't notice an obvious
reason why that wouldn't happen prior to bcb8. It is called directly
from pci_device_add(). Maybe investigating that angle would be
informative.
> I also read up on AER and I'm surprised that I don't see anything in dmesg
> when the problem occurs, even though UERcvd+ starts appearing on the root
> context and AdvNonFatalErr+ appears on the switch.
Though UE and AdvNonFatalErr sure are weird indications for an
addressing error. Is there some kind of special embedded system thing
going on? Vendor messages over PCI perhaps?
Jason