lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 1 Feb 2023 13:59:32 +0800
From:   Baolu Lu <baolu.lu@...ux.intel.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     baolu.lu@...ux.intel.com, Bjorn Helgaas <bhelgaas@...gle.com>,
        Joerg Roedel <jroedel@...e.de>,
        Matt Fagnani <matt.fagnani@...l.net>,
        Christian König <christian.koenig@....com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Vasant Hegde <vasant.hegde@....com>,
        Tony Zhu <tony.zhu@...el.com>, linux-pci@...r.kernel.org,
        iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] PCI: Add translated request only flag for
 pci_enable_pasid()

On 2023/2/1 8:14, Bjorn Helgaas wrote:
> On Tue, Jan 31, 2023 at 08:56:13PM +0800, Baolu Lu wrote:
>> On 2023/1/31 2:38, Bjorn Helgaas wrote:
>>>> PCI: Add translated request only flag for pci_enable_pasid()
>>>>
>>>> The PCIe fabric routes Memory Requests based on the TLP address, ignoring
>>>> the PASID. In order to ensure system integrity, commit 201007ef707a ("PCI:
>>>> Enable PASID only when ACS RR & UF enabled on upstream path") requires
>>>> some ACS features being supported on device's upstream path when enabling
>>>> PCI/PASID.
> 
> Looking up 201007ef707a to see what ensuring system integrity means,
> it prevents Memory Requests with PASID, which should always be routed
> to the RC, from being mistakenly routed as peer-to-peer requests.

Yes, exactly. I will update above paragraph like below

The PCIe fabric routes Memory Requests based on the TLP address, ignoring
the PASID. In order to prevent Memory Requests with PASID, which should
always be routed to the RC, from being mistakenly routed as peer-to-peer
requests, commit 201007ef707a ("PCI: Enable PASID only when ACS RR & UF
enabled on upstream path") requires some ACS features being supported on
device's upstream path when enabling PCI/PASID.


> 
>>>> However, above change causes the Linux kernel boots to black screen on a
>>>> system with below graphic device:
>>>
>>> We need a PCIe concept-level description of the issue first, i.e., in
>>> terms of DMA, PASID, ACS, etc.  Then we can mention the AMD GPU issue
>>> as an instance.
>>
>> How about below description?
> 
> Thanks, this is exactly the sort of thing I'm looking for.  But my
> understanding of ATS/PRI/PASID is weak, so I'm still working through
> this.  Tell me when I say something wrong below...
> 
>> PCIe endpoints can use ATS to request DMA remapping hardware to
>> translate an IOVA to its mapped physical address. If the translation is
>> missing or the permissions are insufficient, the PRI is used to trigger
>> an I/O page fault. The IOMMU driver will fill the mapping with desired
>> permissions and return the translated address to the device.
> 
> In PCIe spec language, I think you're saying that a PCIe Function may
> contain an ATC.  If the ATC Capability Enable bit is set, the Function
> can issue Translation Requests.
> 
> The TA (aka IOMMU) will respond with a Translation Completion.  If the
> Completion is a CplD, it contains the translated address and the
> Function can store the entry in its ATC.  I assume the I/O page fault
> case corresponds to a Cpl (with no data) meaning that the TA could not
> translate the address.
> 
> If the TA doesn't have a mapping with the desired permissions, and the
> Function's Page Request Capability Enable bit is set, it may issue a
> Page Request Message.  It's up to the TA/IOMMU to make this message
> visible to the OS, which can make the page resident, create an IOMMU
> mapping, and enable a PRG Response Message.  After the Function
> receives the PRG Response Message, it would issue another Translation
> Request.
> 
>> The translated address is specified by the IOMMU driver. The IOMMU
>> driver ensures that the address is a DMA buffer address instead of any
>> P2P address in the PCI fabric. Therefore, any translated memory request
>> will eventually be routed to IOMMU regardless of whether there is ACS
>> control in the up-streaming path.
> 
> A Memory Request with an address that is not a P2P address, i.e., it
> is not contained in any bridge aperture, will *always* be routed
> toward the RC, won't it?  Isn't that the case regardless of whether
> the address is translated or untranslated, and even regardless of ACS?
> 
> IIUC, ACS basically causes peer-to-peer requests to be routed upstream
> instead of directly to the peer.
> 
> OK, reading this again, I realize that I just restated exactly what
> you had already written, sorry about that.

Never mind. It's really a good chance for me to learn how to describe
this from the perspective of the PCI subsystem.. :-)

> 
>> AMD GPU is one of those devices.
> 
> I guess you mean the AMD GPU has ATS, PRI, and PASID Capabilities?
> And furthermore, that the GPU *always* uses Translated addresses with
> PASID?
> 
> So I guess what's going on here is that if:
> 
>    - A device only uses PASID with Translated addresses, and
>    - those Translated addresses are never P2P addresses, then
>    - those transactions will always be routed to the RC.
> 
> And this applies even if there is no ACS or ACS doesn't support
> PCI_ACS_RR and PCI_ACS_UF.

Yes.

> 
> The black screen happens because ... ?
> 
> What can we include in the commit log to help people find this fix?  I
> see these in the bugzilla:
> 
>    WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80
>    WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50
> 
> (These look like defects in pdev_pri_ats_enable(), so really just
> distractions)
> 
>    kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
>    kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
>    BUG: kernel NULL pointer dereference, address: 0000000000000058
>    RIP: 0010:report_iommu_fault+0x11/0x90
> 
> I couldn't figure out the NULL pointer dereference.  I expected it to
> be from a BUG() or similar in report_iommu_fault(), but I don't see
> that.

Above has been answered by Vasant Hegde in a separated reply.

Best regards,
baolu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ