Message-ID: <e0c1e50d-ee6c-290a-b0bc-24fc81dde90f@amd.com>
Date:   Fri, 17 Feb 2023 11:06:57 +0530
From:   Vasant Hegde <vasant.hegde@....com>
To:     Felix Kuehling <felix.kuehling@....com>,
        Matt Fagnani <matt.fagnani@...l.net>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Baolu Lu <baolu.lu@...ux.intel.com>,
        "Huang, Shimmer" <Shimmer.Huang@....com>,
        "Liu, Aaron" <Aaron.Liu@....com>, Jason Gunthorpe <jgg@...dia.com>
Cc:     Joerg Roedel <jroedel@...e.de>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Linux PCI <linux-pci@...r.kernel.org>,
        "Pan, Xinhui" <Xinhui.Pan@....com>, amd-gfx@...ts.freedesktop.org,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        Christian König <christian.koenig@....com>
Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

Hi Felix,


On 2/17/2023 1:29 AM, Felix Kuehling wrote:
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device
>> 1002:9874
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added due to errors 
> This looks like IOMMU device initialization still fails (but more gracefully
> now). Vasant, is that expected?

My fix makes the IOMMU driver handle failure paths gracefully, so the above
logs are expected. Basically, they mean the IOMMU couldn't attach the device to
the new domain (it couldn't enable PASID on the AMD GPU because the ACS RR/UF
flags are missing, see commit 201007ef707) and we fell back to the old domain
properly.
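
The fallback described above can be sketched as a small toy model in C. This
is not the kernel code: every name here (toy_domain, toy_device,
toy_enable_pasid, toy_attach) is hypothetical, and the ACS RR/UF check on the
upstream path is reduced to a single boolean purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real kernel symbols. */
struct toy_domain {
    const char *name;
};

struct toy_device {
    bool acs_rr_uf_on_path;     /* assumed: ACS RR and UF set on the upstream path */
    bool pasid_enabled;
    struct toy_domain *domain;  /* domain the device is currently attached to */
};

/* Enabling PASID is refused when the ACS flags are missing. */
int toy_enable_pasid(struct toy_device *dev)
{
    if (!dev->acs_rr_uf_on_path)
        return -1;
    dev->pasid_enabled = true;
    return 0;
}

/* Try to move the device to new_dom; on failure, restore the old domain. */
int toy_attach(struct toy_device *dev, struct toy_domain *new_dom)
{
    struct toy_domain *old_dom = dev->domain;

    dev->domain = new_dom;
    if (toy_enable_pasid(dev) != 0) {
        dev->domain = old_dom;  /* graceful fallback instead of a broken state */
        return -1;
    }
    return 0;
}
```

In this model, a device with acs_rr_uf_on_path set to false gets an error from
toy_attach() and stays on its old domain with PASID disabled, which is the
situation the logs above would correspond to.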

It also means that the GPU will not be able to use PASID/PRI. If you need these
features, then you have to look into commit 201007ef707 and see how we can
enable PASID for the GPU (without the ACS UF/RR flags?).


> 
> This would lead to KFD not being available on Carrizo with this kernel, which is
> probably not a big limitation in practice. It would only affect compute
> applications using the ROCm user mode stack. I don't think anyone does that
> these days on these old APUs.
> 
> The SMU errors seem unrelated to this unless there is some subtle interaction
> I'm missing.

I have no idea about the GPU warning. All I can say is that the IOMMU side
looks good, but PASID/PRI is not enabled for the GPU.

-Vasant

