Message-ID: <9103b537-62c3-d6b2-b576-713406635455@amd.com>
Date:   Thu, 16 Feb 2023 11:07:06 +0530
From:   Vasant Hegde <vasant.hegde@....com>
To:     Jason Gunthorpe <jgg@...dia.com>,
        Felix Kuehling <felix.kuehling@....com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Baolu Lu <baolu.lu@...ux.intel.com>,
        "Huang, Shimmer" <Shimmer.Huang@....com>,
        "Liu, Aaron" <Aaron.Liu@....com>, Joerg Roedel <jroedel@...e.de>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Linux PCI <linux-pci@...r.kernel.org>,
        "Pan, Xinhui" <Xinhui.Pan@....com>, amd-gfx@...ts.freedesktop.org,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
        Matt Fagnani <matt.fagnani@...l.net>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        Christian König <christian.koenig@....com>
Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

Hi Jason,


On 2/16/2023 6:14 AM, Jason Gunthorpe wrote:
> On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>>
>> If I understand this correctly, the HW or the BIOS is doing something wrong
>> about reporting ACS. I don't know what the GPU driver can do other than add
>> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
> 
> How about this:
> 
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index 864e4ffb6aa94e..cc027ce9a6e86f 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
>  
>  int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>  {
> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
>  	struct device_state *dev_state;
>  	struct iommu_group *group;
>  	unsigned long flags;
> @@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>  
>  	might_sleep();
>  
> +	if (!dev_data->ats.enabled)
> +		return -EINVAL;
> +

Thanks for the proposed fix. But this will not actually solve the issue, because
the current flow is:
  - This function tries to allocate a new domain.
  - It calls iommu_attach_group(), which calls attach_device(). In that path
    it tries to enable ATS/PASID and hits an error.

As I mentioned in another reply, I think even the current code returns an error
from amd_iommu_init_device() to the GPU driver. The problem is that in the
__iommu_attach_group() path, the device is detached from its current domain,
fails to attach to the new domain, and the error is returned without putting
the device back into its old domain. That is what causes the issue. The series
below should fix it.

https://lore.kernel.org/linux-iommu/20230215052642.6016-1-vasant.hegde@amd.com/

-Vasant
