lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 25 Oct 2021 13:23:36 +0200
From:   Christian König <ckoenig.leichtzumerken@...il.com>
To:     Paul Menzel <pmenzel@...gen.mpg.de>,
        Jörg Rödel <joro@...tes.org>,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Cc:     iommu@...ts.linux-foundation.org,
        Alex Deucher <alexander.deucher@....com>,
        Christian König <christian.koenig@....com>,
        Xinhui Pan <Xinhui.Pan@....com>, amd-gfx@...ts.freedesktop.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        it+linux-iommu@...gen.mpg.de
Subject: Re: I got an IOMMU IO page fault. What to do now?

Hi Paul,

not sure how the IOMMU gives out addresses, but the printed ones look 
suspicious to me. Something like we are using an invalid address like -1 
or similar.

Can you try that on an up to date kernel as well? E.g. ideally bleeding 
edge amd-staging-drm-next from Alex repository.

Regards,
Christian.

Am 25.10.21 um 12:25 schrieb Paul Menzel:
> Dear Linux folks,
>
>
> On a Dell OptiPlex 5055, Linux 5.10.24 logged the IOMMU messages 
> below. (GPU hang in amdgpu issue #1762 [1] might be related.)
>
>     $ lspci -nn -s 05:00.0
>     05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, 
> Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] 
> (rev 87)
>     $ dmesg
>     […]
>     [6318399.745242] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xfffffff0c0 flags=0x0020]
>     [6318399.757283] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xfffffff7c0 flags=0x0020]
>     [6318399.769154] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffe0c0 flags=0x0020]
>     [6318399.780913] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xfffffffec0 flags=0x0020]
>     [6318399.792734] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffe5c0 flags=0x0020]
>     [6318399.804309] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffd0c0 flags=0x0020]
>     [6318399.816091] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffecc0 flags=0x0020]
>     [6318399.827407] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffd3c0 flags=0x0020]
>     [6318399.838708] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffc0c0 flags=0x0020]
>     [6318399.850029] amdgpu 0000:05:00.0: AMD-Vi: Event logged 
> [IO_PAGE_FAULT domain=0x000c address=0xffffffdac0 flags=0x0020]
>     [6318399.861311] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffc1c0 flags=0x0020]
>     [6318399.872044] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffc8c0 flags=0x0020]
>     [6318399.882797] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffb0c0 flags=0x0020]
>     [6318399.893655] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffcfc0 flags=0x0020]
>     [6318399.904445] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffb6c0 flags=0x0020]
>     [6318399.915222] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffa0c0 flags=0x0020]
>     [6318399.925931] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffbdc0 flags=0x0020]
>     [6318399.936691] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffa4c0 flags=0x0020]
>     [6318399.947479] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffff90c0 flags=0x0020]
>     [6318399.958270] AMD-Vi: Event logged [IO_PAGE_FAULT 
> device=05:00.0 domain=0x000c address=0xffffffabc0 flags=0x0020]
>
> As this is not reproducible, how would debugging go? (The system was 
> rebooted in the meantime.) What options should be enabled, that next 
> time the required information is logged, or what commands should I 
> execute when the system is still in that state, so the bug (driver, 
> userspace, …) can be pinpointed and fixed?
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://gitlab.freedesktop.org/drm/amd/-/issues/1762
>      "Oland [Radeon HD 8570 / R7 240/340 OEM]: GPU hang"

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ