lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKMK7uEnPn=Tuc7bV-=Nr7gbk+uP9L0xH+HDL2+D4PewMmq6Cg@mail.gmail.com>
Date:   Fri, 6 Aug 2021 21:04:04 +0200
From:   Daniel Vetter <daniel.vetter@...ll.ch>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Dave Airlie <airlied@...il.com>,
        Alex Deucher <alexander.deucher@....com>,
        Christian König <christian.koenig@....com>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>
Subject: Re: [git pull] drm fixes for 5.14-rc4

On Thu, Aug 5, 2021 at 8:14 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> This might possibly have been fixed already by the previous drm pull,
> but I wanted to report it anyway, just in case.
>
> It happened after an uptime of over a week, so it might not be trivial
> to reproduce.
>
> It's a NULL pointer dereference in dc_stream_retain() with the code being
>
>         lock xadd %eax,0x390(%rdi) <-- trapping instruction
>
> and that's just the
>
>         kref_get(&stream->refcount);
>
> with a NULL 'stream' argument.
>
>   Call Trace:
>    dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
>    amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
>    commit_tail+0x97/0x180 [drm_kms_helper]
>    process_one_work+0x1df/0x3a0
>
> the oops is followed by a stream of
>
>   [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1]
> hw_done or flip_done timed out
>
> and the machine was not usable afterwards.

Hm that part is a bit disappointing because the atomic modeset commit
helpers are designed to recover from this (assuming we didn't fry the
hw). But amdgpu does these waits in amdgpu_dm_atomic_check() which is
decidedly not great (you're not supposed to block on hw or a previous
in that atomic_check ever, because it can be called by userspace in a
TEST_ONLY mode to figure out whether a desired config would work), and
then returns that error to userspace, which is worse.

I guess that's another area where the integration between what atomic
modeset expects and the DC backend provides is suboptimal. I think the
data structures we managed to fuse together fairly ok, but the
check/commit flow and semantics are a bit a struggle.

Anyway this was just an aside, I guess given the bug the driver
wouldn't have recovered anyway.
-Daniel

> lspci says this is a
>
>  49:00.0 VGA compatible controller [0300]:
>    Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere
>    [Radeon RX 470/480/570/570X/580/580X/590]
>    [1002:67df] (rev e7) (prog-if 00 [VGA controller])
>
> Full oops in the attachment, but I think the above is all the really
> salient details.
>
>                    Linus



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ