Message-ID: <deb6d962-f24e-4769-b313-be3b0efb873b@math.uni-bielefeld.de>
Date: Sun, 8 Sep 2024 13:23:13 +0200
From: Tobias Jakobi <tjakobi@...h.uni-bielefeld.de>
To: Christopher Snowhill <chris@...e54.net>, amd-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] drm/amd: fix VRR race condition during IRQ handling
On 9/8/24 09:35, Christopher Snowhill wrote:
> On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
>> From: Tobias Jakobi <tjakobi@...h.uni-bielefeld.de>
>>
>> Hello,
>>
>> this fixes a nasty race condition in the set_drr() callbacks for DCN10
>> and DCN35 that has existed for quite some time; see this GitLab
>> issue for reference.
>>
>> https://gitlab.freedesktop.org/drm/amd/-/issues/3142
>>
>> The report just focuses on DCN10, but the same problem also exists in
>> the DCN35 code.
> Does the problem not exist in the following references to funcs->set_drr?
>
> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr(
> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
Maybe. But the big difference I see here is that in this code there
isn't even any kind of NULL check applied to tg, or to most of the
members of *pipe_ctx. If the same kind of problem really exists here,
then one would need to rewrite a bit more code to fix it.
E.g. in the case of dcn31_hwseq.c, the questionable code is in
dcn31_reset_back_end_for_pipe(), which is static and only called from
dcn31_reset_hw_ctx_wrap(), which in turn is assigned to the
.reset_hw_ctx_wrap callback. That specific callback, from what I can
see, is only called from dce110_reset_hw_ctx_wrap(), which is then
assigned to the .apply_ctx_to_hw callback. That callback is only called
from dc_commit_state_no_check(), which is again static and is called
from dc_commit_streams().
I could trace this even further. My point is: I don't think this is
called from any IRQ handler code. And given the depth and complexity of
the callgraph, I have to admit that, at least at this point, this is a
bit over my head.
Sure, I could now sprinkle a bunch of x != NULL checks into the code,
but that would be mere voodoo. I usually try to have a theoretical basis
when I apply changes to code.
Maybe if someone from the AMD display team could give some insight into
whether there is still potentially vulnerable code in some of the
instances that Christopher has posted, I would gladly take a look.
With best wishes,
Tobias
>
>> With best wishes,
>> Tobias
>>
>> Tobias Jakobi (2):
>> drm/amd/display: Avoid race between dcn10_set_drr() and
>> dc_state_destruct()
>> drm/amd/display: Avoid race between dcn35_set_drr() and
>> dc_state_destruct()
>>
>> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++--------
>> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++--------
>> 2 files changed, 24 insertions(+), 16 deletions(-)