[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <319c2913-78fa-4b35-828e-7d4d1d691a93@math.uni-bielefeld.de>
Date: Thu, 26 Dec 2024 13:29:15 +0100
From: Tobias Jakobi <tjakobi@...h.uni-bielefeld.de>
To: Chris Bainbridge <chris.bainbridge@...il.com>
Cc: Mario Limonciello <mario.limonciello@....com>,
amd-gfx@...ts.freedesktop.org, alex.hung@....com,
regressions@...ts.linux.dev, rafael@...nel.org, lenb@...nel.org,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/amd: Fix random crashes due to bad kfree
On 12/26/24 02:27, Chris Bainbridge wrote:
> On Thu, Dec 26, 2024 at 12:19:02AM +0100, Tobias Jakobi wrote:
>> Hi Chris!
>>
>> On 12/26/24 00:09, Chris Bainbridge wrote:
>>
>>> Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
>>> available for eDP") added function dm_helpers_probe_acpi_edid, which
>>> fetches the EDID from the BIOS by calling acpi_video_get_edid.
>>> acpi_video_get_edid returns a pointer to the EDID, but this pointer does
>>> not originate from kmalloc - it is actually the internal "pointer" field
>>> from an acpi_buffer struct (which did come from kmalloc).
>>> dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
>>> resulting in memory corruption which leads to random, intermittent
>>> crashes (e.g. 4% of boots will fail with some Oops).
>>>
>>> Fix this by allocating a new array (which can be safely freed) for the
>>> EDID data in acpi_video_get_edid, and correctly freeing the acpi_buffer.
>> Hmm, maybe I'm missing something here. But shouldn't it suffice to just
>> remove the kfree call in dm_helpers_probe_acpi_edid()?
> Yes, that would work to fix the bad kfree, but there would be a small
> memory leak of the acpi_buffer struct. It's not a huge problem since
> this code is rarely run, and the Nouveau code has never tried to free
> the edid buffer and apparently nobody noticed, but it would be better to
> do the correct thing.
OK, thanks for explaining. I didn't immediately understand that
something was leaking memory. Only that we were freeing something that
we are not supposed to free.
> One other curiosity is this comment in the code that allocates the
> memory:
>
> case ACPI_ALLOCATE_BUFFER:
> /*
> * Allocate a new buffer. We directectly call acpi_os_allocate here to
> * purposefully bypass the (optionally enabled) internal allocation
> * tracking mechanism since we only want to track internal
> * allocations. Note: The caller should use acpi_os_free to free this
> * buffer created via ACPI_ALLOCATE_BUFFER.
> */
>
> Which makes me wonder if all the calls to kfree on acpi_buffer structs
> with ACPI_ALLOCATE_BUFFER in acpi_video.c should actually be calls to
> acpi_os_free instead? I used kfree just for consistency with the
> existing code.
Wouldn't it make more sense to do the memdup handling in
acpi_video_device_EDID()? This way you have both alloc and free in the
same function. But I'm no expert when it comes to the ACPI kernel code.
Just my two cents :-D
With best wishes,
Tobias
Powered by blists - more mailing lists