Message-ID: <1a1cc125-9314-f569-a6c4-40fc4509a377@amd.com>
Date: Thu, 4 Nov 2021 08:39:18 +0100
From: Christian König <christian.koenig@....com>
To: Karol Herbst <kherbst@...hat.com>, Sven Joachim <svenjoac@....de>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Erhard F." <erhard_f@...lbox.org>,
nouveau <nouveau@...ts.freedesktop.org>,
LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
Huang Rui <ray.huang@....com>
Subject: Re: [Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in
ttm_transfered_destroy
On 03.11.21 at 22:25, Karol Herbst wrote:
> On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac@....de> wrote:
>> On 2021-11-03 21:32 +0100, Karol Herbst wrote:
>>
>>> On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst@...hat.com> wrote:
>>>> On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac@....de> wrote:
>>>>> On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
>>>>>
>>>>>> From: Christian König <christian.koenig@....com>
>>>>>>
>>>>>> commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
>>>>>>
>>>>>> We need to clean up the fences for ghost objects as well.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@....com>
>>>>>> Reported-by: Erhard F. <erhard_f@...lbox.org>
>>>>>> Tested-by: Erhard F. <erhard_f@...lbox.org>
>>>>>> Reviewed-by: Huang Rui <ray.huang@....com>
>>>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
>>>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
>>>>>> CC: <stable@...r.kernel.org>
>>>>>> Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig@amd.com
>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>>>>>> ---
>>>>>> drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>>
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>> @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
>>>>>> struct ttm_transfer_obj *fbo;
>>>>>>
>>>>>> fbo = container_of(bo, struct ttm_transfer_obj, base);
>>>>>> + dma_resv_fini(&fbo->base.base._resv);
>>>>>> ttm_bo_put(fbo->bo);
>>>>>> kfree(fbo);
>>>>>> }
>>>>> Alas, this innocuous-looking commit causes one of my systems to lock up
>>>>> as soon as I run startx. This happens with the nouveau driver; two other
>>>>> systems with radeon and intel graphics are not affected. Also I only
>>>>> noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
>>>>> do not use 5.4 anymore.
>>>>>
>>>>> I am not familiar with nouveau's ttm management and what has changed
>>>>> there between 5.10 and 5.14, but maybe one of their developers can shed
>>>>> some light on this.
>>>>>
>>>>> Cheers,
>>>>> Sven
>>>>>
>>>> could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
>>> maybe not... but I do remember there being a few ttm-related patches
>>> which only hurt nouveau :/ I guess one could do a git bisect to
>>> figure out what change "fixes" it.
>> Maybe, but since the memory leaks reported by Erhard only started to
>> show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
>> patch should simply be reverted on earlier kernels?
>>
> Yeah, I think this is probably the right approach.
I agree. The problem is that this memory leak could potentially happen with
5.10 as well, just much, much less likely.
But my guess is that 5.10 is so buggy that when the leak does NOT happen,
we double free, which obviously causes a crash.
So for the sake of stability please don't apply this patch to 5.10. I'm
going to comment on the original bug report as well.
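To illustrate the leak-versus-double-free trade-off described above, here is a minimal userspace model in C. It assumes a reservation object that holds one reference to a single exclusive fence; struct fence, struct resv, struct ghost, resv_fini(), ghost_destroy() and run() are hypothetical stand-ins, not the kernel's struct dma_fence / struct dma_resv API. resv_fini() plays the role dma_resv_fini() has in the patch: it drops the fence references the reservation holds, so skipping it leaks the fence, while calling it twice on the same object would drop the same reference twice. The `live` flag is a debug check that exists only in this model, so that a second fini trips an assertion instead of silently corrupting memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model only: NOT the kernel's struct dma_fence. */
struct fence {
	int refcount;
};

static int live_fences; /* fences allocated and not yet freed */

static struct fence *fence_create(void)
{
	struct fence *f = malloc(sizeof(*f));

	if (!f)
		abort();
	f->refcount = 1;
	live_fences++;
	return f;
}

static struct fence *fence_get(struct fence *f)
{
	f->refcount++;
	return f;
}

/* Drops one reference and frees the fence when the last one is gone. */
static void fence_put(struct fence *f)
{
	if (--f->refcount == 0) {
		free(f);
		live_fences--;
	}
}

/* Hypothetical reservation object holding a reference to one fence. */
struct resv {
	struct fence *excl;
	bool live; /* model-only debug flag: set by init, cleared by fini */
};

static void resv_init(struct resv *r)
{
	r->excl = NULL;
	r->live = true;
}

/* Replaces the exclusive fence, taking a reference to the new one. */
static void resv_add_excl(struct resv *r, struct fence *f)
{
	if (r->excl)
		fence_put(r->excl);
	r->excl = fence_get(f);
}

/* Drops the references held by the reservation; must run exactly once. */
static void resv_fini(struct resv *r)
{
	assert(r->live); /* a second fini here would be the double free */
	r->live = false;
	if (r->excl)
		fence_put(r->excl);
	r->excl = NULL;
}

/* Stand-in for a TTM ghost object with its own embedded reservation. */
struct ghost {
	struct resv resv;
};

/* fini_resv selects between the pre-patch and post-patch destructor. */
static void ghost_destroy(struct ghost *g, bool fini_resv)
{
	if (fini_resv)
		resv_fini(&g->resv);
	free(g);
}

/* Returns how many fences are still allocated after tearing down a ghost. */
static int run(bool fini_resv)
{
	int before = live_fences;
	struct fence *move = fence_create();
	struct ghost *g = malloc(sizeof(*g));

	if (!g)
		abort();
	resv_init(&g->resv);
	resv_add_excl(&g->resv, move);
	fence_put(move); /* the caller drops its own reference */
	ghost_destroy(g, fini_resv);
	return live_fences - before;
}
```

In this model, run(false) returns 1 because the ghost's reservation is never finalized and its fence reference leaks, while run(true) returns 0. If some other teardown path had already finalized the same reservation, the extra resv_fini() call would hit the assertion, which is the model's analogue of the double free suspected on 5.10.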
Thanks,
Christian.
>
>>> On which GPU do you see this problem?
>> On an old GeForce 8500 GT, the whole PC is rather ancient.
>>
>> Cheers,
>> Sven
>>