Message-ID: <CABXGCsMMg59UXnv0EkmjsiZNUsZUBzBaUR8EnSv4FqOTmpOf7Q@mail.gmail.com>
Date: Thu, 9 Oct 2025 21:55:11 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: "Pillai, Aurabindo" <Aurabindo.Pillai@....com>
Cc: "Kazlauskas, Nicholas" <Nicholas.Kazlauskas@....com>, "Wu, Ray" <Ray.Wu@....com>,
"Wheeler, Daniel" <Daniel.Wheeler@....com>, "Li, Roman" <Roman.Li@....com>,
"Chung, ChiaHsuan (Tom)" <ChiaHsuan.Chung@....com>, "Deucher, Alexander" <Alexander.Deucher@....com>,
amd-gfx list <amd-gfx@...ts.freedesktop.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: 6.18/regression/bisected – BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 after 6d31602a9f57
On Tue, Oct 7, 2025 at 10:55 PM Pillai, Aurabindo
<Aurabindo.Pillai@....com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Mikhail,
>
> schedule_dc_vmin_vmax() has an allocation which is incorrectly using GFP_KERNEL, which is likely the reason for the "sleeping function called from invalid context". We have a fix queued for this week's update (switching it to GFP_NOWAIT).
>
Hi,
Just a quick update regarding the second WARN I mentioned earlier,
triggered at drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:138 amdgpu_vm_set_pasid().
After some additional bisecting, I found that this warning first appears
in the merge commit:
342f141ba9f4c9e39de342d047a5245e8f4cda19
Merge: 0faeb8cf99c0 a490c8d77d50
Author: Dave Airlie <airlied@...hat.com>
Date: Mon Sep 22 08:44:52 2025 +1000
Merge tag 'amd-drm-next-6.18-2025-09-19' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next
Both merge parents (0faeb8cf from drm-next and a490c8d7 from amd-drm-next)
are clean on my setup: no WARNs or other regressions.
It turns out that this WARN is triggered by an interaction between the
two sides of the merge.
The AMD branch introduced the new amdgpu_vm_assert_locked(vm) check inside
amdgpu_vm_set_pasid(), while the drm-next side still contained a code path
(for example, through amdgpu_driver_open_kms()) that calls
amdgpu_vm_set_pasid() without holding the expected reservation lock.
Once the merge commit combined these two changes, that call path started
hitting the dma_resv_assert_held() check in that function.
Both parents on their own are fine, so this is a merge-only side effect:
the stricter locking assertion from AMD’s branch met an older call path
from drm-next that doesn’t yet satisfy it.
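The interaction described above can be modelled in a few lines of user-space C.
This is only a sketch under stated assumptions, not the amdgpu code: every name
prefixed with model_ is a hypothetical stand-in, the reservation lock is reduced
to a boolean flag, and the assertion only counts and prints instead of raising a
kernel WARN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a VM object guarded by a reservation lock. */
struct model_vm {
    bool resv_held;       /* true while the modelled lock is held */
    unsigned int pasid;
};

static int warn_count;    /* how many times the modelled assertion fired */

/* Models a WARN-style lock assertion: it reports but does not abort. */
static void model_assert_locked(const struct model_vm *vm, const char *caller)
{
    if (!vm->resv_held) {
        warn_count++;
        fprintf(stderr, "WARN: %s: reservation lock not held\n", caller);
    }
}

/* One side of the merge: the setter gains a lock assertion. */
static void model_set_pasid(struct model_vm *vm, unsigned int pasid,
                            const char *caller)
{
    assert(vm != NULL);
    model_assert_locked(vm, caller);
    vm->pasid = pasid;
}

/* Other side of the merge: an older caller that never takes the lock. */
static void model_open_unlocked(struct model_vm *vm)
{
    model_set_pasid(vm, 1, "model_open_unlocked");
}

/* A caller that takes the lock first and so satisfies the assertion. */
static void model_open_locked(struct model_vm *vm)
{
    vm->resv_held = true;
    model_set_pasid(vm, 2, "model_open_locked");
    vm->resv_held = false;
}
```

Calling model_open_locked() leaves warn_count untouched, while
model_open_unlocked() increments it. Neither the assertion nor the unlocked
caller is a problem in isolation; the warning needs both to be present in the
same tree, which is why each merge parent can test clean.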
I verified that removing only the amdgpu_vm_assert_locked(vm) call
from amdgpu_vm_set_pasid(), while leaving all other recent VM locking
changes intact, eliminates the WARN completely.
--
Best Regards,
Mike Gavrilov.