Message-ID: <CABXGCsMMg59UXnv0EkmjsiZNUsZUBzBaUR8EnSv4FqOTmpOf7Q@mail.gmail.com>
Date: Thu, 9 Oct 2025 21:55:11 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: "Pillai, Aurabindo" <Aurabindo.Pillai@....com>
Cc: "Kazlauskas, Nicholas" <Nicholas.Kazlauskas@....com>, "Wu, Ray" <Ray.Wu@....com>, 
	"Wheeler, Daniel" <Daniel.Wheeler@....com>, "Li, Roman" <Roman.Li@....com>, 
	"Chung, ChiaHsuan (Tom)" <ChiaHsuan.Chung@....com>, "Deucher, Alexander" <Alexander.Deucher@....com>, 
	amd-gfx list <amd-gfx@...ts.freedesktop.org>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: 6.18/regression/bisected – BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 after 6d31602a9f57

On Tue, Oct 7, 2025 at 10:55 PM Pillai, Aurabindo
<Aurabindo.Pillai@....com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Mikhail,
>
> schedule_dc_vmin_vmax() has an allocation which is incorrectly using GFP_KERNEL, which is likely the reason for the "sleeping function called from invalid context". We have a fix queued for this week's update (switching it to GFP_NOWAIT).
>

Hi,

Just a quick update regarding the second WARN I mentioned earlier,
triggered at drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:138 amdgpu_vm_set_pasid().

After some additional bisecting, I found that this warning first appears
in the merge commit:
342f141ba9f4c9e39de342d047a5245e8f4cda19
Merge: 0faeb8cf99c0 a490c8d77d50
Author: Dave Airlie <airlied@...hat.com>
Date:   Mon Sep 22 08:44:52 2025 +1000
    Merge tag 'amd-drm-next-6.18-2025-09-19' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next

Both merge parents (0faeb8cf from drm-next and a490c8d7 from amd-drm-next)
are clean on my setup — no WARNs or other regressions.

It turns out that this WARN is triggered by an interaction between the
two sides of the merge.
The AMD branch introduced the new amdgpu_vm_assert_locked(vm) check inside
amdgpu_vm_set_pasid(), while the drm-next side still contained a code path
(for example, through amdgpu_driver_open_kms()) that calls
amdgpu_vm_set_pasid() without holding the expected reservation lock.

As a result, the merge commit combined these two changes and started hitting
the dma_resv_assert_held() check in that function.
Both parents are fine on their own, so this is a merge-only side effect:
the stricter locking assertion from AMD’s branch met an older call path
from drm-next that doesn’t yet satisfy it.
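
To illustrate the interaction, here is a minimal userspace sketch. It is
not the amdgpu code: every toy_* name is hypothetical, the "lock" is just
a flag, and the assertion only counts and prints a warning. It shows how
a setter that gains a lock assertion on one side of a merge starts
warning as soon as it is combined with an older caller from the other
side that never took the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a VM object guarded by a reservation lock. */
struct toy_vm {
	bool resv_held;      /* true while the caller "holds" the lock */
	unsigned int pasid;
};

static int toy_warn_count;   /* number of assertion warnings seen so far */

/* Models a "lock must be held" assertion: it warns but does not abort. */
static void toy_vm_assert_locked(const struct toy_vm *vm)
{
	if (!vm->resv_held) {
		toy_warn_count++;
		fprintf(stderr, "WARN: reservation lock not held\n");
	}
}

/* One side of the merge: the setter gains the new assertion. */
static void toy_vm_set_pasid(struct toy_vm *vm, unsigned int pasid)
{
	assert(vm != NULL);
	toy_vm_assert_locked(vm);
	vm->pasid = pasid;
}

/* Other side of the merge: an older caller that never takes the lock. */
static void toy_open_unlocked(struct toy_vm *vm, unsigned int pasid)
{
	toy_vm_set_pasid(vm, pasid);
}

/* A caller that satisfies the assertion by taking the lock first. */
static void toy_open_locked(struct toy_vm *vm, unsigned int pasid)
{
	vm->resv_held = true;
	toy_vm_set_pasid(vm, pasid);
	vm->resv_held = false;
}
```

Calling toy_open_locked() leaves toy_warn_count unchanged, while
toy_open_unlocked() increments it once per call. Each function is
harmless without the other; the warning only exists once both are in the
same tree.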

I verified that removing just the amdgpu_vm_assert_locked(vm) call
from amdgpu_vm_set_pasid() eliminates the WARN completely,
while keeping all other recent VM locking changes intact.

-- 
Best Regards,
Mike Gavrilov.
