linux-ext4 - Re: scheduling while atomic on rc3

Message-ID: <7b395468-d72d-42c1-b891-75f127a1c534@amd.com>
Date: Mon, 21 Apr 2025 21:17:18 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Kent Overstreet <kent.overstreet@...ux.dev>, linux-mm@...ck.org,
 linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Cc: wqu@...e.com
Subject: Re: scheduling while atomic on rc3 - migration + buffer heads

On 4/21/2025 8:44 PM, Kent Overstreet wrote:

+Qu, as I have seen a similar report from him.

> This just popped up in one of my test runs.
> 
> Given that it's buffer heads, it has to be the ext4 root filesystem, not
> bcachefs.
> 
> 00465 ========= TEST   lz4_buffered
> 00465
> 00465 WATCHDOG 360
> 00466 bcachefs (vdb): starting version 1.25: extent_flags opts=errors=panic,compression=lz4
> 00466 bcachefs (vdb): initializing new filesystem
> 00466 bcachefs (vdb): going read-write
> 00466 bcachefs (vdb): marking superblocks
> 00466 bcachefs (vdb): initializing freespace
> 00466 bcachefs (vdb): done initializing freespace
> 00466 bcachefs (vdb): reading snapshots table
> 00466 bcachefs (vdb): reading snapshots done
> 00466 bcachefs (vdb): done starting filesystem
> 00466 starting copy
> 00515 BUG: sleeping function called from invalid context at mm/util.c:743
> 00515 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: kcompactd0
> 00515 preempt_count: 1, expected: 0
> 00515 RCU nest depth: 0, expected: 0
> 00515 1 lock held by kcompactd0/120:
> 00515  #0: ffffff80c0c558f0 (&mapping->i_private_lock){+.+.}-{3:3}, at: __buffer_migrate_folio+0x114/0x298
> 00515 Preemption disabled at:
> 00515 [<ffffffc08025fa84>] __buffer_migrate_folio+0x114/0x298
> 00515 CPU: 11 UID: 0 PID: 120 Comm: kcompactd0 Not tainted 6.15.0-rc3-ktest-gb2a78fdf7d2f #20530 PREEMPT
> 00515 Hardware name: linux,dummy-virt (DT)
> 00515 Call trace:
> 00515  show_stack+0x1c/0x30 (C)
> 00515  dump_stack_lvl+0xb0/0xc0
> 00515  dump_stack+0x14/0x20
> 00515  __might_resched+0x180/0x288
> 00515  folio_mc_copy+0x54/0x98
> 00515  __migrate_folio.isra.0+0x68/0x168
> 00515  __buffer_migrate_folio+0x280/0x298
> 00515  buffer_migrate_folio_norefs+0x18/0x28
> 00515  migrate_pages_batch+0x94c/0xeb8
> 00515  migrate_pages_sync+0x84/0x240
> 00515  migrate_pages+0x284/0x698
> 00515  compact_zone+0xa40/0x10f8
> 00515  kcompactd_do_work+0x204/0x498
> 00515  kcompactd+0x3c4/0x400
> 00515  kthread+0x13c/0x208
> 00515  ret_from_fork+0x10/0x20
> 00518 starting sync
> 00519 starting rm
> 00520 ========= FAILED TIMEOUT lz4_buffered in 360s
> 

I have also seen a similar stack involving folio_mc_copy() while testing
the PTE A bit patches.

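IIUC, it has something to do with cond_resched() being called from
folio_mc_copy(). In the trace above, __buffer_migrate_folio() holds the
mapping->i_private_lock spinlock, so preempt_count is 1 when
folio_mc_copy() reaches __might_resched().

(Thomas (tglx) mentioned a long time ago that cond_resched() has no
scope awareness.) I am not sure where the fix should go in these
cases.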

(I mean, should the caller of migrate_folio call it with no spinlock
held, and hold a mutex instead?)
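
To make the suspected pattern concrete, here is a minimal userspace
sketch of it. It is a model under assumptions, not kernel code: every
model_* name is hypothetical, the counter only imitates preempt_count,
and the per-page reschedule point in the copy loop is inferred from the
trace rather than copied from mm/util.c.

```c
/* Userspace model only; none of this is the kernel's implementation. */
#include <assert.h>
#include <stdio.h>

static int model_preempt_count;  /* imitates preempt_count */
static int model_atomic_sleeps;  /* "sleeping while atomic" reports seen */

/* Taking a spinlock disables preemption, modelled as a counter bump. */
static void model_spin_lock(void)
{
    model_preempt_count++;
}

static void model_spin_unlock(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Imitates the might-sleep check: complain if preemption is disabled. */
static void model_cond_resched(const char *caller)
{
    if (model_preempt_count != 0) {
        model_atomic_sleeps++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context"
                " in %s (preempt_count: %d, expected: 0)\n",
                caller, model_preempt_count);
    }
}

/* Copy loop with one reschedule point per page (assumed behaviour). */
static void model_folio_copy(int nr_pages)
{
    for (int i = 0; i < nr_pages; i++) {
        /* the per-page copy would happen here */
        model_cond_resched(__func__);
    }
}

/* The reported shape: copy while the spinlock is still held. */
static int model_migrate_under_spinlock(int nr_pages)
{
    model_atomic_sleeps = 0;
    model_spin_lock();
    model_folio_copy(nr_pages);
    model_spin_unlock();
    return model_atomic_sleeps;
}

/* Alternative shape: no spinlock held across the copy. */
static int model_migrate_unlocked(int nr_pages)
{
    model_atomic_sleeps = 0;
    model_folio_copy(nr_pages);
    return model_atomic_sleeps;
}
```

Calling model_migrate_under_spinlock(4) from a main() reports once per
page, while model_migrate_unlocked(4) reports nothing. The model says
nothing about which side the real fix belongs on (the caller's locking
or the reschedule point in the copy); it only shows why the two
combine badly.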

Regards
- Raghu