Message-ID: <uy55hkjdrlnotqzb6rdjktgwv4abp2qxhspi3o63lnj2qjoreu@aegvqlbnfe2p>
Date: Mon, 21 Apr 2025 11:55:47 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Raghavendra K T <raghavendra.kt@....com>
Cc: linux-mm@...ck.org, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, wqu@...e.com
Subject: Re: scheduling while atomic on rc3 - migration + buffer heads

On Mon, Apr 21, 2025 at 09:17:18PM +0530, Raghavendra K T wrote:
> On 4/21/2025 8:44 PM, Kent Overstreet wrote:
>
> +Qu, as I see a similar report from him
>
> > This just popped up in one of my test runs.
> >
> > Given that it's buffer heads, it has to be the ext4 root filesystem, not
> > bcachefs.
> >
> > 00465 ========= TEST lz4_buffered
> > 00465
> > 00465 WATCHDOG 360
> > 00466 bcachefs (vdb): starting version 1.25: extent_flags opts=errors=panic,compression=lz4
> > 00466 bcachefs (vdb): initializing new filesystem
> > 00466 bcachefs (vdb): going read-write
> > 00466 bcachefs (vdb): marking superblocks
> > 00466 bcachefs (vdb): initializing freespace
> > 00466 bcachefs (vdb): done initializing freespace
> > 00466 bcachefs (vdb): reading snapshots table
> > 00466 bcachefs (vdb): reading snapshots done
> > 00466 bcachefs (vdb): done starting filesystem
> > 00466 starting copy
> > 00515 BUG: sleeping function called from invalid context at mm/util.c:743
> > 00515 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: kcompactd0
> > 00515 preempt_count: 1, expected: 0
> > 00515 RCU nest depth: 0, expected: 0
> > 00515 1 lock held by kcompactd0/120:
> > 00515 #0: ffffff80c0c558f0 (&mapping->i_private_lock){+.+.}-{3:3}, at: __buffer_migrate_folio+0x114/0x298
> > 00515 Preemption disabled at:
> > 00515 [<ffffffc08025fa84>] __buffer_migrate_folio+0x114/0x298
> > 00515 CPU: 11 UID: 0 PID: 120 Comm: kcompactd0 Not tainted 6.15.0-rc3-ktest-gb2a78fdf7d2f #20530 PREEMPT
> > 00515 Hardware name: linux,dummy-virt (DT)
> > 00515 Call trace:
> > 00515 show_stack+0x1c/0x30 (C)
> > 00515 dump_stack_lvl+0xb0/0xc0
> > 00515 dump_stack+0x14/0x20
> > 00515 __might_resched+0x180/0x288
> > 00515 folio_mc_copy+0x54/0x98
> > 00515 __migrate_folio.isra.0+0x68/0x168
> > 00515 __buffer_migrate_folio+0x280/0x298
> > 00515 buffer_migrate_folio_norefs+0x18/0x28
> > 00515 migrate_pages_batch+0x94c/0xeb8
> > 00515 migrate_pages_sync+0x84/0x240
> > 00515 migrate_pages+0x284/0x698
> > 00515 compact_zone+0xa40/0x10f8
> > 00515 kcompactd_do_work+0x204/0x498
> > 00515 kcompactd+0x3c4/0x400
> > 00515 kthread+0x13c/0x208
> > 00515 ret_from_fork+0x10/0x20
> > 00518 starting sync
> > 00519 starting rm
> > 00520 ========= FAILED TIMEOUT lz4_buffered in 360s
> >
>
> I have also seen a similar stack with folio_mc_copy() while testing
> PTE A bit patches.
>
> IIUC, it has something to do with cond_resched() called from
> folio_mc_copy().
>
> (Thomas (tglx) mentioned long back that cond_resched() does not have
> scope awareness.) Not sure where the fix should be done in these
> cases...

That's true: calling cond_resched() while a spinlock is held is a bug.
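
A minimal sketch of the pattern that trips the check (illustrative
only, not the actual call sites):

    spin_lock(&lock);       /* preempt_count goes to 1 */
    /*
     * Sleeping is now forbidden: cond_resched() goes through
     * __might_resched(), which sees in_atomic() and prints
     * "BUG: sleeping function called from invalid context".
     */
    cond_resched();
    spin_unlock(&lock);

That matches the trace above: __buffer_migrate_folio() takes
mapping->i_private_lock and then ends up in folio_mc_copy(), which
calls cond_resched() between per-page copies.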
> (I mean, should the caller of migrate_folio() call with no spinlock
> held, but with a mutex?)

Yes. migrate_folio() does large data copies, so we don't want all that
running in atomic context.
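
One possible shape, as a sketch only (not a tested patch, and dst/src
just stand in for the folios being migrated):

    /*
     * Sketch only: drop the spinlock around the heavyweight copy so
     * folio_mc_copy() may cond_resched(), then retake it. The hard
     * part, not shown, is revalidating the buffer_head state after
     * the lock is reacquired.
     */
    spin_unlock(&mapping->i_private_lock);
    rc = folio_mc_copy(dst, src);       /* may cond_resched() */
    spin_lock(&mapping->i_private_lock);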