Message-ID: <b9fb0a69-6cfb-4285-8118-ef5301115948@suse.cz>
Date: Thu, 2 Oct 2025 10:14:55 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Harry Yoo <harry.yoo@...cle.com>
Cc: ranxiaokai627@....com, Andrew Morton <akpm@...ux-foundation.org>,
cl@...two.org, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Alexei Starovoitov <ast@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, ran.xiaokai@....com.cn
Subject: Re: [PATCH] slab: Fix using this_cpu_ptr() in preemptible context
On 9/30/25 13:19, Alexei Starovoitov wrote:
> On Tue, Sep 30, 2025 at 12:54 PM Harry Yoo <harry.yoo@...cle.com> wrote:
>>
>> On Tue, Sep 30, 2025 at 08:34:02AM +0000, ranxiaokai627@....com wrote:
>> > From: Ran Xiaokai <ran.xiaokai@....com.cn>
>> >
>> > defer_free() may be called in a preemptible context, which will
>> > trigger the warning message below:
>> >
>> > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
>> > caller is defer_free+0x1b/0x60
>> > Call Trace:
>> > <TASK>
>> > dump_stack_lvl+0xac/0xc0
>> > check_preemption_disabled+0xbe/0xe0
>> > defer_free+0x1b/0x60
>> > kfree_nolock+0x1eb/0x2b0
>> > alloc_slab_obj_exts+0x356/0x390
>
> Please share the config and repro details, since the stack trace
> looks theoretical, but you somehow got it?
> This is not CONFIG_SLUB_TINY, yet kfree_nolock()
> sees a locked per-cpu slab?
Could it be just the "slab != c->slab" condition in do_slab_free()? That's
more likely (see the sketch below). However...
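(A rough, hypothetical sketch of the condition being referred to,
paraphrased from memory of the kmalloc_nolock series; the allow_spin
parameter name is an assumption, not quoted from mm/slub.c:)

	/*
	 * In do_slab_free(): the object's slab is not the current
	 * per-cpu slab, and a kfree_nolock() caller must not spin
	 * on locks, so the free is deferred to irq_work context.
	 */
	if (unlikely(slab != c->slab)) {
		if (!allow_spin) {	/* assumed name for the nolock flag */
			defer_free(s, head);
			return;
		}
		/* a regular kfree() takes the slow path instead */
	}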
> Is this PREEMPT_RT ?
>
>> > __alloc_tagging_slab_alloc_hook+0xa0/0x300
>> > __kmalloc_cache_noprof+0x1c4/0x5c0
>> > __set_page_owner+0x10d/0x1c0
This is the part that puzzles me: where do we call kmalloc() from
__set_page_owner()? And in a way that loses the GFP_KERNEL passed all the
way down? I don't even see a lib/stackdepot function here.
>> > post_alloc_hook+0x84/0xf0
>> > get_page_from_freelist+0x73b/0x1380
>> > __alloc_frozen_pages_noprof+0x110/0x2c0
>> > alloc_pages_mpol+0x44/0x140
>> > alloc_slab_page+0xac/0x150
>> > allocate_slab+0x78/0x3a0
>> > ___slab_alloc+0x76b/0xed0
>> > __slab_alloc.constprop.0+0x5a/0xb0
>> > __kmalloc_noprof+0x3dc/0x6d0
>> > __list_lru_init+0x6c/0x210
This has a kcalloc(GFP_KERNEL).
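(Presumably this allocation; quoting mm/list_lru.c from memory, so the
exact line may differ between versions:)

	/* __list_lru_init() in mm/list_lru.c, from memory */
	lru->node = kcalloc(nr_node_ids, sizeof(*lru->node), GFP_KERNEL);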
>> > alloc_super+0x3b6/0x470
>> > sget_fc+0x5f/0x3a0
>> > get_tree_nodev+0x27/0x90
>> > vfs_get_tree+0x26/0xc0
>> > vfs_kern_mount.part.0+0xb6/0x140
>> > kern_mount+0x24/0x40
>> > init_pipe_fs+0x4f/0x70
>> > do_one_initcall+0x62/0x2e0
>> > kernel_init_freeable+0x25b/0x4b0
Here we've already set the full gfp_allowed_mask, so it's not masking our
GFP_KERNEL.
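(For reference, a minimal sketch of what is meant, paraphrased from memory
of init/main.c rather than quoted verbatim: once kernel_init_freeable()
restores the full mask, initcalls like init_pipe_fs() in the trace above
allocate with an unmasked GFP_KERNEL.)

	/*
	 * init/main.c, kernel_init_freeable() -- from memory; early boot
	 * restricts allocations via GFP_BOOT_MASK, then the full mask is
	 * restored before the initcalls in the trace above run.
	 */
	gfp_allowed_mask = __GFP_BITS_MASK;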
>> > kernel_init+0x1a/0x1c0
>> > ret_from_fork+0x290/0x2e0
>> > ret_from_fork_asm+0x11/0x20
>> > </TASK>
>> >
>> > Replace this_cpu_ptr() with raw_cpu_ptr() to eliminate
>> > the above warning message.
>> >
>> > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
>>
>> There's no mainline commit hash yet; it should be adjusted later.
>>
>> > Signed-off-by: Ran Xiaokai <ran.xiaokai@....com.cn>
>> > ---
>> > mm/slub.c | 4 ++--
>> > 1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 1433f5b988f7..67c57f1b5a86 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -6432,7 +6432,7 @@ static void free_deferred_objects(struct irq_work *work)
>> >
>> > static void defer_free(struct kmem_cache *s, void *head)
>> > {
>> > - struct defer_free *df = this_cpu_ptr(&defer_free_objects);
>> > + struct defer_free *df = raw_cpu_ptr(&defer_free_objects);
>>
>> This suppresses the warning, but let's answer the question:
>> is it actually safe to not disable preemption here?
>>
>> > if (llist_add(head + s->offset, &df->objects))
>>
>> Let's say a task was running on CPU X and migrated to a different CPU
>> (say, Y) before calling llist_add() or after returning from it; then
>> we're queueing CPU X's irq_work on CPU Y.
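(A minimal sketch of the window being described, based on the diff above;
the irq_work field name "work" is an assumption, since the struct
definition isn't quoted here:)

	static void defer_free(struct kmem_cache *s, void *head)
	{
		struct defer_free *df = raw_cpu_ptr(&defer_free_objects);

		/* preemptible: the task may migrate off CPU X here... */
		if (llist_add(head + s->offset, &df->objects))
			/* ...or here: CPU Y then queues CPU X's irq_work */
			irq_work_queue(&df->work);
	}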
>>
>> I think technically this should be safe because, although we're using
>> a per-cpu irq_work here, the irq_work framework itself is designed to
>> handle concurrent access from multiple CPUs (otherwise it wouldn't be
>> safe to use a global irq_work like in other places) by using a lockless
>> list, which uses try_cmpxchg() and xchg() for atomic updates.
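(For reference, roughly what the lockless list does on insertion; a
from-memory paraphrase of lib/llist.c, so details may differ between
kernel versions:)

	bool llist_add_batch(struct llist_node *new_first,
			     struct llist_node *new_last,
			     struct llist_head *head)
	{
		struct llist_node *first = READ_ONCE(head->first);

		/* retry until we atomically swing head->first to our batch */
		do {
			new_last->next = first;
		} while (!try_cmpxchg(&head->first, &first, new_first));

		/* true if the list was previously empty */
		return !first;
	}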
>>
>> So unless I'm missing something it should be safe, but it was quite
>> confusing to confirm that, given we're using a per-cpu irq_work...
>>
>> I don't think these paths are very performance-critical, so why not
>> disable preemption instead of replacing it with raw_cpu_ptr()?
>
> +1.
> Though irq_work_queue() works for any irq_work, it should
> be used on the current CPU, since it IPIs itself.
> So please use guard(preempt)(); instead.
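(A sketch of the suggested fix, assuming the same struct layout as the
diff above; untested:)

	static void defer_free(struct kmem_cache *s, void *head)
	{
		/*
		 * Disable preemption so that llist_add() and
		 * irq_work_queue() both operate on the same CPU's
		 * defer_free_objects.
		 */
		guard(preempt)();
		struct defer_free *df = this_cpu_ptr(&defer_free_objects);

		if (llist_add(head + s->offset, &df->objects))
			irq_work_queue(&df->work);
	}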
Agreed, we should fix it like this. But the report is strange.