Message-ID: <aFS2CMu277HzT7wW@hyeyoo>
Date: Fri, 20 Jun 2025 10:14:48 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: David Wang <00107082@....com>
Cc: surenb@...gle.com, cachen@...estorage.com, ahuang12@...ovo.com,
akpm@...ux-foundation.org, bhe@...hat.com, hch@...radead.org,
kent.overstreet@...ux.dev, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, lkp@...el.com, mjguzik@...il.com,
oe-lkp@...ts.linux.dev, oliver.sang@...el.com, urezki@...il.com
Subject: Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?

On Thu, Jun 19, 2025 at 11:08:09PM +0800, David Wang wrote:
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > in
> > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > >
> > > at that time, we made some tests with x86_64 config which runs well.
> > >
> > > we have now noticed that the commit is in mainline.
> >
> > (Re-sending due to not Ccing people and the list...)
> >
> > Hi, I'm facing the same error on my testing environment.
> >
> > I think this is related to memory allocation profiling & code tagging
> > subsystems rather than vmalloc, so let's add related folks to Cc.
> >
> > After a quick skim of the code, it seems the conditions to trigger
> > this are that 1) MEM_ALLOC_PROFILING is compiled in, 2) it is not
> > enabled by default, and 3) an allocation somehow fails, calling
> > alloc_tag_top_users().
> >
> > I see "Memory allocation profiling is not supported!" in the dmesg,
> > which means it did not alloc & initialize alloc_tag_cttype properly,
> > but alloc_tag_top_users() tries to acquire the semaphore.
> >
> > I think the kernel should not call alloc_tag_top_users() at all (or it
> > should return an error) if mem_profiling_support == false?
> >
> > Does the following work on your testing environment?
> >
> > (Only did very light testing on my QEMU, but seems to fix the issue for me.)
> >
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index d48b80f3f007..57d4d5673855 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > >  	struct codetag_bytes n;
> > >  	unsigned int i, nr = 0;
> > >  
> > > -	if (can_sleep)
> > > +	if (!mem_profiling_support)
> > > +		return 0;
> > > +	else if (can_sleep)
> > >  		codetag_lock_module_list(alloc_tag_cttype, true);
> > >  	else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > >  		return 0;
>
> I think you are correct, this was introduced/exposed by
> commit 780138b1 ("alloc_tag: check mem_profiling_support in alloc_tag_init")

Oh, I wasn't aware of that commit.
Thanks for pointing it out!

Indeed, prior to 780138b1, alloc_tag_cttype was unconditionally allocated,
so it shouldn't have been a problem unless the allocation failed.

I've sent a formal patch to help testing.

> (Before the commit, the BUG would only be triggered when alloc_tag_init failed)

That is nearly impossible to trigger, as the allocation is too small to
fail and is done at boot time, so it shouldn't fail in practice.

Or should we be more paranoid and fix it in v6.12 stable?
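
For anyone who wants to see the failure mode without booting a kernel,
here is a minimal userspace sketch of the guard discussed above. It is
not kernel code: every name ending in _sketch is hypothetical, the
"lock" is just a flag, and the tag count of 3 is made up. It only
illustrates why returning early is safe when the code tag type was
never set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the code tag type that init registers. */
struct cttype_sketch {
	bool locked;	/* models the module-list lock as a plain flag */
	size_t nr_tags;	/* made-up number of registered tags */
};

/* Models mem_profiling_support: false means "not supported". */
static bool mem_profiling_support_sketch;

/* Models alloc_tag_cttype: stays NULL when profiling is unsupported. */
static struct cttype_sketch *cttype_sketch;

/* Models init: the type is only set up when profiling is supported. */
static void init_sketch(bool supported)
{
	static struct cttype_sketch storage = { .locked = false, .nr_tags = 3 };

	mem_profiling_support_sketch = supported;
	cttype_sketch = supported ? &storage : NULL;
}

/* Models the top-users query: returns how many entries it would report. */
static size_t top_users_sketch(size_t count)
{
	size_t nr;

	/* The guard: bail out before touching the uninitialized type. */
	if (!mem_profiling_support_sketch)
		return 0;

	cttype_sketch->locked = true;	/* would dereference NULL without the guard */
	nr = cttype_sketch->nr_tags < count ? cttype_sketch->nr_tags : count;
	cttype_sketch->locked = false;

	return nr;
}
```

Calling init_sketch(false) and then top_users_sketch() reports zero
entries instead of dereferencing the NULL pointer, which is the same
shape as the early return in the diff.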
--
Cheers,
Harry / Hyeonggon