Message-ID: <CAJuCfpFJabb02OK8Rj08d7WU_7AM674i=vsZxzfw7i7h-PGftQ@mail.gmail.com>
Date: Tue, 16 Sep 2025 14:46:47 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Usama Arif <usamaarif642@...il.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, akpm@...ux-foundation.org, kent.overstreet@...ux.dev,
hannes@...xchg.org, rientjes@...gle.com, roman.gushchin@...ux.dev,
harry.yoo@...cle.com, shakeel.butt@...ux.dev, 00107082@....com,
pyyjason@...il.com, pasha.tatashin@...een.com, souravpanda@...gle.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] alloc_tag: mark inaccurate allocation counters in
/proc/allocinfo output
On Tue, Sep 16, 2025 at 2:11 PM Usama Arif <usamaarif642@...il.com> wrote:
>
>
>
> On 16/09/2025 16:51, Suren Baghdasaryan wrote:
> > On Tue, Sep 16, 2025 at 5:57 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> >>
> >> On 9/16/25 01:02, Suren Baghdasaryan wrote:
> >>> While rare, memory allocation profiling can contain inaccurate counters
> >>> if slab object extension vector allocation fails. That allocation might
> >>> succeed later but prior to that, slab allocations that would have used
> >>> that object extension vector will not be accounted for. To indicate
> >>> incorrect counters, an "accurate:no" marker is appended to the call-site
> >>> line in the /proc/allocinfo output.
> >>> Bump up /proc/allocinfo version to reflect the change in the file format
> >>> and update documentation.
> >>>
> >>> Example output with invalid counters:
> >>> allocinfo - version: 2.0
> >>> 0 0 arch/x86/kernel/kdebugfs.c:105 func:create_setup_data_nodes
> >>> 0 0 arch/x86/kernel/alternative.c:2090 func:alternatives_smp_module_add
> >>> 0 0 arch/x86/kernel/alternative.c:127 func:__its_alloc accurate:no
> >>> 0 0 arch/x86/kernel/fpu/regset.c:160 func:xstateregs_set
> >>> 0 0 arch/x86/kernel/fpu/xstate.c:1590 func:fpstate_realloc
> >>> 0 0 arch/x86/kernel/cpu/aperfmperf.c:379 func:arch_enable_hybrid_capacity_scale
> >>> 0 0 arch/x86/kernel/cpu/amd_cache_disable.c:258 func:init_amd_l3_attrs
> >>> 49152 48 arch/x86/kernel/cpu/mce/core.c:2709 func:mce_device_create accurate:no
> >>> 32768 1 arch/x86/kernel/cpu/mce/genpool.c:132 func:mce_gen_pool_create
> >>> 0 0 arch/x86/kernel/cpu/mce/amd.c:1341 func:mce_threshold_create_device
> >>>
> >>> Suggested-by: Johannes Weiner <hannes@...xchg.org>
> >>> Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> >>> Acked-by: Shakeel Butt <shakeel.butt@...ux.dev>
> >>> Acked-by: Usama Arif <usamaarif642@...il.com>
> >>> Acked-by: Johannes Weiner <hannes@...xchg.org>
> >>
> >> With this format you could instead print the accumulated size of allocations
> >> that could not allocate their objext (for the given tag). It should then be
> >> an upper bound of the actual error, because obviously we cannot recognize
> >> the moments when these allocations are freed, so we don't know which tag
> >> to decrement. Maybe that could be more useful output than the yes/no
> >> information, although it would of course require more storage in struct
> >> codetag, so I don't know if it's worth it.
> >
> > Yeah, I'm reluctant to add more fields to the codetag and increase the
> > overhead until we have a use case. If one emerges, the new format lets us
> > add something like error_size:<value> to indicate the size of the error.
> >
> >>
> >> Maybe a global counter of the summed size of all these missed objexts could
> >> also be maintained; that would be not an upper bound but the actual current
> >> error, provided that when freeing an object we can precisely determine that
> >> there is no tag to decrement because objext allocation had failed for it,
> >> and thus that the allocation had incremented this global error counter and
> >> it's correct to decrement it.
> >
> > That's a good idea and should be doable without too much overhead. Thanks!
> > For the UAPI... I think for this case IOCTL would work and the use
> > scenario would be that the user sees the "accurate:no" mark and issues
> > ioctl command to retrieve this global counter value.
> > Usama, since you initiated this feature request, do you think such a
> > counter would be useful?
> >
>
>
> hmm, I really don't like suggesting changes to /proc/allocinfo as they will
> break parsers, but it might be better to put the value there?
> If the value is in the file, I imagine people will be more likely to look at it?
> I am not completely sure that everyone will issue an ioctl to find this out,
> especially if they have infra that is just automatically collecting info from
> this file.
The current file reports per-codetag data, not global counters. We
could report it somewhere in the header, but the first question to
answer is: would this really be useful (not as a "nice to
have", but for a concrete use case)? If not, then I would suggest keeping
things simple until there is a need for it.
>
> >>
> >>> ---
> >>> Changes since v1[1]:
> >>> - Changed the marker from an asterisk to an accurate:no pair, per Andrew Morton
> >>> - Documented the /proc/allocinfo v2 format
> >>> - Updated the changelog
> >>> - Added Acked-by tags from v1 since the functionality is the same,
> >>>   per Shakeel Butt, Usama Arif and Johannes Weiner
> >>>
> >>> [1] https://lore.kernel.org/all/20250909234942.1104356-1-surenb@google.com/
> >>>
> >>> Documentation/filesystems/proc.rst | 4 ++++
> >>> include/linux/alloc_tag.h | 12 ++++++++++++
> >>> include/linux/codetag.h | 5 ++++-
> >>> lib/alloc_tag.c | 4 +++-
> >>> mm/slub.c | 2 ++
> >>> 5 files changed, 25 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> >>> index 915a3e44bc12..1776a06571c2 100644
> >>> --- a/Documentation/filesystems/proc.rst
> >>> +++ b/Documentation/filesystems/proc.rst
> >>> @@ -1009,6 +1009,10 @@ number, module (if originates from a loadable module) and the function calling
> >>> the allocation. The number of bytes allocated and number of calls at each
> >>> location are reported. The first line indicates the version of the file, the
> >>> second line is the header listing fields in the file.
> >>> +If file version is 2.0 or higher then each line may contain additional
> >>> +<key>:<value> pairs representing extra information about the call site.
> >>> +For example if the counters are not accurate, the line will be appended with
> >>> +"accurate:no" pair.
> >>>
> >>> Example output.
> >>>
> >>> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> >>> index 9ef2633e2c08..d40ac39bfbe8 100644
> >>> --- a/include/linux/alloc_tag.h
> >>> +++ b/include/linux/alloc_tag.h
> >>> @@ -221,6 +221,16 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
> >>> ref->ct = NULL;
> >>> }
> >>>
> >>> +static inline void alloc_tag_set_inaccurate(struct alloc_tag *tag)
> >>> +{
> >>> + tag->ct.flags |= CODETAG_FLAG_INACCURATE;
> >>> +}
> >>> +
> >>> +static inline bool alloc_tag_is_inaccurate(struct alloc_tag *tag)
> >>> +{
> >>> + return !!(tag->ct.flags & CODETAG_FLAG_INACCURATE);
> >>> +}
> >>> +
> >>> #define alloc_tag_record(p) ((p) = current->alloc_tag)
> >>>
> >>> #else /* CONFIG_MEM_ALLOC_PROFILING */
> >>> @@ -230,6 +240,8 @@ static inline bool mem_alloc_profiling_enabled(void) { return false; }
> >>> static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
> >>> size_t bytes) {}
> >>> static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
> >>> +static inline void alloc_tag_set_inaccurate(struct alloc_tag *tag) {}
> >>> +static inline bool alloc_tag_is_inaccurate(struct alloc_tag *tag) { return false; }
> >>> #define alloc_tag_record(p) do {} while (0)
> >>>
> >>> #endif /* CONFIG_MEM_ALLOC_PROFILING */
> >>> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> >>> index 457ed8fd3214..8ea2a5f7c98a 100644
> >>> --- a/include/linux/codetag.h
> >>> +++ b/include/linux/codetag.h
> >>> @@ -16,13 +16,16 @@ struct module;
> >>> #define CODETAG_SECTION_START_PREFIX "__start_"
> >>> #define CODETAG_SECTION_STOP_PREFIX "__stop_"
> >>>
> >>> +/* codetag flags */
> >>> +#define CODETAG_FLAG_INACCURATE (1 << 0)
> >>> +
> >>> /*
> >>> * An instance of this structure is created in a special ELF section at every
> >>> * code location being tagged. At runtime, the special section is treated as
> >>> * an array of these.
> >>> */
> >>> struct codetag {
> >>> - unsigned int flags; /* used in later patches */
> >>> + unsigned int flags;
> >>> unsigned int lineno;
> >>> const char *modname;
> >>> const char *function;
> >>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >>> index 79891528e7b6..12ff80bbbd22 100644
> >>> --- a/lib/alloc_tag.c
> >>> +++ b/lib/alloc_tag.c
> >>> @@ -80,7 +80,7 @@ static void allocinfo_stop(struct seq_file *m, void *arg)
> >>> static void print_allocinfo_header(struct seq_buf *buf)
> >>> {
> >>> /* Output format version, so we can change it. */
> >>> - seq_buf_printf(buf, "allocinfo - version: 1.0\n");
> >>> + seq_buf_printf(buf, "allocinfo - version: 2.0\n");
> >>> seq_buf_printf(buf, "# <size> <calls> <tag info>\n");
> >>> }
> >>>
> >>> @@ -92,6 +92,8 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
> >>>
> >>> seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
> >>> codetag_to_text(out, ct);
> >>> + if (unlikely(alloc_tag_is_inaccurate(tag)))
> >>> + seq_buf_printf(out, " accurate:no");
> >>> seq_buf_putc(out, ' ');
> >>> seq_buf_putc(out, '\n');
> >>> }
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index af343ca570b5..9c04f29ee8de 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -2143,6 +2143,8 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> >>> */
> >>> if (likely(obj_exts))
> >>> alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> >>> + else
> >>> + alloc_tag_set_inaccurate(current->alloc_tag);
> >>> }
> >>>
> >>> static inline void
> >>
>