Message-ID: <CAJuCfpG4P2hKuUqQ=w-t72tT4dmh_7_VJPY6gw=nYk-C7DkEjA@mail.gmail.com>
Date: Thu, 8 May 2025 14:41:36 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: David Wang <00107082@....com>
Cc: kent.overstreet@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo
On Thu, May 8, 2025 at 8:32 AM David Wang <00107082@....com> wrote:
>
>
>
> At 2025-05-08 07:36:14, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
> >On Wed, May 7, 2025 at 5:55 PM David Wang <00107082@....com> wrote:
> >>
> >> ---
> >> The patch is not complete; I just want feedback on whether this is
> >> worth carrying on.
> >
> >In such cases it's customary to mark the patch as RFC which saves you
> >time on explaining your motivation :)
> >
> >> ---
> >> When reading /proc/allocinfo, seq_file invokes the start/stop
> >> callbacks for each read syscall. In the start callback, memory is
> >> allocated to store the iterator, and the iterator restarts from the
> >> beginning and walks to its previous position.
> >> Each seq_file read() returns at most 4096 bytes, even when called with
> >> a larger user space buffer, so reading out /proc/allocinfo takes tens
> >> of read syscalls. For example, a 306036-byte allocinfo file needs
> >> 76 reads:
> >>
> >> $ sudo cat /proc/allocinfo | wc
> >> 3964 16678 306036
> >>
> >> For those n=3964 lines, each read takes about m=3964/76=52 lines,
> >> so the iterator rewinds:
> >> m steps on the 1st read,
> >> 2*m steps on the 2nd read,
> >> 3*m steps on the 3rd read,
> >> ...
> >> n steps on the last read.
> >> In total, the iterator advances O(n*n/m) times.
> >> (Each read takes more time than the previous one.)
> >>
> >> By using private data allocated when /proc/allocinfo is opened,
> >> the n/m memory allocations can be avoided, and there is no need
> >> to restart the iterator from the very beginning every time.
> >> So only 1 memory allocation and n iteration steps are needed.
> >> (Only when modules change should the iterator be invalidated and
> >> restarted.)
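
To make the step counts quoted above concrete, here is a rough Python model of the two strategies. It is only a sketch under simplifying assumptions: every read is assumed to return exactly m lines (the real boundary is the 4096-byte seq_file buffer), and the function names rewind_steps and resume_steps are made up for this illustration.

```python
def rewind_steps(n_lines, lines_per_read):
    """Total iterator steps when every read restarts from the first entry."""
    total = 0
    position = 0
    while position < n_lines:
        # Each read ends lines_per_read entries further on (or at the end).
        position = min(position + lines_per_read, n_lines)
        # Reaching that position means walking from entry 0 again.
        total += position
    return total


def resume_steps(n_lines):
    """Total iterator steps when the iterator keeps its position across reads."""
    return n_lines


if __name__ == "__main__":
    n, m = 3964, 52  # figures taken from the quoted example
    print("rewind:", rewind_steps(n, m))
    print("resume:", resume_steps(n))
```

Under these assumptions the model gives 156116 steps with rewinding versus 3964 without, which is in line with the quoted O(n*n/m) estimate.
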
> >
> >Yeah, your change makes sense and looks like a good optimization. From
> >a quick look at the code, codetag_next_ct() should handle the case
> >when a module gets removed from under us while we are not holding
> >cttype->mod_lock. I'll need to take another closer look at it once you
> >post an official patch.
> >Thanks!
> >
> The module tag container is designed to be more "compact" than I imagined. It seems that no
> extra iterator validation is needed in most situations, but I am anxious about the following
> possibility:
>
> In between read() calls, module A is removed and then module B is inserted, and by accident A
> and B have the same IDR id (id reused) and the same "struct module" address (kmalloc happened
> to pick the cmod address that was kfree'd for module A).
> If this happens, the `if (cmod != iter->cmod)` check in codetag_next_ct may not be
> completely safe....
>
> What about adding a clock/timestamp/expiration to cttype/module/iterator:
I see there was a followup discussion but I don't think your question
was answered. Instead of expiration I would suggest adding a timestamp
in the struct codetag_module that would store the time the module was
loaded (basically the time when struct codetag_module gets created)
and also adding a timestamp in the struct codetag_iterator. Whenever
iter->cmod gets assigned a new module during the walk (see
https://elixir.bootlin.com/linux/v6.14.5/source/lib/codetag.c#L95) we
update the iterator's timestamp (iter->timestamp = cmod->timestamp) and
then we can validate that the module was not replaced from under us by
comparing iter->timestamp and cmod->timestamp. If the module was
replaced from under us, the timestamps will not be equal, so we can
reset the iterator.
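
A minimal userspace sketch of that check, written in Python rather than kernel C purely for illustration. Everything here is an assumption made for the model: the dict stands in for cttype->mod_idr, the addr field stands in for the kmalloc'ed address of struct codetag_module, the timestamp is a simple load counter rather than a real time value, and the names load_module and next_tag are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_load_clock = count(1)  # hypothetical counter, bumped once per module load


@dataclass
class CodetagModule:
    addr: int        # models the address of the codetag_module allocation
    tags: list
    timestamp: int   # stamped once, when the module entry is created


def load_module(addr, tags):
    return CodetagModule(addr, list(tags), next(_load_clock))


@dataclass
class CodetagIterator:
    mod_id: int = 0
    cmod_addr: Optional[int] = None
    timestamp: int = 0
    pos: int = -1


def next_tag(it, idr):
    """Return the next tag, or None once all modules are exhausted."""
    while True:
        # Take the module at mod_id, or the next higher id if it is gone.
        ids = sorted(i for i in idr if i >= it.mod_id)
        if not ids:
            return None
        it.mod_id = ids[0]
        cmod = idr[it.mod_id]

        if cmod.addr != it.cmod_addr or cmod.timestamp != it.timestamp:
            # First visit, or the same id/address now holds another module:
            # remember its timestamp and restart from its first tag.
            it.cmod_addr = cmod.addr
            it.timestamp = cmod.timestamp
            it.pos = 0
        else:
            it.pos += 1

        if it.pos < len(cmod.tags):
            return cmod.tags[it.pos]
        it.mod_id += 1


if __name__ == "__main__":
    idr = {0: load_module(0x1000, ["a1", "a2", "a3"])}
    it = CodetagIterator()
    print(next_tag(it, idr), next_tag(it, idr))
    # Module A is removed; module B reuses both the id and the address.
    idr[0] = load_module(0x1000, ["b1", "b2"])
    print(next_tag(it, idr))
```

In this model the address comparison alone would not notice the replacement, since id and address both match; the timestamp mismatch is what sends the iterator back to the first tag of the new module.
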
>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index d14dbd26b370..fc9f430090ae 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -54,6 +54,7 @@ struct codetag_iterator {
>  	struct codetag_module *cmod;
>  	unsigned long mod_id;
>  	struct codetag *ct;
> +	unsigned long expiration;
>  };
>
>  #ifdef MODULE
> diff --git a/lib/codetag.c b/lib/codetag.c
> index 42aadd6c1454..a795b152ce92 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -13,6 +13,8 @@ struct codetag_type {
>  	struct idr mod_idr;
>  	struct rw_semaphore mod_lock; /* protects mod_idr */
>  	struct codetag_type_desc desc;
> +	/* timestamping iterator expiration */
> +	unsigned long clock;
>  };
>
>  struct codetag_range {
> @@ -23,6 +25,8 @@ struct codetag_range {
>  struct codetag_module {
>  	struct module *mod;
>  	struct codetag_range range;
> +	/* creation timestamp */
> +	unsigned long timestamp;
>  };
>
>  static DEFINE_MUTEX(codetag_lock);
> @@ -48,6 +52,7 @@ struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
>  		.cmod = NULL,
>  		.mod_id = 0,
>  		.ct = NULL,
> +		.expiration = 0,
>  	};
>
>  	return iter;
> @@ -93,6 +98,11 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
>
>  		if (cmod != iter->cmod) {
>  			iter->cmod = cmod;
> +			iter->expiration = cmod->timestamp;
> +			ct = get_first_module_ct(cmod);
> +		} else if (cmod->timestamp != iter->expiration) {
> +			pr_warn("Same IDR id and module address, but different module!");
> +			iter->expiration = cmod->timestamp;
>  			ct = get_first_module_ct(cmod);
>  		} else
>  			ct = get_next_module_ct(iter);
> @@ -101,6 +111,7 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
>  			break;
>
>  		iter->mod_id++;
> +		iter->cmod = NULL;
>  	}
>
>  	iter->ct = ct;
> @@ -169,6 +180,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>  	struct codetag_module *cmod;
>  	int err;
>
> +	cttype->clock++;
>  	range = get_section_range(mod, cttype->desc.section);
>  	if (!range.start || !range.stop) {
>  		pr_warn("Failed to load code tags of type %s from the module %s\n",
> @@ -188,6 +200,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>
>  	cmod->mod = mod;
>  	cmod->range = range;
> +	cmod->timestamp = cttype->clock;
>
>  	down_write(&cttype->mod_lock);
>  	err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
>
>
>
>
>
> And I noticed another issue: there are duplicate keys (file:line+module+func) in allocinfo even without this patch.
> On my workstation:
> =======================
> 1400832 114 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 840764
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 2
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 758
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 62951
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
> 0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 325450
> 12288 1 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
> =======================
> 81920 20 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 20
> 1441792 352 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 352
> =======================
> 112 7 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 1591
> 48 3 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 12
> =======================
> 48 1 mm/mm_slot.h:28 func:mm_slot_alloc 4
> 2160 54 mm/mm_slot.h:28 func:mm_slot_alloc 10705
>
> Most duplicate keys come from "*.h" files.
> On a KVM guest, duplication also happens for some "*.c" files:
> =======================
> 0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
> 0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
> =======================
> 0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
> 0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
> =======================
> 0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
> =======================
> 0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> 0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> =======================
> 0 0 fs/binfmt_elf.c:1395 func:load_elf_library
> 0 0 fs/binfmt_elf.c:1395 func:load_elf_library
> =======================
> 0 0 fs/binfmt_elf.c:1653 func:fill_files_note
> 0 0 fs/binfmt_elf.c:1653 func:fill_files_note
> =======================
> 0 0 fs/binfmt_elf.c:1851 func:fill_note_info
> 0 0 fs/binfmt_elf.c:1851 func:fill_note_info
> =======================
> 0 0 fs/binfmt_elf.c:1891 func:fill_note_info
> 0 0 fs/binfmt_elf.c:1891 func:fill_note_info
> =======================
> 0 0 fs/binfmt_elf.c:1899 func:fill_note_info
> 0 0 fs/binfmt_elf.c:1899 func:fill_note_info
> =======================
> 0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
> 0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
> =======================
> 0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
> 0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
> =======================
> 0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
> 0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
> =======================
> 0 0 fs/binfmt_elf.c:885 func:load_elf_binary
> 0 0 fs/binfmt_elf.c:885 func:load_elf_binary
> =======================
> 0 0 fs/binfmt_elf.c:910 func:load_elf_binary
> 0 0 fs/binfmt_elf.c:910 func:load_elf_binary
> =======================
> 0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
> 0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
> =======================
> 0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> 0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> 0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> 0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> 0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> =======================
> 0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
> 0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
> =======================
> 0 0 mm/mm_slot.h:28 func:mm_slot_alloc
> 160 4 mm/mm_slot.h:28 func:mm_slot_alloc
> =======================
> 0 0 security/apparmor/domain.c:1136 func:change_hat
> 0 0 security/apparmor/domain.c:1136 func:change_hat
> =======================
> 0 0 security/apparmor/domain.c:1455 func:aa_change_profile
> 0 0 security/apparmor/domain.c:1455 func:aa_change_profile
> =======================
> 0 0 security/apparmor/domain.c:835 func:handle_onexec
> 0 0 security/apparmor/domain.c:835 func:handle_onexec
> =======================
> 0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
> 0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
> =======================
> 0 0 security/apparmor/mount.c:738 func:aa_pivotroot
> 0 0 security/apparmor/mount.c:738 func:aa_pivotroot
>
>
>
> My workstation runs my own changes based on 6.15-rc5, but I didn't touch any code related to tags...
> The KVM guest runs 6.15-rc5.
>
> The script for checking:
>
> #!/usr/bin/env python3
> def fetch():
>     # Group /proc/allocinfo lines by their third field (file:line).
>     groups = {}
>     with open("/proc/allocinfo") as f:
>         for line in f:
>             key = line.strip().split()[2]
>             groups.setdefault(key, []).append(line)
>     # Print every key that appears on more than one line.
>     keys = sorted(k for k, lines in groups.items() if len(lines) > 1)
>     for key in keys:
>         print("=======================")
>         for line in groups[key]:
>             print(line, end="")
>
> fetch()
>
>
>
>
>
> David
>
>
> >>