linux-kernel - Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo

Message-ID: <CAJuCfpG4P2hKuUqQ=w-t72tT4dmh_7_VJPY6gw=nYk-C7DkEjA@mail.gmail.com>
Date: Thu, 8 May 2025 14:41:36 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: David Wang <00107082@....com>
Cc: kent.overstreet@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo

On Thu, May 8, 2025 at 8:32 AM David Wang <00107082@....com> wrote:
>
>
>
> At 2025-05-08 07:36:14, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
> >On Wed, May 7, 2025 at 5:55 PM David Wang <00107082@....com> wrote:
> >>
> >> ---
> >> The patch is not complete; I just want feedback on whether this is
> >> worth carrying on.
> >
> >In such cases it's customary to mark the patch as RFC which saves you
> >time on explaining your motivation :)
> >
> >> ---
> >> When reading /proc/allocinfo, seq_file invokes the start/stop
> >> callbacks for each read syscall. In the start callback, memory is
> >> allocated to store the iterator, and the iterator restarts from the
> >> beginning and walks to its previous position.
> >> Each seq_file read() takes at most 4096 bytes, even with a larger
> >> user space buffer, so reading out /proc/allocinfo needs tens of read
> >> syscalls. For example, a 306036-byte allocinfo file needs
> >> 76 reads:
> >>
> >> $ sudo cat /proc/allocinfo  | wc
> >>    3964   16678  306036
> >>
> >> For those n=3964 lines, each read takes about m=3964/76=52 lines,
> >> so the iterator rewinds:
> >>  m    steps on the 1st read,
> >>  2*m  steps on the 2nd read,
> >>  3*m  steps on the 3rd read,
> >> ...
> >>  n    steps on the last read.
> >> In total, the iterator is advanced O(n*n/m) times.
> >> (Each read takes more time than the previous one.)
> >>
> >> By using private data allocated when /proc/allocinfo is opened,
> >> the n/m memory allocations can be avoided, and there is no need
> >> to restart the iterator from the very beginning every time.
> >> So only 1 memory allocation and n iteration steps are needed.
> >> (Only when modules change should the iterator be invalidated and
> >> restarted.)
> >
> >Yeah, your change makes sense and looks like a good optimization. From
> >a quick look at the code, codetag_next_ct() should handle the case
> >when a module gets removed from under us while we are not holding
> >cttype->mod_lock. I'll need to take another closer look at it once you
> >post an official patch.
> >Thanks!
> >
> The module tag container is designed to be more "compact" than I imagined. It seems that no
> extra iterator validation is needed in most situations, but I am anxious about the following
> possibility:
>
> In between read() calls, module A is removed and then module B is inserted, and by accident A
> and B have the same IDR id (id reused) and the same "struct module" address (kmalloc happened
> to pick the cmod address kfree'd for module A).
> If this happens, the `if (cmod != iter->cmod)` check in codetag_next_ct may not be
> completely safe....
>
> What about adding a clock/timestamp/expiration to cttype/module/iterator:

I see there was a follow-up discussion, but I don't think your question
was answered. Instead of an expiration, I would suggest adding a timestamp
to struct codetag_module that stores the time the module was
loaded (basically the time when the struct codetag_module gets created),
and also adding a timestamp to struct codetag_iterator. Whenever
iter->cmod gets assigned a new module during the walk (see
https://elixir.bootlin.com/linux/v6.14.5/source/lib/codetag.c#L95), we
update the iterator's timestamp (iter->timestamp = cmod->timestamp).
We can then validate that the module was not replaced from under us by
comparing iter->timestamp and cmod->timestamp. If the module was
replaced from under us, the timestamps will not be equal, so we can
reset the iterator.
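
To make the idea concrete, here is a toy userspace model in Python. It is
not kernel code, only a sketch under stated assumptions: the names
CodetagModule, CodetagIterator, load_module, reuse_slot and next_ct are
made up for illustration, a plain dict stands in for the IDR, a counter
stands in for the load time, and overwriting an object in place stands in
for kmalloc handing back the address that was just freed.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional

# Hypothetical stand-in for a load-time stamp: a counter that only grows.
_clock = count(1)


@dataclass
class CodetagModule:
    name: str
    tags: List[str]
    timestamp: int  # set once, when the module record is (re)created


@dataclass
class CodetagIterator:
    cmod: Optional[CodetagModule] = None
    mod_id: int = 0
    idx: int = -1
    timestamp: int = 0  # copy of cmod.timestamp taken when cmod was assigned


def load_module(name: str, tags: List[str]) -> CodetagModule:
    """Create a module record stamped with the current clock value."""
    return CodetagModule(name, list(tags), next(_clock))


def reuse_slot(cmod: CodetagModule, name: str, tags: List[str]) -> None:
    """Model free + alloc returning the same address: same object, new contents."""
    cmod.name = name
    cmod.tags = list(tags)
    cmod.timestamp = next(_clock)


def next_ct(idr: Dict[int, CodetagModule], it: CodetagIterator) -> Optional[str]:
    """Return the next tag, or None when the walk is finished."""
    max_id = max(idr, default=-1)
    while it.mod_id <= max_id:
        cmod = idr.get(it.mod_id)
        if cmod is not None:
            if cmod is not it.cmod or cmod.timestamp != it.timestamp:
                # Either a different record, or the same "address" holding a
                # module loaded later: restart from its first tag.
                it.cmod = cmod
                it.timestamp = cmod.timestamp
                it.idx = 0
            else:
                it.idx += 1
            if it.idx < len(cmod.tags):
                return cmod.tags[it.idx]
        # Nothing (more) at this id: move on to the next one.
        it.mod_id += 1
        it.cmod = None
    return None
```

In this model, if a walk is paused inside module A and reuse_slot() then
turns the same record into module B, the identity check alone would
continue from A's old position inside B. The timestamp mismatch is what
forces the restart from B's first tag.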

>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index d14dbd26b370..fc9f430090ae 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -54,6 +54,7 @@ struct codetag_iterator {
>         struct codetag_module *cmod;
>         unsigned long mod_id;
>         struct codetag *ct;
> +       unsigned long expiration;
>  };
>
>  #ifdef MODULE
> diff --git a/lib/codetag.c b/lib/codetag.c
> index 42aadd6c1454..a795b152ce92 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -13,6 +13,8 @@ struct codetag_type {
>         struct idr mod_idr;
>         struct rw_semaphore mod_lock; /* protects mod_idr */
>         struct codetag_type_desc desc;
> +       /* timestamping iterator expiration */
> +       unsigned long clock;
>  };
>
>  struct codetag_range {
> @@ -23,6 +25,8 @@ struct codetag_range {
>  struct codetag_module {
>         struct module *mod;
>         struct codetag_range range;
> +       /* creation timestamp */
> +       unsigned long timestamp;
>  };
>
>  static DEFINE_MUTEX(codetag_lock);
> @@ -48,6 +52,7 @@ struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
>                 .cmod = NULL,
>                 .mod_id = 0,
>                 .ct = NULL,
> +               .expiration = 0,
>         };
>
>         return iter;
> @@ -93,6 +98,11 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
>
>                 if (cmod != iter->cmod) {
>                         iter->cmod = cmod;
> +                       iter->expiration = cmod->timestamp;
> +                       ct = get_first_module_ct(cmod);
> +               } else if (cmod->timestamp != iter->expiration) {
> +                       pr_warn("Same IDR id and module address, but different module!\n");
> +                       iter->expiration = cmod->timestamp;
>                         ct = get_first_module_ct(cmod);
>                 } else
>                         ct = get_next_module_ct(iter);
> @@ -101,6 +111,7 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
>                         break;
>
>                 iter->mod_id++;
> +               iter->cmod = NULL;
>         }
>
>         iter->ct = ct;
> @@ -169,6 +180,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>         struct codetag_module *cmod;
>         int err;
>
> +       cttype->clock++;
>         range = get_section_range(mod, cttype->desc.section);
>         if (!range.start || !range.stop) {
>                 pr_warn("Failed to load code tags of type %s from the module %s\n",
> @@ -188,6 +200,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
>
>         cmod->mod = mod;
>         cmod->range = range;
> +       cmod->timestamp = cttype->clock;
>
>         down_write(&cttype->mod_lock);
>         err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
>
>
>
>
>
> I also noticed another issue: there are duplicate keys (file:line+module+func) in allocinfo even without this patch.
> On my workstation:
> =======================
>      1400832      114 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 840764
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 2
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 758
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 62951
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
>            0        0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 325450
>        12288        1 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
> =======================
>        81920       20 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 20
>      1441792      352 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 352
> =======================
>          112        7 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 1591
>           48        3 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 12
> =======================
>           48        1 mm/mm_slot.h:28 func:mm_slot_alloc 4
>         2160       54 mm/mm_slot.h:28 func:mm_slot_alloc 10705
>
> Most duplicate keys are from "*.h" files.
> On a KVM guest, duplication also happens for some "*.c" files:
> =======================
>            0        0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
>            0        0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
> =======================
>            0        0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
>            0        0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
> =======================
>            0        0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
> =======================
>            0        0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
>            0        0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
> =======================
>            0        0 fs/binfmt_elf.c:1395 func:load_elf_library
>            0        0 fs/binfmt_elf.c:1395 func:load_elf_library
> =======================
>            0        0 fs/binfmt_elf.c:1653 func:fill_files_note
>            0        0 fs/binfmt_elf.c:1653 func:fill_files_note
> =======================
>            0        0 fs/binfmt_elf.c:1851 func:fill_note_info
>            0        0 fs/binfmt_elf.c:1851 func:fill_note_info
> =======================
>            0        0 fs/binfmt_elf.c:1891 func:fill_note_info
>            0        0 fs/binfmt_elf.c:1891 func:fill_note_info
> =======================
>            0        0 fs/binfmt_elf.c:1899 func:fill_note_info
>            0        0 fs/binfmt_elf.c:1899 func:fill_note_info
> =======================
>            0        0 fs/binfmt_elf.c:2057 func:elf_core_dump
>            0        0 fs/binfmt_elf.c:2057 func:elf_core_dump
> =======================
>            0        0 fs/binfmt_elf.c:2072 func:elf_core_dump
>            0        0 fs/binfmt_elf.c:2072 func:elf_core_dump
> =======================
>            0        0 fs/binfmt_elf.c:532 func:load_elf_phdrs
>            0        0 fs/binfmt_elf.c:532 func:load_elf_phdrs
> =======================
>            0        0 fs/binfmt_elf.c:885 func:load_elf_binary
>            0        0 fs/binfmt_elf.c:885 func:load_elf_binary
> =======================
>            0        0 fs/binfmt_elf.c:910 func:load_elf_binary
>            0        0 fs/binfmt_elf.c:910 func:load_elf_binary
> =======================
>            0        0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
>            0        0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
> =======================
>            0        0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
>            0        0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
>            0        0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
>            0        0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
>            0        0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
> =======================
>            0        0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
>            0        0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
> =======================
>            0        0 mm/mm_slot.h:28 func:mm_slot_alloc
>          160        4 mm/mm_slot.h:28 func:mm_slot_alloc
> =======================
>            0        0 security/apparmor/domain.c:1136 func:change_hat
>            0        0 security/apparmor/domain.c:1136 func:change_hat
> =======================
>            0        0 security/apparmor/domain.c:1455 func:aa_change_profile
>            0        0 security/apparmor/domain.c:1455 func:aa_change_profile
> =======================
>            0        0 security/apparmor/domain.c:835 func:handle_onexec
>            0        0 security/apparmor/domain.c:835 func:handle_onexec
> =======================
>            0        0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
>            0        0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
> =======================
>            0        0 security/apparmor/mount.c:738 func:aa_pivotroot
>            0        0 security/apparmor/mount.c:738 func:aa_pivotroot
>
>
>
> My workstation runs a kernel with my own changes based on 6.15-rc5, but I didn't touch any tag-related code...
> The KVM runs 6.15-rc5.
>
> The script for checking:
>
> #!/usr/bin/env python3
> def fetch():
>     # Group /proc/allocinfo lines by their third column (file:line).
>     r = {}
>     with open("/proc/allocinfo") as f:
>         for l in f:
>             key = l.strip().split()[2]
>             r.setdefault(key, []).append(l)
>     # Print every key that appears on more than one line.
>     for key in sorted(k for k, ls in r.items() if len(ls) > 1):
>         print("=======================")
>         for l in r[key]:
>             print(l, end="")
>
> fetch()
>
>
>
>
>
> David
>
>
> >>