Message-ID: <531adbba.b537.196b0868a8c.Coremail.00107082@163.com>
Date: Thu, 8 May 2025 23:32:09 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Suren Baghdasaryan" <surenb@...gle.com>
Cc: kent.overstreet@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading
allocinfo
At 2025-05-08 07:36:14, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
>On Wed, May 7, 2025 at 5:55 PM David Wang <00107082@....com> wrote:
>>
>> ---
>> The patch is not complete; I just want feedback on whether this is
>> worth carrying on.
>
>In such cases it's customary to mark the patch as RFC which saves you
>time on explaining your motivation :)
>
>> ---
>> When reading /proc/allocinfo, seq_file invokes the start/stop
>> callbacks for each read syscall. In the start callback, memory is
>> allocated to store the iterator, and the iterator restarts from the
>> beginning and walks forward to its previous position.
>> Each seq_file read() returns at most 4096 bytes, even when called
>> with a larger user-space buffer, so reading out /proc/allocinfo
>> takes tens of read syscalls. For example, a 306036-byte allocinfo
>> file needs 76 reads:
>>
>> $ sudo cat /proc/allocinfo | wc
>> 3964 16678 306036
>>
>> For those n=3964 lines, each read covers about m=3964/76≈52 lines,
>> and the iterator keeps rewinding:
>> m steps on the 1st read,
>> 2*m steps on the 2nd read,
>> 3*m steps on the 3rd read,
>> ...
>> n steps on the last read.
>> In total, the iterator is advanced m + 2*m + ... + n ≈ n*n/(2*m)
>> times, i.e. O(n*n/m) — about 152k steps here — and each read takes
>> longer than the previous one.
>>
>> By using private data allocated when /proc/allocinfo is opened, the
>> n/m memory allocations can be avoided, and there is no need to
>> restart the iterator from the very beginning every time. Then only
>> 1 memory allocation and n iterating steps are needed.
>> (Only when the module list changes does the iterator need to be
>> invalidated and restarted.)
>
>Yeah, your change makes sense and looks like a good optimization. From
>a quick look at the code, codetag_next_ct() should handle the case
>when a module gets removed from under us while we are not holding
>cttype->mod_lock. I'll need to take another closer look at it once you
>post an official patch.
>Thanks!
>
The module tag container is designed more "compactly" than I imagined. It seems
no extra iterator validation is needed in most situations, but I am anxious about
the following possibility:
In between read() calls, module A is removed and then module B is inserted, and
by accident A and B get the same IDR id (id reuse) and the same
"struct codetag_module" address (kmalloc happens to hand back the cmod address
that was kfreed when module A was removed).
If that happens, the `if (cmod != iter->cmod)` check in codetag_next_ct() may
not be solidly safe...
What about adding a clock/timestamp/expiration to cttype/module/iterator:
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index d14dbd26b370..fc9f430090ae 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -54,6 +54,7 @@ struct codetag_iterator {
 	struct codetag_module *cmod;
 	unsigned long mod_id;
 	struct codetag *ct;
+	unsigned long expiration;
 };
 
 #ifdef MODULE
diff --git a/lib/codetag.c b/lib/codetag.c
index 42aadd6c1454..a795b152ce92 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -13,6 +13,8 @@ struct codetag_type {
 	struct idr mod_idr;
 	struct rw_semaphore mod_lock; /* protects mod_idr */
 	struct codetag_type_desc desc;
+	/* timestamping iterator expiration */
+	unsigned long clock;
 };
 
 struct codetag_range {
@@ -23,6 +25,8 @@ struct codetag_range {
 struct codetag_module {
 	struct module *mod;
 	struct codetag_range range;
+	/* creation timestamp */
+	unsigned long timestamp;
 };
 
 static DEFINE_MUTEX(codetag_lock);
@@ -48,6 +52,7 @@ struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
 		.cmod = NULL,
 		.mod_id = 0,
 		.ct = NULL,
+		.expiration = 0,
 	};
 
 	return iter;
@@ -93,6 +98,11 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
 
 		if (cmod != iter->cmod) {
 			iter->cmod = cmod;
+			iter->expiration = cmod->timestamp;
+			ct = get_first_module_ct(cmod);
+		} else if (cmod->timestamp != iter->expiration) {
+			pr_warn("Same IDR id and module address, but different module!\n");
+			iter->expiration = cmod->timestamp;
 			ct = get_first_module_ct(cmod);
 		} else
 			ct = get_next_module_ct(iter);
@@ -101,6 +111,7 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
 			break;
 
 		iter->mod_id++;
+		iter->cmod = NULL;
 	}
 
 	iter->ct = ct;
@@ -169,6 +180,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 	struct codetag_module *cmod;
 	int err;
 
+	cttype->clock++;
 	range = get_section_range(mod, cttype->desc.section);
 	if (!range.start || !range.stop) {
 		pr_warn("Failed to load code tags of type %s from the module %s\n",
@@ -188,6 +200,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 
 	cmod->mod = mod;
 	cmod->range = range;
+	cmod->timestamp = cttype->clock;
 
 	down_write(&cttype->mod_lock);
 	err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
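
For completeness, a rough sketch of the seq_file side this is meant to pair
with (not part of the diff above; the private struct and its pos field are
illustrative, allocated once at open() time, e.g. via seq_open_private()):

/* Illustrative only: resume a persistent iterator in ->start() instead
 * of reallocating it and rewinding from scratch on every read().
 */
struct allocinfo_private {
	struct codetag_iterator iter;	/* lives as long as the fd */
	loff_t pos;			/* position the iterator sits at */
};

static void *allocinfo_start(struct seq_file *m, loff_t *pos)
{
	struct allocinfo_private *priv = m->private;	/* set up at open() */

	codetag_lock_module_list(alloc_tag_cttype, true);
	if (*pos < priv->pos) {
		/* the reader seeked backwards: restart from the beginning */
		priv->iter = codetag_get_ct_iter(alloc_tag_cttype);
		priv->pos = 0;
	}
	while (priv->pos < *pos) {
		/* with the timestamp check above, codetag_next_ct()
		 * notices a reused cmod and restarts that module
		 */
		if (!codetag_next_ct(&priv->iter))
			return NULL;	/* ->stop() still drops mod_lock */
		priv->pos++;
	}
	return priv;
}

With that, a sequential read of the whole file only ever advances the
iterator n steps in total.
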
And I noticed another issue: there are duplicate keys (file:line + module + func)
in allocinfo even without this patch.
On my workstation:
=======================
1400832 114 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 840764
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 2
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 758
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 62951
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 325450
12288 1 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
=======================
81920 20 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 20
1441792 352 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 352
=======================
112 7 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 1591
48 3 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 12
=======================
48 1 mm/mm_slot.h:28 func:mm_slot_alloc 4
2160 54 mm/mm_slot.h:28 func:mm_slot_alloc 10705
Most of the duplicate keys come from "*.h" files.
On a KVM guest, duplication also happens for some "*.c" files:
=======================
0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
=======================
0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
=======================
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
=======================
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
=======================
0 0 fs/binfmt_elf.c:1395 func:load_elf_library
0 0 fs/binfmt_elf.c:1395 func:load_elf_library
=======================
0 0 fs/binfmt_elf.c:1653 func:fill_files_note
0 0 fs/binfmt_elf.c:1653 func:fill_files_note
=======================
0 0 fs/binfmt_elf.c:1851 func:fill_note_info
0 0 fs/binfmt_elf.c:1851 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:1891 func:fill_note_info
0 0 fs/binfmt_elf.c:1891 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:1899 func:fill_note_info
0 0 fs/binfmt_elf.c:1899 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
=======================
0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
=======================
0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
=======================
0 0 fs/binfmt_elf.c:885 func:load_elf_binary
0 0 fs/binfmt_elf.c:885 func:load_elf_binary
=======================
0 0 fs/binfmt_elf.c:910 func:load_elf_binary
0 0 fs/binfmt_elf.c:910 func:load_elf_binary
=======================
0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
=======================
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
=======================
0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
=======================
0 0 mm/mm_slot.h:28 func:mm_slot_alloc
160 4 mm/mm_slot.h:28 func:mm_slot_alloc
=======================
0 0 security/apparmor/domain.c:1136 func:change_hat
0 0 security/apparmor/domain.c:1136 func:change_hat
=======================
0 0 security/apparmor/domain.c:1455 func:aa_change_profile
0 0 security/apparmor/domain.c:1455 func:aa_change_profile
=======================
0 0 security/apparmor/domain.c:835 func:handle_onexec
0 0 security/apparmor/domain.c:835 func:handle_onexec
=======================
0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
=======================
0 0 security/apparmor/mount.c:738 func:aa_pivotroot
0 0 security/apparmor/mount.c:738 func:aa_pivotroot
My workstation runs my own changes based on 6.15-rc5, but I did not touch any
code related to tags...
The KVM guest runs 6.15-rc5.
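
My guess (not verified) about the "*.h" duplicates: alloc_hooks() defines a
static struct alloc_tag at each call site, so an allocation inside an inline
helper in a header instantiates one tag per compilation unit, all sharing the
same file:line key. Roughly (foo.h/foo_alloc are made up):

/* foo.h -- hypothetical header with an inline allocation helper */
static inline void *foo_alloc(gfp_t gfp)
{
	/*
	 * kmalloc() is a macro; via alloc_hooks() it expands to roughly
	 *
	 *	static struct alloc_tag _alloc_tag
	 *		__section("alloc_tags") = { .ct = CODE_TAG_INIT };
	 *	kmalloc_noprof(32, gfp);
	 *
	 * The tag is static, so every .c file that instantiates
	 * foo_alloc() carries its own copy under an identical key.
	 */
	return kmalloc(32, gfp);
}

And the fs/binfmt_elf.c duplicates are probably because fs/compat_binfmt_elf.c
#includes binfmt_elf.c, so those lines get compiled twice as well.
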
The script for checking:
#!/usr/bin/env python3
def fetch():
    r = {}
    with open("/proc/allocinfo") as f:
        for line in f:
            # group the lines by their "file:line" column
            key = line.strip().split()[2]
            r.setdefault(key, []).append(line)
    keys = sorted(k for k, lines in r.items() if len(lines) > 1)
    for k in keys:
        print("=======================")
        for line in r[k]:
            print(line, end="")

fetch()
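
Run it as root, since /proc/allocinfo is only readable by root (the file name
dupcheck.py is arbitrary):

$ sudo python3 dupcheck.py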
David