Message-ID: <531adbba.b537.196b0868a8c.Coremail.00107082@163.com>
Date: Thu, 8 May 2025 23:32:09 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Suren Baghdasaryan" <surenb@...gle.com>
Cc: kent.overstreet@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading
allocinfo
At 2025-05-08 07:36:14, "Suren Baghdasaryan" <surenb@...gle.com> wrote:
>On Wed, May 7, 2025 at 5:55 PM David Wang <00107082@....com> wrote:
>>
>> ---
>> The patch is not complete; I just want feedback on whether this is
>> worth carrying on.
>
>In such cases it's customary to mark the patch as RFC which saves you
>time on explaining your motivation :)
>
>> ---
>> When reading /proc/allocinfo, seq_file invokes the start/stop
>> callbacks for each read syscall. In the start callback, memory is
>> allocated to store the iterator, and the iterator restarts from the
>> beginning and walks forward to its previous position.
>> Each seq_file read() returns at most 4096 bytes, even when called
>> with a larger user-space buffer, so reading out /proc/allocinfo
>> takes tens of read syscalls. For example, a 306036-byte allocinfo
>> file needs 76 reads:
>>
>> $ sudo cat /proc/allocinfo | wc
>> 3964 16678 306036
>>
>> For those n=3964 lines, each read covers about m=3964/76≈52 lines,
>> and the iterator keeps rewinding:
>> m steps on the 1st read,
>> 2*m steps on the 2nd read,
>> 3*m steps on the 3rd read,
>> ...
>> n steps on the last read.
>> In total, the iterator is advanced m + 2*m + ... + n ≈ n*n/(2*m)
>> times, i.e. O(n*n/m) — about 152k steps here — and each read takes
>> longer than the previous one.
>>
>> By using private data allocated when /proc/allocinfo is opened, the
>> n/m memory allocations can be avoided, and there is no need to
>> restart the iterator from the very beginning every time. Then only
>> 1 memory allocation and n iterating steps are needed.
>> (Only when the module list changes does the iterator need to be
>> invalidated and restarted.)
>
>Yeah, your change makes sense and looks like a good optimization. From
>a quick look at the code, codetag_next_ct() should handle the case
>when a module gets removed from under us while we are not holding
>cttype->mod_lock. I'll need to take another closer look at it once you
>post an official patch.
>Thanks!
>
The module tag container is designed more "compactly" than I imagined. It seems
no extra iterator validation is needed in most situations, but I am anxious about
the following possibility:
In between read() calls, module A is removed and then module B is inserted, and
by accident A and B get the same IDR id (id reuse) and the same
"struct codetag_module" address (kmalloc happens to hand back the cmod address
that was kfreed when module A was removed).
If that happens, the `if (cmod != iter->cmod)` check in codetag_next_ct() may
not be solidly safe...
What about adding a clock/timestamp/expiration to cttype/module/iterator:
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index d14dbd26b370..fc9f430090ae 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -54,6 +54,7 @@ struct codetag_iterator {
 	struct codetag_module *cmod;
 	unsigned long mod_id;
 	struct codetag *ct;
+	unsigned long expiration;
 };
 
 #ifdef MODULE
diff --git a/lib/codetag.c b/lib/codetag.c
index 42aadd6c1454..a795b152ce92 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -13,6 +13,8 @@ struct codetag_type {
 	struct idr mod_idr;
 	struct rw_semaphore mod_lock; /* protects mod_idr */
 	struct codetag_type_desc desc;
+	/* timestamping iterator expiration */
+	unsigned long clock;
 };
 
 struct codetag_range {
@@ -23,6 +25,8 @@ struct codetag_range {
 struct codetag_module {
 	struct module *mod;
 	struct codetag_range range;
+	/* creation timestamp */
+	unsigned long timestamp;
 };
 
 static DEFINE_MUTEX(codetag_lock);
@@ -48,6 +52,7 @@ struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
 		.cmod = NULL,
 		.mod_id = 0,
 		.ct = NULL,
+		.expiration = 0,
 	};
 
 	return iter;
@@ -93,6 +98,11 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
 
 		if (cmod != iter->cmod) {
 			iter->cmod = cmod;
+			iter->expiration = cmod->timestamp;
+			ct = get_first_module_ct(cmod);
+		} else if (cmod->timestamp != iter->expiration) {
+			pr_warn("Same IDR id and module address, but different module!\n");
+			iter->expiration = cmod->timestamp;
 			ct = get_first_module_ct(cmod);
 		} else
 			ct = get_next_module_ct(iter);
@@ -101,6 +111,7 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
 			break;
 
 		iter->mod_id++;
+		iter->cmod = NULL;
 	}
 
 	iter->ct = ct;
@@ -169,6 +180,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 	struct codetag_module *cmod;
 	int err;
 
+	cttype->clock++;
 	range = get_section_range(mod, cttype->desc.section);
 	if (!range.start || !range.stop) {
 		pr_warn("Failed to load code tags of type %s from the module %s\n",
@@ -188,6 +200,7 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
 
 	cmod->mod = mod;
 	cmod->range = range;
+	cmod->timestamp = cttype->clock;
 
 	down_write(&cttype->mod_lock);
 	err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
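
For completeness, a rough sketch of the seq_file side this is meant to pair
with (not part of the diff above; the private struct and its pos field are
illustrative, allocated once at open() time, e.g. via seq_open_private()):

/* Illustrative only: resume a persistent iterator in ->start() instead
 * of reallocating it and rewinding from scratch on every read().
 */
struct allocinfo_private {
	struct codetag_iterator iter;	/* lives as long as the fd */
	loff_t pos;			/* position the iterator sits at */
};

static void *allocinfo_start(struct seq_file *m, loff_t *pos)
{
	struct allocinfo_private *priv = m->private;	/* set up at open() */

	codetag_lock_module_list(alloc_tag_cttype, true);
	if (*pos < priv->pos) {
		/* the reader seeked backwards: restart from the beginning */
		priv->iter = codetag_get_ct_iter(alloc_tag_cttype);
		priv->pos = 0;
	}
	while (priv->pos < *pos) {
		/* with the timestamp check above, codetag_next_ct()
		 * notices a reused cmod and restarts that module
		 */
		if (!codetag_next_ct(&priv->iter))
			return NULL;	/* ->stop() still drops mod_lock */
		priv->pos++;
	}
	return priv;
}

With that, a sequential read of the whole file only ever advances the
iterator n steps in total.
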
And I noticed another issue: there are duplicate keys (file:line + module + func)
in allocinfo even without this patch.
On my workstation:
=======================
1400832 114 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 840764
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 2
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 758
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 62951
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
0 0 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 325450
12288 1 ././common/inc/nv-linux.h:1117 [nvidia] func:nv_kmem_cache_alloc_stack 1
=======================
81920 20 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 20
1441792 352 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node 352
=======================
112 7 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 1591
48 3 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr 12
=======================
48 1 mm/mm_slot.h:28 func:mm_slot_alloc 4
2160 54 mm/mm_slot.h:28 func:mm_slot_alloc 10705
Most of the duplicate keys come from "*.h" files.
On a KVM guest, duplication also happens for some "*.c" files:
=======================
0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
0 0 ./include/crypto/kpp.h:185 func:kpp_request_alloc
=======================
0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
0 0 ./include/net/tcp.h:2548 func:tcp_v4_save_options
=======================
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/amd/../iommu-pages.h:94 func:iommu_alloc_pages_node
=======================
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
0 0 drivers/iommu/intel/../iommu-pages.h:94 func:iommu_alloc_pages_node
=======================
0 0 fs/binfmt_elf.c:1395 func:load_elf_library
0 0 fs/binfmt_elf.c:1395 func:load_elf_library
=======================
0 0 fs/binfmt_elf.c:1653 func:fill_files_note
0 0 fs/binfmt_elf.c:1653 func:fill_files_note
=======================
0 0 fs/binfmt_elf.c:1851 func:fill_note_info
0 0 fs/binfmt_elf.c:1851 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:1891 func:fill_note_info
0 0 fs/binfmt_elf.c:1891 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:1899 func:fill_note_info
0 0 fs/binfmt_elf.c:1899 func:fill_note_info
=======================
0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
0 0 fs/binfmt_elf.c:2057 func:elf_core_dump
=======================
0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
0 0 fs/binfmt_elf.c:2072 func:elf_core_dump
=======================
0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
0 0 fs/binfmt_elf.c:532 func:load_elf_phdrs
=======================
0 0 fs/binfmt_elf.c:885 func:load_elf_binary
0 0 fs/binfmt_elf.c:885 func:load_elf_binary
=======================
0 0 fs/binfmt_elf.c:910 func:load_elf_binary
0 0 fs/binfmt_elf.c:910 func:load_elf_binary
=======================
0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
0 0 fs/fuse/fuse_i.h:1082 [fuse] func:fuse_folios_alloc
=======================
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
0 0 io_uring/io_uring.h:253 func:io_uring_alloc_async_data
=======================
0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
0 0 kernel/sched/sched.h:2587 func:alloc_user_cpus_ptr
=======================
0 0 mm/mm_slot.h:28 func:mm_slot_alloc
160 4 mm/mm_slot.h:28 func:mm_slot_alloc
=======================
0 0 security/apparmor/domain.c:1136 func:change_hat
0 0 security/apparmor/domain.c:1136 func:change_hat
=======================
0 0 security/apparmor/domain.c:1455 func:aa_change_profile
0 0 security/apparmor/domain.c:1455 func:aa_change_profile
=======================
0 0 security/apparmor/domain.c:835 func:handle_onexec
0 0 security/apparmor/domain.c:835 func:handle_onexec
=======================
0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
0 0 security/apparmor/domain.c:909 func:apparmor_bprm_creds_for_exec
=======================
0 0 security/apparmor/mount.c:738 func:aa_pivotroot
0 0 security/apparmor/mount.c:738 func:aa_pivotroot
My workstation runs my own changes based on 6.15-rc5, but I did not touch any
code related to tags...
The KVM guest runs 6.15-rc5.
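
My guess (not verified) about the "*.h" duplicates: alloc_hooks() defines a
static struct alloc_tag at each call site, so an allocation inside an inline
helper in a header instantiates one tag per compilation unit, all sharing the
same file:line key. Roughly (foo.h/foo_alloc are made up):

/* foo.h -- hypothetical header with an inline allocation helper */
static inline void *foo_alloc(gfp_t gfp)
{
	/*
	 * kmalloc() is a macro; via alloc_hooks() it expands to roughly
	 *
	 *	static struct alloc_tag _alloc_tag
	 *		__section("alloc_tags") = { .ct = CODE_TAG_INIT };
	 *	kmalloc_noprof(32, gfp);
	 *
	 * The tag is static, so every .c file that instantiates
	 * foo_alloc() carries its own copy under an identical key.
	 */
	return kmalloc(32, gfp);
}

And the fs/binfmt_elf.c duplicates are probably because fs/compat_binfmt_elf.c
#includes binfmt_elf.c, so those lines get compiled twice as well.
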
The script for checking:
#!/usr/bin/env python3
def fetch():
    r = {}
    with open("/proc/allocinfo") as f:
        for line in f:
            # group the lines by their "file:line" column
            key = line.strip().split()[2]
            r.setdefault(key, []).append(line)
    keys = sorted(k for k, lines in r.items() if len(lines) > 1)
    for k in keys:
        print("=======================")
        for line in r[k]:
            print(line, end="")

fetch()
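
Run it as root, since /proc/allocinfo is only readable by root (the file name
dupcheck.py is arbitrary):

$ sudo python3 dupcheck.py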
David