Message-ID: <CAJuCfpHEt2n6sA7m5zvc-F+z=3-twVEKfVGCa0+y62bT10b0Bw@mail.gmail.com>
Date: Fri, 5 Apr 2024 07:14:13 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Klara Modin <klarasmodin@...il.com>
Cc: akpm@...ux-foundation.org, kent.overstreet@...ux.dev, mhocko@...e.com, 
	vbabka@...e.cz, hannes@...xchg.org, roman.gushchin@...ux.dev, mgorman@...e.de, 
	dave@...olabs.net, willy@...radead.org, liam.howlett@...cle.com, 
	penguin-kernel@...ove.sakura.ne.jp, corbet@....net, void@...ifault.com, 
	peterz@...radead.org, juri.lelli@...hat.com, catalin.marinas@....com, 
	will@...nel.org, arnd@...db.de, tglx@...utronix.de, mingo@...hat.com, 
	dave.hansen@...ux.intel.com, x86@...nel.org, peterx@...hat.com, 
	david@...hat.com, axboe@...nel.dk, mcgrof@...nel.org, masahiroy@...nel.org, 
	nathan@...nel.org, dennis@...nel.org, jhubbard@...dia.com, tj@...nel.org, 
	muchun.song@...ux.dev, rppt@...nel.org, paulmck@...nel.org, 
	pasha.tatashin@...een.com, yosryahmed@...gle.com, yuzhao@...gle.com, 
	dhowells@...hat.com, hughd@...gle.com, andreyknvl@...il.com, 
	keescook@...omium.org, ndesaulniers@...gle.com, vvvvvv@...gle.com, 
	gregkh@...uxfoundation.org, ebiggers@...gle.com, ytcoode@...il.com, 
	vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, bristot@...hat.com, vschneid@...hat.com, cl@...ux.com, 
	penberg@...nel.org, iamjoonsoo.kim@....com, 42.hyeyoo@...il.com, 
	glider@...gle.com, elver@...gle.com, dvyukov@...gle.com, 
	songmuchun@...edance.com, jbaron@...mai.com, aliceryhl@...gle.com, 
	rientjes@...gle.com, minchan@...gle.com, kaleshsingh@...gle.com, 
	kernel-team@...roid.com, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, iommu@...ts.linux.dev, 
	linux-arch@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, 
	linux-modules@...r.kernel.org, kasan-dev@...glegroups.com, 
	cgroups@...r.kernel.org
Subject: Re: [PATCH v6 00/37] Memory allocation profiling

On Fri, Apr 5, 2024 at 6:37 AM Klara Modin <klarasmodin@...il.com> wrote:
>
> Hi,
>
> On 2024-03-21 17:36, Suren Baghdasaryan wrote:
> > Overview:
> > Low-overhead [1] per-callsite memory allocation profiling. Not just
> > for debug kernels: the overhead is low enough to be deployed in
> > production.
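> >
> > For illustration, the core idea: each instrumented callsite gets a
> > static tag with counters that allocations at that site bump; the free
> > path finds the tag through a reference stored alongside the object
> > and decrements them. A simplified conceptual sketch (not the actual
> > kernel implementation; names are illustrative and the real counters
> > are percpu):
> >
> >    struct alloc_tag_counters {
> >            u64 bytes;      /* outstanding bytes allocated here */
> >            u64 calls;      /* outstanding allocations from here */
> >    };
> >
> >    #define TAGGED_KMALLOC(size, gfp)                                 \
> >    ({                                                                \
> >            static struct alloc_tag_counters _tag; /* per callsite */ \
> >            void *_p = kmalloc(size, gfp);                            \
> >            if (_p) {                                                 \
> >                    _tag.bytes += (size);                             \
> >                    _tag.calls++;                                     \
> >            }                                                         \
> >            _p;                                                       \
> >    })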
> >
> > Example output:
> >    root@...ia-kvm:~# sort -rn /proc/allocinfo
> >     127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
> >      56373248     4737 mm/slub.c:2259 func:alloc_slab_page
> >      14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
> >      14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
> >      13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
> >      11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
> >       9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
> >       4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
> >       4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
> >       3940352      962 mm/memory.c:4214 func:alloc_anon_folio
> >       2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
> >       ...
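> >
> > (Each line shows the total bytes currently allocated from a callsite,
> > the number of outstanding allocations, and the callsite itself:
> > file:line, an optional [module], and func:<name>. As an illustrative
> > example - assuming plain data lines in exactly this format - a
> > minimal userspace reader could be:)
> >
> >    #include <stdio.h>
> >
> >    int main(void)
> >    {
> >            unsigned long long bytes, calls;
> >            char site[256];
> >            FILE *f = fopen("/proc/allocinfo", "r");
> >
> >            if (!f)
> >                    return 1;
> >            while (fscanf(f, "%llu %llu %255[^\n]",
> >                          &bytes, &calls, site) == 3)
> >                    if (bytes >= 1 << 20)   /* sites holding >= 1 MiB */
> >                            printf("%llu KiB  %s\n", bytes >> 10, site);
> >            fclose(f);
> >            return 0;
> >    }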
> >
> > Since v5 [2]:
> > - Added Reviewed-by and Acked-by, per Vlastimil Babka and Miguel Ojeda
> > - Changed pgalloc_tag_{add|sub} to use number of pages instead of order, per Matthew Wilcox
> > - Changed pgalloc_tag_sub_bytes to pgalloc_tag_sub_pages and adjusted the usage, per Matthew Wilcox
> > - Moved static key check before prepare_slab_obj_exts_hook(), per Vlastimil Babka
> > - Fixed RUST helper, per Miguel Ojeda
> > - Fixed documentation, per Randy Dunlap
> > - Rebased over mm-unstable
> >
> > Usage:
> > kconfig options:
> >   - CONFIG_MEM_ALLOC_PROFILING
> >   - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> >   - CONFIG_MEM_ALLOC_PROFILING_DEBUG
> >     adds warnings for allocations that weren't accounted because of a
> >     missing annotation
> >
> > sysctl:
> >    /proc/sys/vm/mem_profiling
> >
> > Runtime info:
> >    /proc/allocinfo
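> >
> > For example, enabling profiling at runtime and then inspecting the
> > counters can be done with a trivial snippet like this (illustrative;
> > writing "1" to the sysctl from a shell works just as well):
> >
> >    #include <fcntl.h>
> >    #include <unistd.h>
> >
> >    int main(void)
> >    {
> >            /* Requires CONFIG_MEM_ALLOC_PROFILING=y. */
> >            int fd = open("/proc/sys/vm/mem_profiling", O_WRONLY);
> >
> >            if (fd < 0 || write(fd, "1", 1) != 1)
> >                    return 1;
> >            close(fd);
> >            /* Per-callsite counters are now readable from
> >             * /proc/allocinfo. */
> >            return 0;
> >    }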
> >
> > Notes:
> >
> > [1]: Overhead
> > To measure the overhead we are comparing the following configurations:
> > (1) Baseline with CONFIG_MEMCG_KMEM=n
> > (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
> >      CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n)
> > (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
> >      CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y)
> > (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
> >      CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n &&
> >      /proc/sys/vm/mem_profiling=1)
> > (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
> > (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
> >      CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
> > (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
> >      CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
> >
> > Performance overhead:
> > To evaluate performance we implemented an in-kernel test that
> > executes multiple get_free_page/free_page and kmalloc/kfree calls
> > with allocation sizes growing from 8 to 240 bytes, with the CPU
> > frequency set to max and CPU affinity pinned to a specific CPU to
> > minimize noise. Below are the results of running the test on Ubuntu
> > 22.04.2 LTS with a 6.8.0-rc1 kernel on a 56-core Intel Xeon (a sketch
> > of the timing loop follows the results table):
> >
> >                          kmalloc                 pgalloc
> > (1 baseline)            6.764s                  16.902s
> > (2 default disabled)    6.793s  (+0.43%)        17.007s (+0.62%)
> > (3 default enabled)     7.197s  (+6.40%)        23.666s (+40.02%)
> > (4 runtime enabled)     7.405s  (+9.48%)        23.901s (+41.41%)
> > (5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
> > (6 def disabled+memcg)  13.332s (+97.10%)       48.105s (+184.61%)
> > (7 def enabled+memcg)   13.446s (+98.78%)       54.963s (+225.18%)
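> >
> > (For reference, the test is essentially a timing loop of the
> > following shape - a simplified sketch, not the actual test code, and
> > the iteration count is arbitrary:)
> >
> >    /* Runs in kernel context; sketch of the kmalloc/kfree leg. */
> >    static void kmalloc_bench(void)
> >    {
> >            u64 start, end;
> >            size_t size;
> >            int i;
> >
> >            start = ktime_get_ns();
> >            for (size = 8; size <= 240; size += 8)
> >                    for (i = 0; i < 1000000; i++)
> >                            kfree(kmalloc(size, GFP_KERNEL));
> >            end = ktime_get_ns();
> >            pr_info("kmalloc/kfree: %llu ns\n", end - start);
> >    }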
> >
> > Memory overhead:
> > Kernel size:
> >
> >     text           data        bss         dec         diff
> > (1) 26515311        18890222    17018880    62424413
> > (2) 26524728        19423818    16740352    62688898    264485
> > (3) 26524724        19423818    16740352    62688894    264481
> > (4) 26524728        19423818    16740352    62688898    264485
> > (5) 26541782        18964374    16957440    62463596    39183
> >
> > Memory consumption on a 56-core Intel CPU with 125 GB of memory:
> > Code tags:           192 kB
> > PageExts:         262144 kB (256MB)
> > SlabExts:           9876 kB (9.6MB)
> > PcpuExts:            512 kB (0.5MB)
> >
> > Total overhead is 0.2% of total memory.
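> >
> > (Arithmetic check: 192 kB + 262144 kB + 9876 kB + 512 kB = 272724 kB,
> > i.e. about 266 MB, and 266 MB / 125 GB is roughly 0.21%.)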
> >
> > Benchmarks:
> >
> > Hackbench tests run 100 times:
> > hackbench -s 512 -l 200 -g 15 -f 25 -P
> >        baseline       disabled profiling           enabled profiling
> > avg   0.3543         0.3559 (+0.0016)             0.3566 (+0.0023)
> > stdev 0.0137         0.0188                       0.0077
> >
> >
> > hackbench -l 10000
> >        baseline       disabled profiling           enabled profiling
> > avg   6.4218         6.4306 (+0.0088)             6.5077 (+0.0859)
> > stdev 0.0933         0.0286                       0.0489
> >
> > stress-ng tests:
> > stress-ng --class memory --seq 4 -t 60
> > stress-ng --class cpu --seq 4 -t 60
> > Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
> >
> > [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/
>
> If I enable this, I consistently get percpu allocation failures. I can
> occasionally reproduce it in qemu. I've attached the logs and my
> config; please let me know if there's anything else that could be
> relevant.

Thanks for the report!
In debug_alloc_profiling.log I see:

[    7.445127] percpu: limit reached, disable warning

That's probably the reason. I'll take a closer look at the cause of
that and how we can fix it.

In qemu-alloc3.log I see a couple of warnings:

[    1.111620] alloc_tag was not set
[    1.111880] WARNING: CPU: 0 PID: 164 at
include/linux/alloc_tag.h:118 kfree (./include/linux/alloc_tag.h:118
(discriminator 1) ./include/linux/alloc_tag.h:161 (discriminator 1)
mm/slub.c:2043 ...

[    1.161710] alloc_tag was not cleared (got tag for fs/squashfs/cache.c:413)
[    1.162289] WARNING: CPU: 0 PID: 195 at
include/linux/alloc_tag.h:109 kmalloc_trace_noprof
(./include/linux/alloc_tag.h:109 (discriminator 1)
/include/linux/alloc_tag.h:149 (discriminator 1) ...

Which means we missed instrumenting some allocation. Can you please
check whether disabling CONFIG_MEM_ALLOC_PROFILING_DEBUG fixes the QEMU
case?
In the meantime I'll try to reproduce and fix this.
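
(For context, CONFIG_MEM_ALLOC_PROFILING_DEBUG adds checks of roughly
this shape - a simplified sketch, not the actual alloc_tag.h code:)

    /* Sketch only; the real definitions live in
     * include/linux/alloc_tag.h. */
    static inline void alloc_tag_add_check(union codetag_ref *ref,
                                           struct alloc_tag *tag)
    {
            /* "not cleared": the slot already holds a tag when a new
             * allocation tries to install one. */
            WARN_ONCE(ref && ref->ct, "alloc_tag was not cleared");
    }

    static inline void alloc_tag_sub_check(union codetag_ref *ref)
    {
            /* "not set": an object is being freed without a tag, i.e.
             * it was allocated through an uninstrumented path. */
            WARN_ONCE(ref && !ref->ct, "alloc_tag was not set");
    }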
Thanks,
Suren.

>
> Kind regards,
> Klara Modin
