Message-ID: <20220322005101.actefn6nttzeo2qr@moria.home.lan>
Date: Mon, 21 Mar 2022 20:51:01 -0400
From: Kent Overstreet <kent.overstreet@...il.com>
To: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, lsf-pc@...ts.linux-foundation.org
Subject: [LSF/MM TOPIC] Improving OOM debugging
Frustration when debugging OOMs, memory usage, and memory reclaim behaviour is
something I think a lot of us can relate to.
I think it might be worth having a talk to collectively air our frustrations
and collect ideas for improvements.
To start with: on memory allocation failure or OOM, we currently don't have a
lot to go on. We get information about the allocation that failed, and only
very coarse-grained information about how memory is being tied up - page
granular information (aka show_mem()) is nigh useless in most situations, and
slab granular information is only slightly better.
I have a couple ideas I want to float:
- An old idea I've had, and have mentioned to some people before, is to steal
dynamic debug's trick of statically allocating tracking structs in a special
ELF section, and use it to wrap kmalloc(), alloc_pages() etc. calls for memory
allocation tracking _per call site_, with the results available in debugfs
broken out by file and line number (rough sketch at the end of this mail).
This would be cheap enough that it could be always on in production, unlike
doing the same sort of thing with tracepoints. The cost would be another
pointer of overhead for each allocation - for page allocations we've got
CONFIG_PAGE_OWNER that does something like this (in a much more expensive
fashion), and the pointer it uses could be repurposed. For slub/slab I think
something analogous exists, but last I looked it'd probably need help from
those developers (in both cases, really; mm code is hairy).
- In bcachefs, I've been evolving a 'printbuf' thingy - heap-allocated strings
that you can pass around and append to. They make it really convenient to
write pretty-printers for lots of things and pass them around, which in turn
has made my life considerably easier in the debugging realm.
I think that could be useful here: on a typical system shrinkers own a
significant fraction of non-pagecache kernel memory, and each shrinker has its
own internal state that's relevant to how much of that memory is currently
freeable (dirtiness, locking issues).
Imagine if shrinkers all had .to_text() methods; then on memory allocation
failure we could call those and print the output for the top 10 shrinkers by
memory owned - in addition to sticking it in sysfs or debugfs (also sketched
below).
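To make the first idea more concrete, here's a rough userspace sketch of the
dynamic debug trick: a macro drops a static tracking struct into its own ELF
section at every call site, and the section can then be walked to dump
per-callsite counters. All the names (alloc_tag, my_malloc(), etc.) are made
up for illustration, not existing kernel API - and the kernel version would
also want the extra per-allocation pointer back to the tag so frees get
subtracted:

#include <stdio.h>
#include <stdlib.h>

/* One of these is statically allocated per wrapped call site: */
struct alloc_tag {
	const char	*file;
	unsigned	line;
	size_t		bytes_allocated; /* would be a percpu counter in the kernel */
};

/* Section bounds, provided automatically by the linker: */
extern struct alloc_tag __start_alloc_tags[];
extern struct alloc_tag __stop_alloc_tags[];

/*
 * The wrapper: defines the call site's tag in its own section, then does the
 * real allocation and accounts it to that tag.
 */
#define my_malloc(size)							\
({									\
	static struct alloc_tag _tag					\
		__attribute__((section("alloc_tags"), used)) =		\
		{ .file = __FILE__, .line = __LINE__ };			\
	size_t _size = (size);						\
	_tag.bytes_allocated += _size;					\
	malloc(_size);							\
})

/* Walk every call site - roughly what a debugfs file would print: */
static void dump_alloc_tags(void)
{
	struct alloc_tag *t;

	for (t = __start_alloc_tags; t < __stop_alloc_tags; t++)
		printf("%s:%u: %zu bytes allocated\n",
		       t->file, t->line, t->bytes_allocated);
}

int main(void)
{
	void *a = my_malloc(128);
	void *b = my_malloc(4096);

	dump_alloc_tags();
	free(a);
	free(b);
	return 0;
}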
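And a similarly rough sketch of the printbuf + .to_text() idea - this printbuf
is stripped way down from what bcachefs actually carries, and the shrinker
it's hooked up to is invented for the example:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* A heap-allocated string you can pass around and append to: */
struct printbuf {
	char	*buf;
	size_t	size;
	size_t	pos;
};

static void pr_buf(struct printbuf *out, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);

	if (out->pos + len + 1 > out->size) {
		out->size = (out->pos + len + 1) * 2;
		out->buf  = realloc(out->buf, out->size);
	}

	va_start(args, fmt);
	vsnprintf(out->buf + out->pos, out->size - out->pos, fmt, args);
	va_end(args);
	out->pos += len;
}

/* A shrinker-like object that knows how to describe its own state: */
struct shrinker {
	const char	*name;
	void		(*to_text)(struct printbuf *, struct shrinker *);
};

struct btree_cache {
	struct shrinker	shrink;	/* first member, so the cast below works */
	unsigned long	nr_cached;
	unsigned long	nr_dirty;
	unsigned long	nr_freeable;
};

static void btree_cache_to_text(struct printbuf *out, struct shrinker *s)
{
	struct btree_cache *c = (struct btree_cache *) s; /* container_of() in the kernel */

	pr_buf(out, "%s:\n", s->name);
	pr_buf(out, "  cached:   %lu\n", c->nr_cached);
	pr_buf(out, "  dirty:    %lu (not freeable until written back)\n", c->nr_dirty);
	pr_buf(out, "  freeable: %lu\n", c->nr_freeable);
}

int main(void)
{
	struct btree_cache c = {
		.shrink		= { "btree_cache", btree_cache_to_text },
		.nr_cached	= 10000,
		.nr_dirty	= 4000,
		.nr_freeable	= 6000,
	};
	struct printbuf out = { NULL, 0, 0 };

	/* On OOM, do this for the top 10 shrinkers by memory owned: */
	c.shrink.to_text(&out, &c.shrink);
	fputs(out.buf, stdout);

	free(out.buf);
	return 0;
}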