linux-kernel - Re: [question] how to figure out OOM reason? should dump slab/vmalloc info when OOM?

Message-ID: <alpine.DEB.2.02.1401211236520.10355@chino.kir.corp.google.com>
Date:	Tue, 21 Jan 2014 12:41:41 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Jianguo Wu <wujianguo@...wei.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [question] how to figure out OOM reason? should dump slab/vmalloc
 info when OOM?

On Tue, 21 Jan 2014, Jianguo Wu wrote:

> > The problem is that slabinfo becomes excessively verbose and dumping it 
> > all to the kernel log often causes important messages to be lost.  
> > This is why we control things like the tasklist dump with a VM sysctl.  It 
> > would be possible to dump, say, the top ten slab caches with the highest 
> > memory usage, but it will only be helpful for slab leaks.  Typically there 
> > are better debugging tools available than analyzing the kernel log; if you 
> > see unusually high slab memory in the meminfo dump, you can enable it.
> > 
> 
> But once an OOM has happened, we can only use the kernel log; the slab/vmalloc info
> from proc is stale. Maybe we could dump slab/vmalloc info under a VM sysctl, and only
> the top 10/20 entries?
> 

You could, but it's a tradeoff between how much to dump to a general 
resource such as the kernel log and how many sysctls we add that control 
every possible thing.  Slab leaks would definitely be a minority of oom 
conditions and you should normally be able to reproduce them by running 
the same workload; just use slabtop(1) or manually inspect /proc/slabinfo 
while such a workload is running for indicators.  I don't think we want to 
add the information by default, though, nor do we want to add sysctls to 
control the behavior (you'd still need to reproduce the issue after 
enabling it).
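
As a rough illustration of inspecting /proc/slabinfo by hand while the 
workload runs, here is a minimal Python sketch that ranks caches by 
approximate size.  It assumes the slabinfo 2.x column layout (name, 
active_objs, num_objs, objsize, ...) and estimates usage as num_objs * 
objsize, which ignores per-slab overhead.  The function names 
parse_slabinfo() and top_caches() are made up for this example; they are 
not part of any existing tool.

```python
def parse_slabinfo(text):
    """Return (cache_name, approx_bytes) pairs from slabinfo 2.x text."""
    caches = []
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name = fields[0]
        num_objs = int(fields[2])   # total objects allocated in the cache
        objsize = int(fields[3])    # size of one object in bytes
        # Assumption: approximate usage, ignoring slab padding and metadata.
        caches.append((name, num_objs * objsize))
    return caches


def top_caches(text, n=10):
    """Return the n largest caches, biggest first."""
    return sorted(parse_slabinfo(text), key=lambda c: c[1], reverse=True)[:n]


def report(path="/proc/slabinfo", n=10):
    """Print the n largest caches; reading the file usually needs root."""
    with open(path) as f:
        for name, size in top_caches(f.read(), n):
            print(f"{name:<24} {size / 1024:10.0f} KiB")
```

Calling report() periodically as root during the workload and watching 
for a cache that keeps growing gives roughly the indicator that 
slabtop(1) shows interactively.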

We are currently discussing userspace oom handlers, though, that would 
allow you to run a process that would be notified and allowed to allocate 
a small amount of memory on oom conditions.  It would then be trivial to 
dump any information you feel pertinent in userspace prior to killing 
something.  I like to inspect heap profiles for memory hogs while 
debugging our malloc() issues, for example, and you could look more 
closely at kernel memory.

I'll cc you on future discussions of that feature.