linux-kernel - Re: [PATCH] Only print kernel debug information for OOMs caused by kernel allocations

Message-Id: <20080128012718.65b7889a.akpm@linux-foundation.org>
Date:	Mon, 28 Jan 2008 01:27:18 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Andi Kleen <ak@...e.de>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] Only print kernel debug information for OOMs caused by
 kernel allocations

On Mon, 28 Jan 2008 10:11:57 +0100 Andi Kleen <ak@...e.de> wrote:

> On Monday 28 January 2008 09:56, Andrew Morton wrote:
> > On Mon, 28 Jan 2008 07:10:07 +0100 Andi Kleen <ak@...e.de> wrote:
> > > On Monday 28 January 2008 06:52, Andrew Morton wrote:
> > > > On Wed, 16 Jan 2008 23:24:21 +0100 Andi Kleen <ak@...e.de> wrote:
> > > > > I recently suffered a 20+ minute oom thrash disk to death and
> > > > > computer completely unresponsive situation on my desktop when some
> > > > > user program decided to grab all memory. It eventually recovered, but
> > > > > left lots of ugly and imho misleading messages in the kernel log.
> > > > > here's a minor improvement
> > >
> > > As a followup this was with swap over dm crypt. I've recently heard
> > > about other people having trouble with this too so this setup seems to
> > > trigger something bad in the VM.
> >
> > Where's the backtrace and show_mem() output? :)
> 
> I don't have it anymore. You want me to reproduce it? I don't think
> I saw messages from the other people either; just heard complaints.

May as well - it doesn't sound like it'll fix itself...

> > > > That information is useful for working out why a userspace allocation
> > > > attempt failed.  If we don't print it, and the application gets killed
> > > > and thus frees a lot of memory, we will just never know why the
> > > > allocation failed.
> > >
> > > But it's basically only either page fault (direct or indirect) or write
> > > et.al. who do these page cache allocations. Do you really think it is
> > > that important to distinguish these cases individually? In 95+% of all
> > > cases it should be a standard user page fault which always has the same
> > > backtrace.
> >
> > Sure, the backtrace isn't very important.  The show_mem() output is vital.
> 
> I see. So would the patch be acceptable if it only disabled the backtrace? 

Spose so.  The show_mem() spew is probably larger than the backtrace
though.
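
A minimal sketch of the compromise discussed above, written as a userspace model rather than kernel code: the backtrace is kept only for kernel-originated allocations, while the memory summary is always printed. The names `alloc_origin`, `oom_report` and `oom_report_policy` are hypothetical and exist only for illustration. How the actual patch decides that an allocation is on behalf of userspace is not shown in this thread, so that test is reduced to an enum here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "who asked for the memory". The real patch's
 * test is not quoted in this thread, so it is modeled as a plain enum. */
enum alloc_origin { ALLOC_KERNEL, ALLOC_USER };

/* What gets logged for one OOM event. */
struct oom_report {
    bool print_backtrace;   /* models the dump_stack() call */
    bool print_mem_summary; /* models the show_mem() call */
};

/* Decide what to log: a backtrace only for kernel allocations,
 * the memory summary in every case. */
struct oom_report oom_report_policy(enum alloc_origin origin)
{
    struct oom_report r;

    r.print_backtrace = (origin == ALLOC_KERNEL);
    r.print_mem_summary = true;
    return r;
}

/* Print placeholder lines where the kernel would emit the real output. */
void oom_log(enum alloc_origin origin)
{
    struct oom_report r = oom_report_policy(origin);

    if (r.print_backtrace)
        printf("[backtrace would be printed here]\n");
    if (r.print_mem_summary)
        printf("[memory summary would be printed here]\n");
}
```

Calling `oom_log(ALLOC_USER)` in this model prints only the memory-summary placeholder, and `oom_log(ALLOC_KERNEL)` prints both lines.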

Are you sure we aren't doing dump_stack()/show_mem() multiple times for a
single process?  If we are, that would mean the TIF_MEMDIE thing broke.

It must have been one heck of an oomkilling slaughter.

> > Plus an additional function call.  On the already-deep page allocation
> > path, I might add.
> 
> The function call is already there if the kernel has CPUSETs enabled.

s/CPUSETS/NUMA/, which makes rather a difference.

> And that is what distribution kernels usually do. And most users
> use distribution kernels or distribution .config.

