lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 25 Nov 2006 23:30:23 -0800
From:	Andrew Morton <akpm@...l.org>
To:	Dave Jones <davej@...hat.com>
Cc:	"Martin J. Bligh" <mbligh@...igh.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andy Whitcroft <apw@...dowen.org>,
	Larry Woodman <lwoodman@...hat.com>
Subject: Re: OOM killer firing on 2.6.18 and later during LTP runs

On Sun, 26 Nov 2006 02:25:38 -0500
Dave Jones <davej@...hat.com> wrote:

> On Sat, Nov 25, 2006 at 11:11:53PM -0800, Andrew Morton wrote:
>  > On Sat, 25 Nov 2006 22:00:45 -0500
>  > Dave Jones <davej@...hat.com> wrote:
>  > 
>  > > On Sat, Nov 25, 2006 at 01:28:28PM -0800, Andrew Morton wrote:
>  > >  > On Sat, 25 Nov 2006 13:03:45 -0800
>  > >  > "Martin J. Bligh" <mbligh@...igh.org> wrote:
>  > >  > 
>  > >  > > On 2.6.18-rc7 and later during LTP:
>  > >  > > http://test.kernel.org/abat/48393/debug/console.log
>  > >  > 
>  > >  > The traces are a bit confusing, but I don't actually see anything wrong
>  > >  > there.  The machine has used up all swap, has used up all memory and has
>  > >  > correctly gone and killed things.  After that, there's free memory again.
>  > > 
>  > > We covered this a month or two back.  For RHEL5, we've ended up
>  > > reintroducing the oom killer prevention logic that we had up until
>  > > circa 2.6.10.   It seemed that there exist circumstances where
>  > > given a little more time, some memory hogging apps will run to completion
>  > > allowing other allocators to succeed instead of being killed.
>  > 
>  > I _think_ what you're describing here is a false-positive oom-killing?  But
>  > Martin appears to be hitting a genuine oom.
>  
> what we saw during the rhel5 testing was that yes, the machine _was_ OOM
> *temporarily*, but if instead of killing the task trying to allocate, we
> postponed the killing a few times, it would give other tasks the opportunity
> to complete writeout, or free up memory some other way, allowing the
> allocating process to succeed shortly afterwards.

That would be a false positive then.

In Martin's case he's 100% out of swapspace and has only a few tens of
pages letf mapped into pagetables, so Iassume that all memory is unmapped
swapcache (but that cannot be confirmed from the info which we have).  But
it looks like a real oom.

That's not to say that we don't have omm-killer problems.

>  > But it does appear that some changes are needed, because lots of things got
>  > oom-killed.
>  >
>  > I think.  Maybe not - there's no timestamping in those logs and it is of
>  > course possible that we're seeing unrelated ooms which happened a long time
>  > apart.
> 
> Maybe, but it does sound spookily familiar.
> The last time Larry's patch got floated to lkml it was met with
> "Ah!, but we have new oom killer changes in -git which might solve this".
> We tried them. They didn't.

What's the testcase?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ