[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061126072538.GA5223@redhat.com>
Date: Sun, 26 Nov 2006 02:25:38 -0500
From: Dave Jones <davej@...hat.com>
To: Andrew Morton <akpm@...l.org>
Cc: "Martin J. Bligh" <mbligh@...igh.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andy Whitcroft <apw@...dowen.org>,
Larry Woodman <lwoodman@...hat.com>
Subject: Re: OOM killer firing on 2.6.18 and later during LTP runs
On Sat, Nov 25, 2006 at 11:11:53PM -0800, Andrew Morton wrote:
> On Sat, 25 Nov 2006 22:00:45 -0500
> Dave Jones <davej@...hat.com> wrote:
>
> > On Sat, Nov 25, 2006 at 01:28:28PM -0800, Andrew Morton wrote:
> > > On Sat, 25 Nov 2006 13:03:45 -0800
> > > "Martin J. Bligh" <mbligh@...igh.org> wrote:
> > >
> > > > On 2.6.18-rc7 and later during LTP:
> > > > http://test.kernel.org/abat/48393/debug/console.log
> > >
> > > The traces are a bit confusing, but I don't actually see anything wrong
> > > there. The machine has used up all swap, has used up all memory and has
> > > correctly gone and killed things. After that, there's free memory again.
> >
> > We covered this a month or two back. For RHEL5, we've ended up
> > reintroducing the oom killer prevention logic that we had up until
> > circa 2.6.10. It seemed that there exist circumstances where
> > given a little more time, some memory hogging apps will run to completion
> > allowing other allocators to succeed instead of being killed.
>
> I _think_ what you're describing here is a false-positive oom-killing? But
> Martin appears to be hitting a genuine oom.
what we saw during the rhel5 testing was that yes, the machine _was_ OOM
*temporarily*, but if instead of killing the task trying to allocate, we
postponed the killing a few times, it would give other tasks the opportunity
to complete writeout, or free up memory some other way, allowing the
allocating process to succeed shortly afterwards.
> But it does appear that some changes are needed, because lots of things got
> oom-killed.
>
> I think. Maybe not - there's no timestamping in those logs and it is of
> course possible that we're seeing unrelated ooms which happened a long time
> apart.
Maybe, but it does sound spookily familiar.
The last time Larry's patch got floated to lkml it was met with
"Ah!, but we have new oom killer changes in -git which might solve this".
We tried them. They didn't.
Dave
--
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists