[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1006031313040.10856@chino.kir.corp.google.com>
Date: Thu, 3 Jun 2010 13:26:25 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Oleg Nesterov <oleg@...hat.com>
cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>
Subject: Re: [PATCH 09/12] oom: remove PF_EXITING check completely
On Thu, 3 Jun 2010, Oleg Nesterov wrote:
> On 06/02, David Rientjes wrote:
> >
> > On Thu, 3 Jun 2010, KOSAKI Motohiro wrote:
> >
> > > Currently, PF_EXITING check is completely broken. because 1) It only
> > > care main-thread and ignore sub-threads
> >
> > Then check the subthreads.
> >
Did you want to respond to this?
> > > 2) If user enable core-dump
> > > feature, it can makes deadlock because the task during coredump ignore
> > > SIGKILL.
> > >
> >
> > It may ignore SIGKILL, but does not ignore fatal_signal_pending() being
> > true
>
> Wrong.
>
> Unless the oom victim is exactly the thread which dumps the core,
> fatal_signal_pending() won't be true for the dumper. Even if the
> victim and the dumper are from the same group, this thread group
> already has SIGNAL_GROUP_EXIT. And if they do not belong to the
> same group, SIGKILL has even less effect.
>
I'm guessing at the relevancy here because the changelog is extremely
poorly worded (if I were Andrew I would have no idea how important this
patch is based on the description other than the alarmist words of "... is
completely broken)", but if we're concerned about the coredumper not being
able to find adequate resources to allocate memory from, we can give it
access to reserves specifically, we don't need to go killing additional
tasks which may have their own coredumpers.
> Even if we chose the right thread we can race with
> clear_thread_flag(TIF_SIGPENDING), but fatal_signal_pending()
> checks signal_pending().
>
> > which gives it access to memory reserves with my patchset
>
> __get_user_pages() already checks fatal_signal_pending(), this
> is where the dumper allocates the memory (mostly).
>
> And I am not sure I understand the "access to memory reserves",
> the dumper should just stop if oom-kill decides it should die,
> it can use a lot more memory if it doesn't stop.
>
That's an alternative solution as well, but I'm disagreeing with the
approach here because this enforces absolutely no guarantee that the next
task to be oom killed will be the coredumper, its much more likely that
we're just going to kill yet another task for the coredump. That task may
have a coredumper too. Who knows.
> > Nacked-by: David Rientjes <rientjes@...gle.com>
>
> Kosaki removes the code which only pretends to work, but it doesn't
> and leads to problems.
>
LOL, this code doesn't pretend to work, we rely heavily on it and have for
three years to prevent needless oom killing. We use the oom killer
constantly on every machine with memory isolation for enforcing a policy,
as the memcg will as well as it becomes more popular. Killing a job's
task is a serious matter and we'd prefer to kill as few as possible to
free memory. That's the entire motivation behind having a badness()
heuristic in the first place: so we don't kill tons of tasks to free
enough memory. Removing this check specifically will cause the oom killer
to kill tasks when it is currently a no-op and will free memory and we
have never had a problem with these so-called "deadlocks" that are being
found only by code inspection. So please care a little bit more about the
ramifications of other people's use of Linux before you go and insist that
certain code doesn't do a complete job in certain cases or it can
introduce a deadlock in situations that are anything but from the real
world must be removed entirely even though it has saved tons of our jobs
over the course of the past three years.
> If you think we need this check, imho it is better to make the patch
> which adds the "right" code with the nice changelog explaining how
> this code works.
>
I've got no objection to that, but until you find the right solution,
please don't remove what works for people in practice. Show me an oom
deadlock as the result of this and the use of sane applications. Nobody
reports them because these are such ridiculous cornercases that are being
addressed by sweeping and extreme code changes without other people's
input being considered.
Since Google uses the oom killer to enforce memory isolation on every one
of our machines, and we do it at a scale that is unparalleled to anybody
else, I think you should consider this input.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists