Message-ID: <alpine.DEB.2.00.1105270220580.22108@chino.kir.corp.google.com>
Date: Fri, 27 May 2011 02:43:26 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
cc: Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Michael Rubin <mrubin@...gle.com>,
David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mel@....ul.ie>, Rik Van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer
allocating process
On Thu, 26 May 2011, Steven Rostedt wrote:
> > What do you think of this?
> >
> > test_set_oom_score_adj(MAXIMUM);
> > allocate_ring_buffer(GFP_KERNEL | __GFP_NORETRY);
> > test_set_oom_score_adj(original);
> >
> > This makes sure that the allocation fails much sooner and more
> > gracefully. If oom-killer is invoked in any circumstance, then the ring
> > buffer allocation process gives up memory and is killed.
>
> I don't know. But as I had never seen this function before, I went and
> took a look. This test_set_oom_score_adj() is new, and coincidentally
> written by another google developer ;)
>
Ignore the history of the function; it simply duplicates the old
PF_OOM_ORIGIN flag, which has now been removed.
> As there's not really a precedent for this, if those that I added to the
> Cc give their acks, I'm happy to apply this for the next merge window.
>
This problem isn't new; it has always been possible that an allocation
that is higher order, uses GFP_ATOMIC or GFP_NOWAIT, or utilizes
__GFP_NORETRY as I suggested here would deplete memory at the same time
that a GFP_FS allocation on another cpu invokes the oom killer.
If that happens between the time when tracing_resize_ring_buffer() goes
oom and its nicely written error handling starts freeing memory, then it's
possible that another task will be unfairly oom killed. Note that the
suggested solution of test_set_oom_score_adj(OOM_SCORE_ADJ_MAX) doesn't
prevent that in all cases: it's possible that another thread on the system
also has an oom_score_adj of OOM_SCORE_ADJ_MAX and it would be killed in
its place just because it appeared in the tasklist first (which is
guaranteed if this is simply an echo command).
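
For reference, the combination being discussed would look roughly like
this from the caller's side (a sketch only; allocate_buffer_pages() is a
stand-in name for the per-cpu page allocation loop in
kernel/trace/ring_buffer.c):

	#include <linux/gfp.h>
	#include <linux/oom.h>

	static int resize_with_oom_protection(unsigned long nr_pages)
	{
		int old_adj, ret;

		/* become the preferred victim if the oom killer does run */
		old_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);

		/* fail fast rather than retrying until the oom killer fires */
		ret = allocate_buffer_pages(nr_pages, GFP_KERNEL | __GFP_NORETRY);

		/* restore the caller's original badness adjustment */
		test_set_oom_score_adj(old_adj);

		return ret;
	}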
Relying on the oom killer to kill this task for parallel blockable
allocations doesn't seem like the best solution for the sole reason that
the program that wrote to buffer_size_kb may count on its return value.
It may be able to handle an -ENOMEM return value and, perhaps, retry the
write with a smaller size.
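
For example, a userspace consumer could back off on failure, something
like this (a sketch; it assumes debugfs is mounted at /sys/kernel/debug
and that a failed resize is reported as -ENOMEM from the write):

	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Try to size the ring buffer, halving the request on -ENOMEM. */
	static int set_buffer_size_kb(unsigned long kb)
	{
		char val[32];
		int fd = open("/sys/kernel/debug/tracing/buffer_size_kb", O_WRONLY);

		if (fd < 0)
			return -1;

		while (kb >= 64) {
			int len = snprintf(val, sizeof(val), "%lu", kb);

			if (pwrite(fd, val, len, 0) == len) {
				close(fd);
				return 0;	/* kernel accepted this size */
			}
			if (errno != ENOMEM)
				break;		/* unrelated failure, give up */
			kb /= 2;		/* retry with a smaller buffer */
		}
		close(fd);
		return -1;
	}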
I think what this patch really wants to do is utilize __GFP_NORETRY as
previously suggested and, if we're really concerned about parallel
allocations in this instance even though the same situation exists all
over the kernel, also register an oom notifier with
register_oom_notifier(). That notifier would be called in oom conditions
and could free memory whenever buffer->record_disabled is non-zero,
preventing the oom. This would grow the ring buffer as large as possible
up until oom, even though that may be smaller than what the user
requested.
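
A minimal sketch of such a notifier, assuming it lives inside
kernel/trace/ring_buffer.c where struct ring_buffer and record_disabled
are visible; ring_buffer_shrink_unused() is a hypothetical helper that
would release pages not needed while recording is disabled and return
the number of pages freed, and global_trace_buffer is a stand-in for
however the notifier finds the buffer:

	#include <linux/init.h>
	#include <linux/notifier.h>
	#include <linux/oom.h>

	/* Called from the oom notifier chain; *parm accumulates pages freed. */
	static int trace_oom_notify(struct notifier_block *nb,
				    unsigned long unused, void *parm)
	{
		unsigned long *freed = parm;
		struct ring_buffer *buffer = global_trace_buffer;

		/* only give memory back if recording is currently disabled */
		if (atomic_read(&buffer->record_disabled))
			*freed += ring_buffer_shrink_unused(buffer);

		return NOTIFY_OK;
	}

	static struct notifier_block trace_oom_nb = {
		.notifier_call = trace_oom_notify,
	};

	/* registered once from the ring buffer setup path */
	static int __init trace_oom_notifier_init(void)
	{
		return register_oom_notifier(&trace_oom_nb);
	}
	late_initcall(trace_oom_notifier_init);

If the notifier reports freed pages back through that pointer, the oom
killer backs off for that invocation instead of killing anything.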
Otherwise, you'll just want to use oom_killer_disable() to prevent the
oom killer altogether.
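
That would look something like the following around the resize (again a
sketch only; note that this suppresses the oom killer system-wide for the
duration of the resize):

	#include <linux/oom.h>
	#include <linux/ring_buffer.h>

	static int resize_without_oom_kill(struct ring_buffer *buffer,
					   unsigned long size)
	{
		int ret;

		oom_killer_disable();
		ret = ring_buffer_resize(buffer, size);
		oom_killer_enable();

		return ret;
	}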