Message-ID: <1306500502.3857.26.camel@gandalf.stny.rr.com>
Date: Fri, 27 May 2011 08:48:22 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: David Rientjes <rientjes@...gle.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Michael Rubin <mrubin@...gle.com>,
David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mel@....ul.ie>, Rik Van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer
allocating process
On Fri, 2011-05-27 at 02:43 -0700, David Rientjes wrote:
> This problem isn't new; it's always been possible that an allocation that
> is higher order, using GFP_ATOMIC or GFP_NOWAIT, or utilizing
> __GFP_NORETRY as I suggested here, would deplete memory at the same time
> that a GFP_FS allocation on another cpu would invoke the oom killer.
>
> If that happens between the time when tracing_resize_ring_buffer() goes
> oom and its nicely written error handling starts freeing memory, then it's
> possible that another task will be unfairly oom killed. Note that the
> suggested solution of test_set_oom_score_adj(OOM_SCORE_ADJ_MAX) doesn't
> prevent that in all cases: it's possible that another thread on the system
> also has an oom_score_adj of OOM_SCORE_ADJ_MAX and it would be killed in
> its place just because it appeared in the tasklist first (which is
> guaranteed if this is simply an echo command).
>
> Relying on the oom killer to kill this task for parallel blockable
> allocations doesn't seem like the best solution for the sole reason that
> the program that wrote to buffer_size_kb may count on its return value.
> It may be able to handle an -ENOMEM return value and, perhaps, try to
> write a smaller value?
>
> I think what this patch really wants to do is utilize __GFP_NORETRY as
> previously suggested and, if we're really concerned about parallel
> allocations in this instance even though the same situation exists all
> over the kernel, also create an oom notifier with register_oom_notifier()
> that may be called in oom conditions that would free memory when
> buffer->record_disabled is non-zero and prevent the oom. That would
> increase the size of the ring buffer as large as possible up until oom
> even though it may not be to what the user requested.
>
> Otherwise, you'll just want to use oom_killer_disable() to prevent the oom
> killer altogether.
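
(For reference, below is a minimal sketch of the register_oom_notifier()
approach described above, written as if it lived in kernel/trace/ring_buffer.c.
The names rb_oom_notify(), rb_oom_nb, rb_shrink_buffer() and the
resizing_buffer pointer are hypothetical, as is the policy for how much memory
to give back; the real code would need its own logic. The notifier reports how
many pages it freed through the *freed argument so that out_of_memory() can
back off instead of killing a task.)

/* Sketch only: assumes it sits inside kernel/trace/ring_buffer.c,
 * where struct ring_buffer and its record_disabled field are visible. */
#include <linux/notifier.h>
#include <linux/oom.h>

/* Hypothetical: set by the resize path while a resize is in flight. */
static struct ring_buffer *resizing_buffer;

/* Hypothetical helper: release pages already allocated by the in-flight
 * resize and return how many were freed. */
static unsigned long rb_shrink_buffer(struct ring_buffer *buffer);

static int rb_oom_notify(struct notifier_block *nb, unsigned long unused,
                         void *parm)
{
        unsigned long *freed = parm;

        /*
         * Only give memory back while a resize is in progress, i.e. while
         * buffer->record_disabled is non-zero, as suggested above.
         */
        if (resizing_buffer &&
            atomic_read(&resizing_buffer->record_disabled))
                *freed += rb_shrink_buffer(resizing_buffer);

        return NOTIFY_OK;
}

static struct notifier_block rb_oom_nb = {
        .notifier_call = rb_oom_notify,
};

/* and somewhere in the tracer's init or resize path:
 *        register_oom_notifier(&rb_oom_nb);
 */

(oom_killer_disable(), mentioned at the end of the quote, is a much bigger
hammer: with it set, allocations that would have triggered the oom killer
simply fail instead.)
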
Thanks for the detailed explanation. OK, I'm convinced. The proper
solution looks to be to use both __GFP_NORETRY and
test_set_oom_score_adj(). I don't think it is necessary to worry about
multiple users of this score adj; when we are in an OOM situation,
things are just bad to begin with. But Google has to deal with bad
situations more than others, so if it becomes an issue for you, we can
discuss those changes later.
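
(Roughly, the combined approach could look like the sketch below. This is
illustrative, not the actual ring buffer resize code: rb_alloc_extra_pages()
is a made-up name, and the real allocation happens in ring_buffer_resize().
test_set_oom_score_adj() and OOM_SCORE_ADJ_MAX come from linux/oom.h, and
__GFP_NORETRY makes each allocation give up and return NULL instead of
retrying until the oom killer has to be invoked.)

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/oom.h>
#include <linux/slab.h>

/* Illustrative only: "struct buffer_page" mirrors the one in ring_buffer.c. */
static int rb_alloc_extra_pages(struct list_head *pages, unsigned int nr_pages)
{
        int old_score, ret = 0;
        unsigned int i;

        /* Volunteer this task as the preferred oom victim while allocating. */
        old_score = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);

        for (i = 0; i < nr_pages; i++) {
                struct buffer_page *bpage;

                /*
                 * __GFP_NORETRY: fail the allocation rather than loop until
                 * the oom killer must be invoked.
                 */
                bpage = kzalloc(sizeof(*bpage), GFP_KERNEL | __GFP_NORETRY);
                if (!bpage) {
                        ret = -ENOMEM;  /* caller frees what was allocated */
                        break;
                }
                list_add(&bpage->list, pages);
        }

        /* Put the task's badness score back the way we found it. */
        test_set_oom_score_adj(old_score);

        return ret;
}

(On failure the caller frees whatever pages it did get and returns -ENOMEM,
so the program that wrote to buffer_size_kb can see the error and try a
smaller size.)
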
Vaibhav, can you send an updated patch?
Thanks,
-- Steve