Message-ID: <1306500502.3857.26.camel@gandalf.stny.rr.com>
Date:	Fri, 27 May 2011 08:48:22 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mel@....ul.ie>, Rik Van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer
 allocating process

On Fri, 2011-05-27 at 02:43 -0700, David Rientjes wrote:

> This problem isn't new; it's always been possible that an allocation that 
> is higher order, using GFP_ATOMIC or GFP_NOWAIT, or utilizing 
> __GFP_NORETRY as I suggested here, would deplete memory at the same time 
> that a GFP_FS allocation on another cpu would invoke the oom killer.
> 
> If that happens between the time when tracing_resize_ring_buffer() goes 
> oom and its nicely written error handling starts freeing memory, then it's 
> possible that another task will be unfairly oom killed.  Note that the 
> suggested solution of test_set_oom_score_adj(OOM_SCORE_ADJ_MAX) doesn't 
> prevent that in all cases: it's possible that another thread on the system 
> also has an oom_score_adj of OOM_SCORE_ADJ_MAX and it would be killed in 
> its place just because it appeared in the tasklist first (which is 
> guaranteed if this is simply an echo command).
> 
> Relying on the oom killer to kill this task for parallel blockable 
> allocations doesn't seem like the best solution for the sole reason that 
> the program that wrote to buffer_size_kb may count on its return value.  
> It may be able to handle an -ENOMEM return value and, perhaps, try to 
> write a smaller value?
> 
> I think what this patch really wants to do is utilize __GFP_NORETRY as 
> previously suggested and, if we're really concerned about parallel 
> allocations in this instance even though the same situation exists all 
> over the kernel, also create an oom notifier with register_oom_notifier() 
> that may be called in oom conditions that would free memory when 
> buffer->record_disabled is non-zero and prevent the oom.  That would 
> grow the ring buffer as large as possible up until oom, even though the 
> result may not be the size the user requested.
> 
> Otherwise, you'll just want to use oom_killer_disable() to prevent the oom 
> killer altogether.

Thanks for the detailed explanation. OK, I'm convinced. The proper
solution looks to be both the use of __GFP_NORETRY and the use of
test_set_oom_score_adj(). I don't think it is necessary to worry about
multiple users of this score adj, as when we are in an OOM situation,
things are just bad to begin with. But Google has to deal with bad
situations more than others, so if it becomes an issue for you, then we
can discuss those changes later.
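
For reference, the userspace side of David's point (a program that writes
to buffer_size_kb, sees -ENOMEM, and tries a smaller value) might look
roughly like the sketch below. This is an illustration only, not part of
the patch. The helper names write_size_kb() and resize_with_fallback(),
the halving policy, and the min_kb floor are all assumptions made for the
example; the caller would pass the path of the tracing buffer_size_kb
file (typically under the debugfs tracing directory).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write size_kb as decimal text to the file at path.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_size_kb(const char *path, unsigned long size_kb)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%lu\n", size_kb);
	int fd = open(path, O_WRONLY);
	ssize_t n;
	int err = 0;

	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n < 0)
		err = -errno;		/* e.g. -ENOMEM from a failed resize */
	else if (n != len)
		err = -EIO;		/* treat a short write as a failure */

	close(fd);
	return err;
}

typedef int (*resize_fn)(const char *path, unsigned long size_kb);

/*
 * Hypothetical policy: try want_kb first; on -ENOMEM halve the request
 * and retry, never going below min_kb. Any other error stops the loop.
 * Returns the size that was accepted, or 0 if nothing succeeded.
 */
unsigned long resize_with_fallback(resize_fn resize, const char *path,
				   unsigned long want_kb,
				   unsigned long min_kb)
{
	unsigned long kb;

	for (kb = want_kb; kb > 0 && kb >= min_kb; kb /= 2) {
		int err = resize(path, kb);

		if (err == 0)
			return kb;
		if (err != -ENOMEM)
			return 0;
	}
	return 0;
}
```

A caller would pass write_size_kb as the resize callback and check for a
zero return. This only works if the write actually returns -ENOMEM to the
writer instead of the task being oom killed, which is why the return
value matters here.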

Vaibhav, can you send an updated patch?

Thanks,

-- Steve

