linux-kernel - Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer allocating process

Message-ID: <alpine.DEB.2.00.1105270220580.22108@chino.kir.corp.google.com>
Date:	Fri, 27 May 2011 02:43:26 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Steven Rostedt <rostedt@...dmis.org>
cc:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mel@....ul.ie>, Rik Van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer
 allocating process

On Thu, 26 May 2011, Steven Rostedt wrote:

> > What do you think of this?
> > 
> > test_set_oom_score_adj(MAXIMUM);
> > allocate_ring_buffer(GFP_KERNEL | __GFP_NORETRY);
> > test_set_oom_score_adj(original);
> > 
> > This makes sure that the allocation fails much sooner and more
> > gracefully. If oom-killer is invoked in any circumstance, then the ring
> > buffer allocation process gives up memory and is killed.
> 
> I don't know. But as I never seen this function before, I went and took
> a look. This test_set_oom_score_adj() is new, and coincidentally written
> by another google developer ;)
> 

Ignore the history of the function; it simply duplicates the old 
PF_OOM_ORIGIN flag, which has now been removed.

> As there's not really a precedence to this, if those that I added to the
> Cc, give their acks, I'm happy to apply this for the next merge window.
> 

This problem isn't new; it has always been possible for an allocation that 
is higher order, uses GFP_ATOMIC or GFP_NOWAIT, or uses __GFP_NORETRY as I 
suggested here, to deplete memory at the same time that a GFP_FS 
allocation on another cpu invokes the oom killer.

If that happens between the time when tracing_resize_ring_buffer() goes 
oom and its nicely written error handling starts freeing memory, then it's 
possible that another task will be unfairly oom killed.  Note that the 
suggested solution of test_set_oom_score_adj(OOM_SCORE_ADJ_MAX) doesn't 
prevent that in all cases: it's possible that another thread on the system 
also has an oom_score_adj of OOM_SCORE_ADJ_MAX and it would be killed in 
its place just because it appeared in the tasklist first (which is 
guaranteed if this is simply an echo command).
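
To make the tie-break above concrete, here is a small userspace model, not 
kernel code.  It assumes the victim scan walks the task list in order and 
replaces the current choice only on a strictly higher score, so among tasks 
with equal scores the earliest one wins.  struct task_model and 
select_victim() are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical stand-in for a task: a name and its badness score. */
struct task_model {
    const char *comm;
    int points;
};

/*
 * Walk the list in order and keep the first task with the highest score.
 * The strict '>' means a later task with an equal score never displaces
 * an earlier one.
 */
static const struct task_model *
select_victim(const struct task_model *tasks, size_t n)
{
    const struct task_model *chosen = NULL;
    int best = 0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].points > best) {
            chosen = &tasks[i];
            best = tasks[i].points;
        }
    }
    return chosen;
}
```

With a list in which an unrelated task already at OOM_SCORE_ADJ_MAX sits 
ahead of the task doing the resize, the unrelated task is the one selected.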

Relying on the oom killer to kill this task for parallel blockable 
allocations doesn't seem like the best solution for the sole reason that 
the program that wrote to buffer_size_kb may count on its return value.  
It may be able to handle an -ENOMEM return value and, perhaps, try to 
write a smaller value?
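
As a rough sketch of what such a program might do, the following userspace 
code tries the requested size and halves it each time the write fails with 
ENOMEM.  The helper names, the halving policy and the min_kb floor are my 
assumptions, and the caller supplies the path to buffer_size_kb.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write one size (in KB) to the control file; returns 0 or -errno. */
static int write_size_kb(const char *path, unsigned long kb)
{
    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%lu\n", kb);
    int fd = open(path, O_WRONLY);
    ssize_t n;
    int err;

    if (fd < 0)
        return -errno;
    n = write(fd, buf, (size_t)len);
    err = (n < 0) ? -errno : 0;
    close(fd);
    return err;
}

/*
 * Try want_kb first and halve on -ENOMEM, never going below min_kb.
 * Returns the size that was accepted, or 0 if nothing was accepted
 * (including any error other than -ENOMEM).
 */
static unsigned long resize_with_fallback(const char *path,
                                          unsigned long want_kb,
                                          unsigned long min_kb)
{
    for (unsigned long kb = want_kb; kb >= min_kb && kb > 0; kb /= 2) {
        int err = write_size_kb(path, kb);

        if (err == 0)
            return kb;
        if (err != -ENOMEM)
            return 0;
    }
    return 0;
}
```

This only works if the write actually returns; a writer that gets oom killed 
never sees the error and cannot retry.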

I think what this patch really wants to do is utilize __GFP_NORETRY as 
previously suggested.  If we're really concerned about parallel 
allocations in this instance, even though the same situation exists all 
over the kernel, it could also create an oom notifier with 
register_oom_notifier().  The notifier may be called in oom conditions; 
it would free memory when buffer->record_disabled is non-zero and so 
prevent the oom.  That would grow the ring buffer as large as possible 
up until oom, even though it may not reach the size the user requested.
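
The shape of such a notifier can be modelled in userspace as below.  This is 
a sketch, not a patch: struct notifier_block and NOTIFY_OK are minimal 
stand-ins for the kernel definitions, and rb_resize_state, 
rb_free_new_pages() and model_out_of_memory() are hypothetical.  It assumes 
the oom path hands the callback a pointer to a counter of freed pages and 
skips the kill when that counter ends up non-zero.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the shape matters. */
#define NOTIFY_OK 0x0001

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb,
                         unsigned long action, void *data);
};

/* Hypothetical model of a ring buffer resize in progress. */
struct rb_resize_state {
    int record_disabled;      /* non-zero while a resize is underway */
    unsigned long new_pages;  /* pages allocated but not yet installed */
};

static struct rb_resize_state rb_state;

/* Release the partially allocated pages; returns how many were freed. */
static unsigned long rb_free_new_pages(struct rb_resize_state *st)
{
    unsigned long n = st->new_pages;

    st->new_pages = 0;
    return n;
}

/* Notifier callback: give memory back only while a resize is underway. */
static int rb_oom_notify(struct notifier_block *nb,
                         unsigned long action, void *data)
{
    unsigned long *freed = data;

    (void)nb;
    (void)action;
    if (rb_state.record_disabled)
        *freed += rb_free_new_pages(&rb_state);
    return NOTIFY_OK;
}

static struct notifier_block rb_oom_nb = {
    .notifier_call = rb_oom_notify,
};

/* Model of the oom path: returns 1 if a kill would still proceed. */
static int model_out_of_memory(struct notifier_block *nb)
{
    unsigned long freed = 0;

    nb->notifier_call(nb, 0, &freed);
    return freed == 0;
}
```

In the kernel the block would be passed to register_oom_notifier() before 
the allocation starts and unregistered once the resize completes or fails.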

Otherwise, you'll just want to use oom_killer_disable() to prevent the oom 
killer altogether.