Date:	Thu, 26 May 2011 17:00:12 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Michael Rubin <mrubin@...gle.com>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] trace: Set oom_score_adj to maximum for ring buffer
 allocating process

On Thu, 2011-05-26 at 13:33 -0700, David Rientjes wrote:
> On Thu, 26 May 2011, Vaibhav Nagarnaik wrote:
> 

> 
> Not sure that's true, this is allocating with kzalloc_node(GFP_KERNEL), 
> correct?  If current is oom killed, it will have access to all memory 
> reserves which will increase the likelihood that the allocation will 
> succeed before handling the SIGKILL.

Actually it uses get_free_page().

> 
> > This API is now being used in other parts of the kernel too, where it knows
> > that the allocation could cause OOM.
> > 
> 
> What's wrong with using __GFP_NORETRY to avoid oom killing entirely and 

I have no problem with NORETRY.

> then failing the ring buffer memory allocation?  Seems like a better 
> solution than relying on the oom killer, since there may be other threads 
> with a max oom_score_adj as well that would appear in the tasklist first 
> and get killed unnecessarily.  Is there some ring buffer code that can't 
> handle failing allocations appropriately?

The ring buffer code can handle failed allocations just fine, and will
free up the pages it allocated before a full success. It allocates the
pages one at a time and adds them to a list. Only after all the pages it
wants have been successfully allocated does it apply them to the ring
buffer. If any allocation fails, all pages that were not yet added to
the ring buffer are freed.
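
(Roughly, the pattern is the following. This is an illustrative sketch
only; the structure and function names are made up and are not the
actual ring_buffer.c code:)

    #include <linux/gfp.h>
    #include <linux/list.h>
    #include <linux/slab.h>

    struct bpage {
            struct list_head list;
            void *page;
    };

    /* Allocate all requested pages onto a temporary list first. */
    static int alloc_pages_for_resize(struct list_head *pages, unsigned long nr)
    {
            struct bpage *bp, *tmp;
            unsigned long i;

            for (i = 0; i < nr; i++) {
                    bp = kzalloc(sizeof(*bp), GFP_KERNEL);
                    if (!bp)
                            goto free;
                    bp->page = (void *)__get_free_page(GFP_KERNEL);
                    if (!bp->page) {
                            kfree(bp);
                            goto free;
                    }
                    list_add(&bp->list, pages);
            }
            return 0;       /* caller now splices the list into the ring buffer */

    free:
            /* Failure: free everything not yet handed to the ring buffer. */
            list_for_each_entry_safe(bp, tmp, pages, list) {
                    list_del(&bp->list);
                    free_page((unsigned long)bp->page);
                    kfree(bp);
            }
            return -ENOMEM;
    }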

But the issue is, if the process increasing the size of the ring buffer
causes the OOM, it will not handle the SIGKILL until after the ring
buffer has finished allocating. If the allocation fails, then we are
fine; but if the allocation succeeds while other processes are being
killed to satisfy it, then we may be in trouble.

I like the NORETRY better. But then, would this mean that if we have a
lot of cached filesystem data, we won't be able to extend the ring buffer?
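
(For reference, an illustrative sketch of what the NORETRY variant would
look like; the surrounding code is made up:)

    /*
     * With __GFP_NORETRY the allocator gives up instead of invoking
     * the OOM killer, so the resize path can simply report failure.
     */
    unsigned long addr = __get_free_page(GFP_KERNEL | __GFP_NORETRY);
    if (!addr)
            return -ENOMEM; /* tell the caller the resize failed */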

I'm thinking the OOM killer used here got lucky: when it killed this
task we were still out of memory, so the ring buffer failed to get the
memory it needed, freed up everything it had previously allocated, and
returned. Then the process calling this function would be killed by the
OOM killer. Ideally, the process shouldn't be killed at all; the ring
buffer should just return -ENOMEM to the user.

-- Steve

