Date:	Tue, 06 Oct 2015 09:52:50 -0500
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Michal Hocko <mhocko@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	David Rientjes <rientjes@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Kyle Walker <kwalker@...hat.com>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Vladimir Davydov <vdavydov@...allels.com>,
	linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Stanislav Kozina <skozina@...hat.com>
Subject: Re: can't oom-kill zap the victim's memory?

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Tue, Oct 6, 2015 at 9:49 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> The basic fact remains: kernel allocations are so important that
>> rather than fail, you should kill user space. Only kernel allocations
>> that *explicitly* know that they have fallback code should fail, and
>> they should just do the __GFP_NORETRY.
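The fallback pattern described in the quote can be sketched in userspace terms roughly as follows (the names here are hypothetical stand-ins, not kernel APIs; this mirrors what a caller with explicit fallback code does with __GFP_NORETRY):

```c
#include <stdlib.h>

/* Stand-in for kmalloc(size, GFP_KERNEL | __GFP_NORETRY): a cheap
 * contiguous allocation that is allowed to fail instead of looping
 * or invoking the OOM killer.  simulate_failure models that failure. */
static void *try_contig_alloc(size_t size, int simulate_failure)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Stand-in for a vmalloc()-style fallback built from order-0 pages:
 * slower, but does not need physically contiguous memory. */
static void *fallback_alloc(size_t size)
{
	return calloc(1, size);
}

/* The caller knows it has a fallback, so it opts out of retrying and
 * handles the NULL itself rather than killing user space. */
void *alloc_with_fallback(size_t size, int simulate_failure)
{
	void *p = try_contig_alloc(size, simulate_failure);

	if (!p)
		p = fallback_alloc(size);
	return p;
}
```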

If you have reached the point of killing userspace you might as well
panic the box.  Userspace will recover more cleanly and more quickly.
The oom-killer is like an oops.  Nice for debugging but not something
you want on a production workload.

> To be clear: "big" orders (I forget if the limit is at order-3 or
> order-4) do fail much more aggressively. But no, we do not limit retry
> to just order-0, because even small kmalloc sizes tend to often do
> order-1 or order-2 just because of memory packing issues (ie trying to
> pack into a single page wastes too much memory if the allocation sizes
> don't come out right).
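As a refresher on what those orders translate to, here is a minimal analogue of the kernel's get_order(), assuming 4 KiB pages (order-n means 2^n physically contiguous pages):

```c
/* Smallest order n such that 2^n pages of 4096 bytes cover the
 * requested size.  PAGE_SIZE == 4096 is assumed throughout. */
static int alloc_order(unsigned long size)
{
	unsigned long pages = (size + 4095) / 4096;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

So a 6 KiB slab object already forces an order-1 allocation, 16 KiB is order-2, and the order-3 slabs mentioned above need 32 KiB of contiguous memory.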

I am not asking that we limit retry to just order-0 pages.  I am asking
that we limit the oom-killer on failure to just order-0 pages.

> So no, order-0 isn't special. 1/2 are rather important too.

That is a justification for retrying.  That is not a justification for
killing the box.

> [ Checking /proc/slabinfo: it looks like several slabs are order-3,
> for things like files_cache, signal_cache and sighand_cache for me at
> least. So I think it's up to order-3 that we basically need to
> consider "we'll need to shrink user space aggressively unless we have
> an explicit fallback for the allocation" ]

What I know is that order-3 is definitely too big.  I had 4G of RAM
free.  I needed 16K to expand the fd table.  The box died.  That is
not good.
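To put that 16K in concrete terms, a back-of-envelope sketch (assuming 64-bit pointers and 4 KiB pages; illustrative, not the exact fdtable layout):

```c
/* The fd table is essentially an array of struct file pointers, so
 * 16 KiB of it covers 16384 / 8 = 2048 descriptors.  Getting those
 * 16 KiB takes four *contiguous* pages (an order-2 allocation),
 * which can fail on fragmentation even with gigabytes free. */
enum {
	FD_TABLE_BYTES = 16384,
	PTR_SIZE       = 8,                         /* 64-bit assumed  */
	FDS_COVERED    = FD_TABLE_BYTES / PTR_SIZE, /* descriptors     */
	PAGES_NEEDED   = FD_TABLE_BYTES / 4096,     /* contiguous 4K pages */
};
```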

We have static checkers now; failure to check and handle allocation
errors tends to be caught.

So yes, for the rare case of order-[123] allocations failing, we should
return the failure to the caller.  The kernel can handle it.  Userspace
can handle just about anything better than random processes dying.

Eric
