Date:	Tue, 06 Oct 2015 09:52:50 -0500
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Michal Hocko <mhocko@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	David Rientjes <rientjes@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Kyle Walker <kwalker@...hat.com>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Vladimir Davydov <vdavydov@...allels.com>,
	linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Stanislav Kozina <skozina@...hat.com>
Subject: Re: can't oom-kill zap the victim's memory?

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Tue, Oct 6, 2015 at 9:49 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> The basic fact remains: kernel allocations are so important that
>> rather than fail, you should kill user space. Only kernel allocations
>> that *explicitly* know that they have fallback code should fail, and
>> they should just do the __GFP_NORETRY.
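The fallback pattern described in the quote can be sketched in userspace terms roughly as follows (the names here are hypothetical stand-ins, not kernel APIs; this mirrors what a caller with explicit fallback code does with __GFP_NORETRY):

```c
#include <stdlib.h>

/* Stand-in for kmalloc(size, GFP_KERNEL | __GFP_NORETRY): a cheap
 * contiguous allocation that is allowed to fail instead of looping
 * or invoking the OOM killer.  simulate_failure models that failure. */
static void *try_contig_alloc(size_t size, int simulate_failure)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Stand-in for a vmalloc()-style fallback built from order-0 pages:
 * slower, but does not need physically contiguous memory. */
static void *fallback_alloc(size_t size)
{
	return calloc(1, size);
}

/* The caller knows it has a fallback, so it opts out of retrying and
 * handles the NULL itself rather than killing user space. */
void *alloc_with_fallback(size_t size, int simulate_failure)
{
	void *p = try_contig_alloc(size, simulate_failure);

	if (!p)
		p = fallback_alloc(size);
	return p;
}
```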

If you have reached the point of killing userspace you might as well
panic the box.  Userspace will recover more cleanly and more quickly.
The oom-killer is like an oops.  Nice for debugging but not something
you want on a production workload.

> To be clear: "big" orders (I forget if the limit is at order-3 or
> order-4) do fail much more aggressively. But no, we do not limit retry
> to just order-0, because even small kmalloc sizes tend to often do
> order-1 or order-2 just because of memory packing issues (ie trying to
> pack into a single page wastes too much memory if the allocation sizes
> don't come out right).
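As a refresher on what those orders translate to, here is a minimal analogue of the kernel's get_order(), assuming 4 KiB pages (order-n means 2^n physically contiguous pages):

```c
/* Smallest order n such that 2^n pages of 4096 bytes cover the
 * requested size.  PAGE_SIZE == 4096 is assumed throughout. */
static int alloc_order(unsigned long size)
{
	unsigned long pages = (size + 4095) / 4096;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

So a 6 KiB slab object already forces an order-1 allocation, 16 KiB is order-2, and the order-3 slabs mentioned above need 32 KiB of contiguous memory.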

I am not asking that we limit retry to just order-0 pages.  I am asking
that we limit the oom-killer on failure to just order-0 pages.

> So no, order-0 isn't special. 1/2 are rather important too.

That is a justification for retrying.  That is not a justification for
killing the box.

> [ Checking /proc/slabinfo: it looks like several slabs are order-3,
> for things like files_cache, signal_cache and sighand_cache for me at
> least. So I think it's up to order-3 that we basically need to
> consider "we'll need to shrink user space aggressively unless we have
> an explicit fallback for the allocation" ]

What I know is that order-3 is definitely too big.  I had 4G of RAM
free.  I needed 16K to expand the fd table.  The box died.  That is
not good.
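To put that 16K in concrete terms, a back-of-envelope sketch (assuming 64-bit pointers and 4 KiB pages; illustrative, not the exact fdtable layout):

```c
/* The fd table is essentially an array of struct file pointers, so
 * 16 KiB of it covers 16384 / 8 = 2048 descriptors.  Getting those
 * 16 KiB takes four *contiguous* pages (an order-2 allocation),
 * which can fail on fragmentation even with gigabytes free. */
enum {
	FD_TABLE_BYTES = 16384,
	PTR_SIZE       = 8,                         /* 64-bit assumed  */
	FDS_COVERED    = FD_TABLE_BYTES / PTR_SIZE, /* descriptors     */
	PAGES_NEEDED   = FD_TABLE_BYTES / 4096,     /* contiguous 4K pages */
};
```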

We have static checkers now; failure to check and handle allocation
errors tends to be caught.

So yes, for the rare case of order-[123] allocations failing, we should
return the failure to the caller.  The kernel can handle it.  Userspace
can handle just about anything better than random processes dying.

Eric
