linux-kernel - Re: Can't we use timeout based OOM warning/killing?

Message-Id: <201510062351.JHJ57310.VFQLFHFOJtSMOO@I-love.SAKURA.ne.jp>
Date:	Tue, 6 Oct 2015 23:51:49 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...nel.org
Cc:	rientjes@...gle.com, oleg@...hat.com,
	torvalds@...ux-foundation.org, kwalker@...hat.com, cl@...ux.com,
	akpm@...ux-foundation.org, hannes@...xchg.org,
	vdavydov@...allels.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, skozina@...hat.com
Subject: Re: Can't we use timeout based OOM warning/killing?

Tetsuo Handa wrote:
> Sorry. This was my misunderstanding. But I still think that we need to be
> prepared for cases where zapping OOM victim's mm approach fails.
> ( http://lkml.kernel.org/r/201509242050.EHE95837.FVFOOtMQHLJOFS@I-love.SAKURA.ne.jp )

I tested how easy it is to make the approach of zapping the OOM victim's mm
fail. The result suggests that it is not difficult.

---------- Reproducer start ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/mman.h>

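static int reader(void *unused)
{
	char c;
	/* Each pread() of /proc/self/cmdline takes mm->mmap_sem for
	 * reading (proc_pid_cmdline_read() -> access_remote_vm()). */
	int fd = open("/proc/self/cmdline", O_RDONLY);
	while (pread(fd, &c, 1, 0) == 1);
	return 0;
}

static int writer(void *unused)
{
	const int fd = open("/proc/self/exe", O_RDONLY);
	static void *ptr[10000];
	int i;
	sleep(2); /* Let the reader threads get going first. */
	while (1) {
		/* mmap() and munmap() take mm->mmap_sem for writing. */
		for (i = 0; i < 10000; i++)
			ptr[i] = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd,
				      0);
		for (i = 0; i < 10000; i++)
			munmap(ptr[i], 4096);
	}
	return 0;
}

int main(int argc, char *argv[])
{
	int zero_fd = open("/dev/zero", O_RDONLY);
	char *buf = NULL;
	unsigned long size = 0;
	int i;
	/* Double the buffer until realloc() fails or 512 GiB is reached.
	 * With overcommit, this can reserve far more than physical RAM. */
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp)
			break;
		buf = cp;
	}
	size >>= 1; /* Size of the last successful realloc(). */
	/* 100 reader threads sharing this mm, on tiny 1024-byte stacks
	 * (the stack grows down, so clone() gets the end of each buffer). */
	for (i = 0; i < 100; i++) {
		clone(reader, (char *) malloc(1024) + 1024,
		      CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL);
	}
	clone(writer, (char *) malloc(1024) + 1024,
	      CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL);
	/* Filling buf from /dev/zero faults in every page; due to
	 * overcommit, this will cause OOM. */
	read(zero_fd, buf, size);
	return * (char *) NULL; /* Kill all threads (SIGSEGV). */
}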
---------- Reproducer end ----------

(I wrote this program to mimic a problem where a customer's system hung
 with many ps processes blocked while reading /proc/pid/ entries, due to an
 unkillable down_read(&mm->mmap_sem) in __access_remote_vm(). However, I
 couldn't identify which function was holding mmap_sem for writing.)

From uptime 429 onward, http://I-love.SAKURA.ne.jp/tmp/serial-20151006.txt.xz
shows an OOM livelock in which:

  (1) the thread group leader is blocked at down_read(&mm->mmap_sem) in
      exit_mm(), called from do_exit();

  (2) the writer thread is blocked at down_write(&mm->mmap_sem) in
      vm_mmap_pgoff(), called from SyS_mmap_pgoff(), called from SyS_mmap();

  (3) many reader threads are blocking the writer thread because they hold
      mm->mmap_sem via down_read(&mm->mmap_sem), called from
      proc_pid_cmdline_read();

  (4) while the thread group leader is blocked at down_read(&mm->mmap_sem),
      some of the reader threads are trying to allocate memory via page
      faults.

So, zapping the first OOM victim's mm might fail by chance.