lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 12 Oct 2015 15:43:45 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...nel.org
Cc:	rientjes@...gle.com, oleg@...hat.com,
	torvalds@...ux-foundation.org, kwalker@...hat.com, cl@...ux.com,
	akpm@...ux-foundation.org, hannes@...xchg.org,
	vdavydov@...allels.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, skozina@...hat.com
Subject: Re: Can't we use timeout based OOM warning/killing?

Tetsuo Handa wrote:
> So, zapping the first OOM victim's mm might fail by chance.

I retested with a slightly different version.

---------- Reproducer start ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/mman.h>

static int writer(void *unused)
{
	const int fd = open("/proc/self/exe", O_RDONLY);
	while (1) {
		void *ptr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
		munmap(ptr, 4096);
	}
	return 0;
}

int main(int argc, char *argv[])
{
	char buffer[128] = { };
	const pid_t pid = fork();
	if (pid == 0) { /* down_write(&mm->mmap_sem) requester which is chosen as an OOM victim. */
		int i;
		for (i = 0; i < 9; i++)
			clone(writer, malloc(1024) + 1024, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL);
		writer(NULL);
	}
	snprintf(buffer, sizeof(buffer) - 1, "/proc/%u/stat", pid);
	if (fork() == 0) { /* down_read(&mm->mmap_sem) requester. */
		const int fd = open(buffer, O_RDONLY);
		while (pread(fd, buffer, sizeof(buffer), 0) > 0);
		_exit(0);
	} else { /* A dummy process for invoking the OOM killer. */
		char *buf = NULL;
		unsigned long size = 0;
		const int fd = open("/dev/zero", O_RDONLY);
		for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
			char *cp = realloc(buf, size);
			if (!cp) {
				size >>= 1;
				break;
			}
			buf = cp;
		}
		read(fd, buf, size); /* Will cause OOM due to overcommit */
		return 0;
	}
}
---------- Reproducer end ----------

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20151012.txt.xz .

Uptime between 101 and 300 is a silent hang up (i.e. no OOM killer messages,
no SIGKILL pending tasks, no TIF_MEMDIE tasks) which I solved using SysRq-f
at uptime = 289. I don't know the reason of this silent hang up, but the
memory unzapping kernel thread will not help because there is no OOM victim.

----------
[  101.438951] MemAlloc-Info: 10 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  111.817922] MemAlloc-Info: 12 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  122.281828] MemAlloc-Info: 13 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  132.793724] MemAlloc-Info: 14 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  143.336154] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  289.343187] sysrq: SysRq : Manual OOM execution
(...snipped...)
[  292.065650] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  302.590736] kworker/3:2 invoked oom-killer: gfp_mask=0x24000c0, order=-1, oom_score_adj=0
(...snipped...)
[  302.690047] MemAlloc-Info: 4 stalling task, 0 dying task, 0 victim task.
----------

Uptime between 379 and 605 is a mmap_sem livelock after the OOM killer was
invoked.

----------
[  380.039897] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  380.042500] [  467]     0   467    14047     1815      28       3        0             0 systemd-journal
[  380.045055] [  482]     0   482    10413      259      23       3        0         -1000 systemd-udevd
[  380.047637] [  504]     0   504    12795      119      25       3        0         -1000 auditd
[  380.050127] [ 1244]     0  1244    82428     4257      81       3        0             0 firewalld
[  380.052536] [ 1247]    70  1247     6988       61      21       3        0             0 avahi-daemon
[  380.055028] [ 1250]     0  1250    54104     1372      42       4        0             0 rsyslogd
[  380.057505] [ 1251]     0  1251   137547     2620      91       3        0             0 tuned
[  380.059996] [ 1255]     0  1255     4823       77      15       3        0             0 irqbalance
[  380.062552] [ 1256]     0  1256     1095       37       8       3        0             0 rngd
[  380.065020] [ 1259]     0  1259    53626      441      60       3        0             0 abrtd
[  380.067383] [ 1260]     0  1260    53001      341      58       5        0             0 abrt-watch-log
[  380.069965] [ 1265]     0  1265     8673       83      21       3        0             0 systemd-logind
[  380.072554] [ 1266]    81  1266     6663      117      18       3        0          -900 dbus-daemon
[  380.075122] [ 1272]     0  1272    31577      154      21       3        0             0 crond
[  380.077544] [ 1314]    70  1314     6988       57      19       3        0             0 avahi-daemon
[  380.080013] [ 1427]     0  1427    46741      225      44       3        0             0 vmtoolsd
[  380.082478] [ 1969]     0  1969    25942     3100      48       3        0             0 dhclient
[  380.084969] [ 1990]   999  1990   128626     1929      50       4        0             0 polkitd
[  380.087516] [ 2073]     0  2073    20629      214      45       3        0         -1000 sshd
[  380.090065] [ 2201]     0  2201     7320       68      21       3        0             0 xinetd
[  380.092465] [ 3215]     0  3215    22773      257      44       3        0             0 master
[  380.094879] [ 3217]    89  3217    22816      249      45       3        0             0 qmgr
[  380.097304] [ 3249]     0  3249    75245      315      97       3        0             0 nmbd
[  380.099666] [ 3259]     0  3259    92963      486     131       5        0             0 smbd
[  380.101956] [ 3282]     0  3282    27503       30      12       3        0             0 agetty
[  380.104277] [ 3283]     0  3283    21788      154      49       3        0             0 login
[  380.106574] [ 3286]     0  3286    92963      486     126       5        0             0 smbd
[  380.108835] [ 3296]  1000  3296    28864      117      13       3        0             0 bash
[  380.111073] [ 3374]    89  3374    22799      249      46       3        0             0 pickup
[  380.113298] [ 3378]    89  3378    22836      252      45       3        0             0 cleanup
[  380.115555] [ 3385]    89  3385    22800      248      44       3        0             0 trivial-rewrite
[  380.117811] [ 3392]     0  3392    22825      265      48       3        0             0 local
[  380.119995] [ 3393]     0  3393    30828       59      17       3        0             0 anacron
[  380.122183] [ 3417]  1000  3417   541715   397587     787       6        0             0 a.out
[  380.124315] [ 3418]  1000  3418     1081       24       8       3        0             0 a.out
[  380.126410] [ 3419]  1000  3419     1042       21       7       3        0             0 a.out
[  380.128535] Out of memory: Kill process 3417 (a.out) score 890 or sacrifice child
[  380.130392] Killed process 3418 (a.out) total-vm:4324kB, anon-rss:96kB, file-rss:0kB
[  392.704028] MemAlloc-Info: 7 stalling task, 10 dying task, 1 victim task.
(...snipped...)
[  601.129977] a.out           R  running task        0  3417   3296 0x00000080
[  601.131899]  ffff8800774dba10 ffffffff8112b174 0000000000000100 0000000000000000
[  601.134026]  0000000000000000 0000000000000000 00000000a23cb49d 0000000000000000
[  601.136076]  ffff880077603200 00000000024280ca 0000000000000000 ffff880077603200
[  601.138090] Call Trace:
[  601.139145]  [<ffffffff8112b174>] ? try_to_free_pages+0x94/0xc0
[  601.140831]  [<ffffffff8111a8c4>] ? out_of_memory+0x2f4/0x460
[  601.142489]  [<ffffffff8111fa63>] ? __alloc_pages_nodemask+0x613/0xc30
[  601.144328]  [<ffffffff81161c40>] ? alloc_pages_vma+0xb0/0x200
[  601.145994]  [<ffffffff81143056>] ? handle_mm_fault+0xfa6/0x1370
[  601.147677]  [<ffffffff8162f557>] ? native_iret+0x7/0x7
[  601.149258]  [<ffffffff81058217>] ? __do_page_fault+0x177/0x400
[  601.150966]  [<ffffffff810584d0>] ? do_page_fault+0x30/0x80
[  601.152625]  [<ffffffff81630518>] ? page_fault+0x28/0x30
[  601.154159]  [<ffffffff813230c0>] ? __clear_user+0x20/0x50
[  601.155723]  [<ffffffff81327a68>] ? iov_iter_zero+0x68/0x250
[  601.157329]  [<ffffffff813fc6c8>] ? read_iter_zero+0x38/0xa0
[  601.158923]  [<ffffffff81187f04>] ? __vfs_read+0xc4/0xf0
[  601.160453]  [<ffffffff8118868a>] ? vfs_read+0x7a/0x120
[  601.161961]  [<ffffffff811893a0>] ? SyS_read+0x50/0xc0
[  601.163513]  [<ffffffff8162e9ee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
[  601.165254] a.out           D ffff8800777b7e08     0  3418   3417 0x00100084
[  601.167118]  ffff8800777b7e08 ffff880077606400 ffff8800777b8000 ffff880036032e00
[  601.169137]  ffff880036032de8 ffffffff00000000 ffffffff00000001 ffff8800777b7e20
[  601.171159]  ffffffff8162a570 ffff880077606400 ffff8800777b7ea8 ffffffff8162d8eb
[  601.173183] Call Trace:
[  601.174193]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.175661]  [<ffffffff8162d8eb>] rwsem_down_write_failed+0x1fb/0x350
[  601.177388]  [<ffffffff81322f64>] ? call_rwsem_down_read_failed+0x14/0x30
[  601.179194]  [<ffffffff81322f93>] call_rwsem_down_write_failed+0x13/0x20
[  601.180971]  [<ffffffff8162d05f>] ? down_write+0x1f/0x30
[  601.182509]  [<ffffffff81147abe>] vm_munmap+0x2e/0x60
[  601.183992]  [<ffffffff811489fd>] SyS_munmap+0x1d/0x30
[  601.185485]  [<ffffffff8162e9ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  601.187224] a.out           D ffff88007c60fdf0     0  3420   3417 0x00000084
[  601.189130]  ffff88007c60fdf0 ffff880078e15780 ffff88007c610000 ffff880036032de8
[  601.191158]  ffff880036032e00 ffff88007c60ff58 ffff880078e15780 ffff88007c60fe08
[  601.193180]  ffffffff8162a570 ffff880078e15780 ffff88007c60fe68 ffffffff8162d698
[  601.195217] Call Trace:
[  601.196226]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.197683]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.199407]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.201192]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.202711]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.204328]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.205874]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.207376] a.out           D ffff88007c24fdf0     0  3421   3417 0x00000084
[  601.209286]  ffff88007c24fdf0 ffff880078e13200 ffff88007c250000 ffff880036032de8
[  601.211316]  ffff880036032e00 ffff88007c24ff58 ffff880078e13200 ffff88007c24fe08
[  601.213335]  ffffffff8162a570 ffff880078e13200 ffff88007c24fe68 ffffffff8162d698
[  601.215356] Call Trace:
[  601.216377]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.217831]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.219529]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.221296]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.222802]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.224403]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.225958]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.227453] a.out           D ffff88007823bdf0     0  3422   3417 0x00000084
[  601.229348]  ffff88007823bdf0 ffff880078e10000 ffff88007823c000 ffff880036032de8
[  601.231395]  ffff880036032e00 ffff88007823bf58 ffff880078e10000 ffff88007823be08
[  601.233427]  ffffffff8162a570 ffff880078e10000 ffff88007823be68 ffffffff8162d698
[  601.235472] Call Trace:
[  601.236504]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.237989]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.239720]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.241583]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.243144]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.244777]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.246307]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.247823] a.out           D ffff88007c483df0     0  3423   3417 0x00000084
[  601.249719]  ffff88007c483df0 ffff880078e13e80 ffff88007c484000 ffff880036032de8
[  601.251765]  ffff880036032e00 ffff88007c483f58 ffff880078e13e80 ffff88007c483e08
[  601.253808]  ffffffff8162a570 ffff880078e13e80 ffff88007c483e68 ffffffff8162d698
[  601.255831] Call Trace:
[  601.256850]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.258286]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.260005]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.261803]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.263329]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.264936]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.266504]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.268019] a.out           D ffff880035893e08     0  3424   3417 0x00000084
[  601.269940]  ffff880035893e08 ffff880078e17080 ffff880035894000 ffff880036032e00
[  601.271945]  ffff880036032de8 ffffffff00000000 ffffffff00000001 ffff880035893e20
[  601.273954]  ffffffff8162a570 ffff880078e17080 ffff880035893ea8 ffffffff8162d8eb
[  601.276000] Call Trace:
[  601.277007]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.278497]  [<ffffffff8162d8eb>] rwsem_down_write_failed+0x1fb/0x350
[  601.280240]  [<ffffffff81322f64>] ? call_rwsem_down_read_failed+0x14/0x30
[  601.282058]  [<ffffffff81322f93>] call_rwsem_down_write_failed+0x13/0x20
[  601.283872]  [<ffffffff8162d05f>] ? down_write+0x1f/0x30
[  601.285403]  [<ffffffff81147abe>] vm_munmap+0x2e/0x60
[  601.286924]  [<ffffffff811489fd>] SyS_munmap+0x1d/0x30
[  601.288435]  [<ffffffff8162e9ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  601.290184] a.out           D ffff8800353b7df0     0  3425   3417 0x00000084
[  601.292108]  ffff8800353b7df0 ffff880078e10c80 ffff8800353b8000 ffff880036032de8
[  601.294165]  ffff880036032e00 ffff8800353b7f58 ffff880078e10c80 ffff8800353b7e08
[  601.296206]  ffffffff8162a570 ffff880078e10c80 ffff8800353b7e68 ffffffff8162d698
[  601.298267] Call Trace:
[  601.299300]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.300755]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.302437]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.304221]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.305764]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.307389]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.308968]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.310488] a.out           D ffff88007cf87df0     0  3426   3417 0x00000084
[  601.312380]  ffff88007cf87df0 ffff880078e16400 ffff88007cf88000 ffff880036032de8
[  601.314414]  ffff880036032e00 ffff88007cf87f58 ffff880078e16400 ffff88007cf87e08
[  601.316443]  ffffffff8162a570 ffff880078e16400 ffff88007cf87e68 ffffffff8162d698
[  601.318490] Call Trace:
[  601.319536]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.321036]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.322763]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.324504]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.326071]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.327715]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.329287]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.330761] a.out           D ffff8800792dfdf0     0  3427   3417 0x00000084
[  601.332705]  ffff8800792dfdf0 ffff880078e12580 ffff8800792e0000 ffff880036032de8
[  601.334699]  ffff880036032e00 ffff8800792dff58 ffff880078e12580 ffff8800792dfe08
[  601.336750]  ffffffff8162a570 ffff880078e12580 ffff8800792dfe68 ffffffff8162d698
[  601.338794] Call Trace:
[  601.339781]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.341280]  [<ffffffff8162d698>] rwsem_down_read_failed+0xf8/0x150
[  601.343009]  [<ffffffff81322f64>] call_rwsem_down_read_failed+0x14/0x30
[  601.344813]  [<ffffffff8162d032>] ? down_read+0x12/0x20
[  601.346361]  [<ffffffff810583f7>] __do_page_fault+0x357/0x400
[  601.347990]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.349521]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.351044] a.out           D ffff88007743faa8     0  3428   3417 0x00000084
[  601.352942]  ffff88007743faa8 ffff88007bda6400 ffff880077440000 ffff88007743fae0
[  601.354990]  ffff88007fccdfc0 00000001000484e5 0000000000000000 ffff88007743fac0
[  601.357024]  ffffffff8162a570 ffff88007fccdfc0 ffff88007743fb40 ffffffff8162dbed
[  601.359075] Call Trace:
[  601.360096]  [<ffffffff8162a570>] schedule+0x30/0x80
[  601.361540]  [<ffffffff8162dbed>] schedule_timeout+0x11d/0x1c0
[  601.363190]  [<ffffffff810c7e00>] ? cascade+0x90/0x90
[  601.364697]  [<ffffffff8162dce9>] schedule_timeout_uninterruptible+0x19/0x20
[  601.366574]  [<ffffffff8111fc9d>] __alloc_pages_nodemask+0x84d/0xc30
[  601.368332]  [<ffffffff811609a7>] alloc_pages_current+0x87/0x110
[  601.370002]  [<ffffffff811166cf>] __page_cache_alloc+0xaf/0xc0
[  601.371606]  [<ffffffff81119225>] filemap_fault+0x1e5/0x420
[  601.373203]  [<ffffffff81244f39>] xfs_filemap_fault+0x39/0x60
[  601.374798]  [<ffffffff8113d5e7>] __do_fault+0x47/0xd0
[  601.376315]  [<ffffffff81142ec5>] handle_mm_fault+0xe15/0x1370
[  601.377938]  [<ffffffff81322f64>] ? call_rwsem_down_read_failed+0x14/0x30
[  601.379707]  [<ffffffff81058217>] __do_page_fault+0x177/0x400
[  601.381320]  [<ffffffff810584d0>] do_page_fault+0x30/0x80
[  601.382831]  [<ffffffff81630518>] page_fault+0x28/0x30
[  601.384337] a.out           R  running task        0  3419   3417 0x00000080
[  601.386257]  00000000f80745e8 ffff880034ab4400 ffff8800776d3f18 ffff8800776d3f18
[  601.388287]  0000000000000080 0000000000000000 ffff8800776d3ec8 ffffffff81187e72
[  601.390341]  ffff880034ab4400 ffff880034ab4410 0000000000020000 0000000000000000
[  601.392366] Call Trace:
[  601.393388]  [<ffffffff81187e72>] ? __vfs_read+0x32/0xf0
[  601.394952]  [<ffffffff81290aa9>] ? security_file_permission+0xa9/0xc0
[  601.396745]  [<ffffffff8118858d>] ? rw_verify_area+0x4d/0xd0
[  601.398359]  [<ffffffff8118868a>] ? vfs_read+0x7a/0x120
[  601.399897]  [<ffffffff81189560>] ? SyS_pread64+0x90/0xb0
[  601.401429]  [<ffffffff8162e9ee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
----------

I think that I noticed three problems from this reproducer.

(1) While the likeliness of hitting mmap_sem livelock would depend on how
    frequently down_read(&mm->mmap_sem) tasks and down_write(&mm->mmap_sem)
    tasks contend on the OOM victim's mm, we can hit mmap_sem livelock with
    even only one down_read(&mm->mmap_sem) task. On systems where processes
    are monitored using /proc/pid/ interface, we can by chance hit this
    mmap_sem livelock.

(2) The OOM killer tries to kill child process of the memory hog. But the
    child process is not always consuming a lot of memory. The memory
    unzapping kernel thread might not be able to reclaim enough memory
    unless we choose subsequent OOM victims when the first OOM victim task
    got mmap_sem livelock.

(3) I don't know the reason but I can observe that (when there are many
    tasks which got SIGKILL by the OOM killer) many of dying tasks participate
    in a memory allocation competition via page_fault() which cannot make
    forward progress because dying tasks without TIF_MEMDIE are not allowed
    to access the memory reserves.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ