linux-kernel - Re: Still OOM problems with 4.9er/4.10er kernels

Message-ID: <20170227090236.GA2789@bbox>
Date:   Mon, 27 Feb 2017 18:02:36 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Gerhard Wiesinger <lists@...singer.com>
CC:     Michal Hocko <mhocko@...nel.org>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Still OOM problems with 4.9er/4.10er kernels

On Sun, Feb 26, 2017 at 09:40:42AM +0100, Gerhard Wiesinger wrote:
> On 04.01.2017 10:11, Michal Hocko wrote:
> >>The VM stops working (e.g. not pingable) after around 8h (will be restarted
> >>automatically), happened several times.
> >>
> >>Had also further OOMs which I sent to Minchan.
> >Could you post them to the mailing list as well, please?
> 
> Still OOMs on dnf update procedure with kernel 4.10: 4.10.0-1.fc26.x86_64 as
> well on 4.9.9-200.fc25.x86_64
> 
> On 4.10er kernels:
> 
> Free swap  = 1137532kB
> 
> cat /etc/sysctl.d/* | grep ^vm
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 15
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 80
> vm.swappiness=10
> 
> kernel: python invoked oom-killer:
> gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0,
> oom_score_adj=0
> kernel: python cpuset=/ mems_allowed=0
> kernel: CPU: 1 PID: 813 Comm: python Not tainted 4.10.0-1.fc26.x86_64 #1
> kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3
> 04/01/2014
> kernel: Call Trace:
> kernel:  dump_stack+0x63/0x84
> kernel:  dump_header+0x7b/0x1f6
> kernel:  ? do_try_to_free_pages+0x2c5/0x340
> kernel:  oom_kill_process+0x202/0x3d0
> kernel:  out_of_memory+0x2b7/0x4e0
> kernel:  __alloc_pages_slowpath+0x915/0xb80
> kernel:  __alloc_pages_nodemask+0x218/0x2d0
> kernel:  alloc_pages_current+0x93/0x150
> kernel:  __page_cache_alloc+0xcf/0x100
> kernel:  filemap_fault+0x39d/0x800
> kernel:  ? page_add_file_rmap+0xe5/0x200
> kernel:  ? filemap_map_pages+0x2e1/0x4e0
> kernel:  ext4_filemap_fault+0x36/0x50
> kernel:  __do_fault+0x21/0x110
> kernel:  handle_mm_fault+0xdd1/0x1410
> kernel:  ? swake_up+0x42/0x50
> kernel:  __do_page_fault+0x23f/0x4c0
> kernel:  trace_do_page_fault+0x41/0x120
> kernel:  do_async_page_fault+0x51/0xa0
> kernel:  async_page_fault+0x28/0x30
> kernel: RIP: 0033:0x7f0681ad6350
> kernel: RSP: 002b:00007ffcbdd238d8 EFLAGS: 00010246
> kernel: RAX: 00007f0681b0f960 RBX: 0000000000000000 RCX: 7fffffffffffffff
> kernel: RDX: 0000000000000000 RSI: 3ff0000000000000 RDI: 3ff0000000000000
> kernel: RBP: 00007f067461ab40 R08: 0000000000000000 R09: 3ff0000000000000
> kernel: R10: 0000556f1c6d8a80 R11: 0000000000000001 R12: 00007f0676d1a8d0
> kernel: R13: 0000000000000000 R14: 00007f06746168bc R15: 00007f0674385910
> kernel: Mem-Info:
> kernel: active_anon:37423 inactive_anon:37512 isolated_anon:0
>          active_file:462 inactive_file:603 isolated_file:0
>          unevictable:0 dirty:0 writeback:0 unstable:0
>          slab_reclaimable:3538 slab_unreclaimable:4818
>          mapped:859 shmem:9 pagetables:3370 bounce:0
>          free:1650 free_pcp:103 free_cma:0
> kernel: Node 0 active_anon:149380kB inactive_anon:149704kB
> active_file:1848kB inactive_file:3660kB unevictable:0kB isolated(anon):128kB
> isolated(file):0kB mapped:4580kB dirty:0kB writeback:380kB shmem:0kB
> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 36kB writeback_tmp:0kB
> unstable:0kB pages_scanned:352 all_unreclaimable? no
> kernel: Node 0 DMA free:1484kB min:104kB low:128kB high:152kB
> active_anon:5660kB inactive_anon:6156kB active_file:56kB inactive_file:64kB
> unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB
> slab_reclaimable:444kB slab_unreclaimable:1208kB kernel_stack:32kB
> pagetables:592kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> kernel: lowmem_reserve[]: 0 327 327 327 327
> kernel: Node 0 DMA32 free:5012kB min:2264kB low:2828kB high:3392kB
> active_anon:143580kB inactive_anon:143300kB active_file:2576kB
> inactive_file:2560kB unevictable:0kB writepending:0kB present:376688kB
> managed:353968kB mlocked:0kB slab_reclaimable:13708kB
> slab_unreclaimable:18064kB kernel_stack:2352kB pagetables:12888kB bounce:0kB
> free_pcp:412kB local_pcp:88kB free_cma:0kB
> kernel: lowmem_reserve[]: 0 0 0 0 0
> kernel: Node 0 DMA: 70*4kB (UMEH) 20*8kB (UMEH) 13*16kB (MH) 5*32kB (H)
> 4*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> 1576kB
> kernel: Node 0 DMA32: 1134*4kB (UMEH) 25*8kB (UMEH) 13*16kB (MH) 7*32kB (H)
> 3*64kB (H) 0*128kB 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5616kB
 
Although the DMA32 zone appears to have enough free memory, that free memory
includes H pageblocks, which are reserved for high-order atomic allocations.
That might be why the allocation fails the watermark check.
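
To illustrate the point, here is a minimal user-space sketch of how the
highatomic reserve can make an order-0 watermark check fail. It is a
simplified model, not the kernel code: toy_watermark_ok() and its parameters
are hypothetical names, the adjustments the real check makes to the min
watermark are left out, and the reserve size of 1024 pages used in the note
below is an assumed value, since the log above does not print it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, hypothetical model of an order-0 watermark check.
 * All sizes are in pages. The real kernel check also lowers min_wmark
 * for some allocation flags; that part is omitted here.
 */
static bool toy_watermark_ok(long free_pages, long reserved_highatomic,
			     long min_wmark, long lowmem_reserve,
			     bool alloc_harder)
{
	assert(free_pages >= 0 && reserved_highatomic >= 0);

	/* Ordinary allocations may not count the highatomic reserve as free. */
	if (!alloc_harder)
		free_pages -= reserved_highatomic;

	/* The check passes only if usable free memory is above the mark. */
	return free_pages > min_wmark + lowmem_reserve;
}
```

With the DMA32 numbers from the report (free:5012kB is 1253 pages, min:2264kB
is 566 pages), toy_watermark_ok(1253, 0, 566, 0, false) passes, while
toy_watermark_ok(1253, 1024, 566, 0, false) fails because only 229 pages
remain after the assumed reserve is subtracted.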

I tried to solve this during the 4.9 cycle by using up the reserved memory
before the OOM kill, and that work was merged into 4.10, but I think there is
still a hole. Could you apply this patch on top of your 4.10 kernel? (To be
clear, it cannot be applied to 4.9.)

From 9779a1c5d32e2edb64da5cdfcd6f9737b94a247a Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@...nel.org>
Date: Mon, 27 Feb 2017 17:39:06 +0900
Subject: [PATCH] mm: use up highatomic before OOM kill

Not-Yet-Signed-off-by: Minchan Kim <minchan@...nel.org>
---
 mm/page_alloc.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 614cd0397ce3..e073cca4969e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3549,16 +3549,6 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		*no_progress_loops = 0;
 	else
 		(*no_progress_loops)++;
-
-	/*
-	 * Make sure we converge to OOM if we cannot make any progress
-	 * several times in the row.
-	 */
-	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-		/* Before OOM, exhaust highatomic_reserve */
-		return unreserve_highatomic_pageblock(ac, true);
-	}
-
 	/*
 	 * Keep reclaiming pages while there is a chance this will lead
 	 * somewhere.  If none of the target zones can satisfy our allocation
@@ -3821,6 +3811,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (read_mems_allowed_retry(cpuset_mems_cookie))
 		goto retry_cpuset;
 
+	/* Before OOM, exhaust highatomic_reserve */
+	if (unreserve_highatomic_pageblock(ac, true))
+		goto retry;
+
 	/* Reclaim has failed us, start killing things */
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
-- 
2.7.4
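
For illustration only, the control flow the patch aims for can be modelled in
user space as below. This is a toy sketch under stated assumptions, not the
kernel implementation: struct toy_zone, toy_unreserve_highatomic(),
toy_slowpath() and TOY_PAGEBLOCK_PAGES are hypothetical, reclaim is left out
entirely, and the reserve is assumed to be released one pageblock of 512
pages at a time.

```c
#include <stdbool.h>

#define TOY_PAGEBLOCK_PAGES 512L	/* assumed pageblock size in pages */

/* Hypothetical stand-in for a zone; all fields are in pages. */
struct toy_zone {
	long free_pages;
	long reserved_highatomic;
	long min_wmark;
};

enum toy_result { TOY_ALLOCATED, TOY_OOM_KILL };

/* Release one reserved pageblock; returns true if anything was released. */
static bool toy_unreserve_highatomic(struct toy_zone *z)
{
	if (z->reserved_highatomic <= 0)
		return false;
	if (z->reserved_highatomic > TOY_PAGEBLOCK_PAGES)
		z->reserved_highatomic -= TOY_PAGEBLOCK_PAGES;
	else
		z->reserved_highatomic = 0;
	return true;
}

/* Try an order-0 allocation; fall back to OOM only with no reserve left. */
static enum toy_result toy_slowpath(struct toy_zone *z)
{
	for (;;) {
		/* Watermark check that ignores the highatomic reserve. */
		if (z->free_pages - z->reserved_highatomic > z->min_wmark) {
			z->free_pages--;
			return TOY_ALLOCATED;
		}
		/* Before OOM, exhaust the reserve and retry the allocation. */
		if (toy_unreserve_highatomic(z))
			continue;
		/* Nothing left to release: this is where the OOM killer runs. */
		return TOY_OOM_KILL;
	}
}
```

In this model a zone with 1253 free pages, an assumed reserve of 1024 pages
and a min watermark of 566 pages gets its allocation after one pageblock is
unreserved, while a zone with no reserve and too little free memory goes
straight to the OOM path.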