Date:	Sun, 3 Jan 2016 00:47:30 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...nel.org
Cc:	akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
	hannes@...xchg.org, mgorman@...e.de, rientjes@...gle.com,
	hillf.zj@...baba-inc.com, kamezawa.hiroyu@...fujitsu.com,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] OOM detection rework v4

Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 28-12-15 21:08:56, Tetsuo Handa wrote:
> > > Tetsuo Handa wrote:
> > > > I got OOM killers while running heavy disk I/O (extracting a kernel source
> > > > tree, running lxr's genxref command). (Environment: 4 CPUs / 2048MB RAM /
> > > > no swap / XFS.) Do you think these OOM killers are reasonable? Too weak
> > > > against fragmentation?
> > > 
> > > Well, the current patch invokes the OOM killer when more than 75% of memory
> > > is used for file cache (active_file: + inactive_file:). I think this is
> > > surprising for administrators, and we want to retry harder (but not forever,
> > > please).
> > 
> > Here again, it would be good to see a comparison between the original and
> > the new behavior. 75% page cache is certainly unexpected, but those pages
> > might be pinned for other reasons and therefore unreclaimable and basically
> > IO bound. This is hard to optimize for without causing undesirable side
> > effects for other loads. I will have a look at the OOM reports later, but
> > having a comparison would be a great start.
> 
> Prior to the "mm, oom: rework oom detection" patch (the original), this
> stressor never invoked the OOM killer. After this patch (the new), it easily
> does. In both the original and the new case, active_file: + inactive_file:
> occupies nearly 75% of memory. I think we lost the invisible retry logic for
> order > 0 allocation requests.
> 

I retested with the debug printk() patch below.

----------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d70a80..e433504 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3014,7 +3014,7 @@ static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
 static inline bool
 should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		     struct alloc_context *ac, int alloc_flags,
-		     bool did_some_progress,
+		     unsigned long did_some_progress,
 		     int no_progress_loops)
 {
 	struct zone *zone;
@@ -3024,8 +3024,10 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 	 * Make sure we converge to OOM if we cannot make any progress
 	 * several times in the row.
 	 */
-	if (no_progress_loops > MAX_RECLAIM_RETRIES)
+	if (no_progress_loops > MAX_RECLAIM_RETRIES) {
+		printk(KERN_INFO "Reached MAX_RECLAIM_RETRIES.\n");
 		return false;
+	}
 
 	/* Do not retry high order allocations unless they are __GFP_REPEAT */
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
@@ -3086,6 +3088,8 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 
 			return true;
 		}
+		printk(KERN_INFO "zone=%s reclaimable=%lu available=%lu no_progress_loops=%u did_some_progress=%lu\n",
+		       zone->name, reclaimable, available, no_progress_loops, did_some_progress);
 	}
 
 	return false;
@@ -3273,7 +3277,7 @@ retry:
 		no_progress_loops++;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
-				 did_some_progress > 0, no_progress_loops))
+				 did_some_progress, no_progress_loops))
 		goto retry;
 
 	/* Reclaim has failed us, start killing things */
----------
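
To make the printed numbers easier to interpret, here is a minimal, self-contained
sketch of the retry decision this patch instruments, assuming only the shape
visible in the hunks above (the no_progress_loops cap, the __GFP_REPEAT check for
costly orders, and a per-zone estimate of what might still become available). The
struct, the MAX_RECLAIM_RETRIES value and the discounting step are simplified
stand-ins for illustration, not the exact v4 code.

----------
#include <stdbool.h>

#define MAX_RECLAIM_RETRIES	16	/* assumed value, for the sketch only */
#define PAGE_ALLOC_COSTLY_ORDER	3

/* Simplified per-zone counters; not the real struct zone. */
struct zone_state {
	unsigned long reclaimable;   /* LRU pages that might still be reclaimed */
	unsigned long free;          /* snapshot of NR_FREE_PAGES               */
	unsigned long min_watermark; /* min watermark, in pages                 */
};

/*
 * Sketch of the "keep retrying?" decision: give up after too many rounds
 * without progress, never retry costly orders without __GFP_REPEAT, and
 * otherwise retry as long as some zone still looks able to satisfy the
 * request once its remaining reclaimable pages (discounted the longer we
 * keep failing) are added to its free pages.  "reclaimable" and
 * "available" are the values the debug printk above prints per zone.
 */
static bool should_reclaim_retry_sketch(unsigned int order, bool gfp_repeat,
					int no_progress_loops,
					const struct zone_state *zones,
					int nr_zones)
{
	if (no_progress_loops > MAX_RECLAIM_RETRIES)
		return false;

	if (order > PAGE_ALLOC_COSTLY_ORDER && !gfp_repeat)
		return false;

	for (int i = 0; i < nr_zones; i++) {
		unsigned long reclaimable = zones[i].reclaimable;
		unsigned long available = reclaimable;

		/* Trust reclaimable pages less on every round without progress. */
		available -= (no_progress_loops * reclaimable +
			      MAX_RECLAIM_RETRIES - 1) / MAX_RECLAIM_RETRIES;
		available += zones[i].free;

		/*
		 * The real check is __zone_watermark_ok(), which for
		 * order > 0 also requires a free block of at least that
		 * order to exist (see the sketch after the next paragraph);
		 * here it is reduced to the order-0 part.
		 */
		if (available > zones[i].min_watermark)
			return true;
	}
	return false;
}
----------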

The output shows that __zone_watermark_ok() returning false for both the DMA32
and DMA zones is what triggers the OOM killer invocation. Direct reclaim is
constantly reclaiming some pages, but I guess the freelists for
2 <= order < MAX_ORDER are empty. That check was introduced by commit
97a16fc82a7c5b0c ("mm, page_alloc: only enforce watermarks for order-0
allocations"), and the "mm, oom: rework oom detection" patch now hits it.
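
For reference, a simplified sketch of the order-aware check that commit
97a16fc82a7c5b0c added (field names are illustrative stand-ins, and details such
as migratetype handling and ALLOC_HARDER are omitted): the order-0 stage can pass
with plenty of free pages, yet an order-2 request, like the fork() allocations in
the log below, still fails when no free block of order >= 2 is left.

----------
#include <stdbool.h>

#define MAX_ORDER 11

/* Simplified stand-in for struct zone; not the real kernel layout. */
struct zone_sketch {
	long min_watermark;               /* min watermark, in pages      */
	long lowmem_reserve;              /* reserve kept for lower zones */
	unsigned long nr_free[MAX_ORDER]; /* free blocks, per order       */
};

/*
 * Two-stage check: first the classic "enough free pages above the
 * watermark" test, then, for order > 0, a scan for at least one free
 * block of the requested order or larger.  Fragmented free memory
 * (only order-0/order-1 pieces left) passes stage 1 but fails stage 2.
 */
static bool zone_watermark_ok_sketch(const struct zone_sketch *z,
				     unsigned int order, long free_pages)
{
	/* Stage 1: enough pages above the watermark plus the reserve? */
	if (free_pages <= z->min_watermark + z->lowmem_reserve)
		return false;

	/* Order-0 requests only need stage 1. */
	if (order == 0)
		return true;

	/* Stage 2: at least one free block of the requested order or larger. */
	for (unsigned int o = order; o < MAX_ORDER; o++) {
		if (z->nr_free[o])
			return true;
	}
	return false;
}
----------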

----------
[  154.547143] zone=DMA32 reclaimable=323478 available=325894 no_progress_loops=0 did_some_progress=58
[  154.551119] zone=DMA32 reclaimable=323153 available=325770 no_progress_loops=0 did_some_progress=58
[  154.571983] zone=DMA32 reclaimable=319582 available=322161 no_progress_loops=0 did_some_progress=56
[  154.576121] zone=DMA32 reclaimable=319647 available=322016 no_progress_loops=0 did_some_progress=56
[  154.583523] zone=DMA32 reclaimable=319467 available=321801 no_progress_loops=0 did_some_progress=55
[  154.593948] zone=DMA32 reclaimable=317400 available=320988 no_progress_loops=0 did_some_progress=56
[  154.730880] zone=DMA32 reclaimable=312385 available=313952 no_progress_loops=0 did_some_progress=48
[  154.733226] zone=DMA32 reclaimable=312337 available=313919 no_progress_loops=0 did_some_progress=48
[  154.737270] zone=DMA32 reclaimable=312417 available=313871 no_progress_loops=0 did_some_progress=48
[  154.739569] zone=DMA32 reclaimable=312369 available=313844 no_progress_loops=0 did_some_progress=48
[  154.743195] zone=DMA32 reclaimable=312385 available=313790 no_progress_loops=0 did_some_progress=48
[  154.745534] zone=DMA32 reclaimable=312365 available=313813 no_progress_loops=0 did_some_progress=48
[  154.748431] zone=DMA32 reclaimable=312272 available=313728 no_progress_loops=0 did_some_progress=48
[  154.750973] zone=DMA32 reclaimable=312273 available=313760 no_progress_loops=0 did_some_progress=48
[  154.753503] zone=DMA32 reclaimable=312289 available=313958 no_progress_loops=0 did_some_progress=48
[  154.753584] zone=DMA32 reclaimable=312241 available=313958 no_progress_loops=0 did_some_progress=48
[  154.753660] zone=DMA32 reclaimable=312193 available=313958 no_progress_loops=0 did_some_progress=48
[  154.781574] zone=DMA32 reclaimable=312147 available=314095 no_progress_loops=0 did_some_progress=48
[  154.784281] zone=DMA32 reclaimable=311539 available=314015 no_progress_loops=0 did_some_progress=49
[  154.786639] zone=DMA32 reclaimable=311498 available=314040 no_progress_loops=0 did_some_progress=49
[  154.788761] zone=DMA32 reclaimable=311432 available=314040 no_progress_loops=0 did_some_progress=49
[  154.791047] zone=DMA32 reclaimable=311366 available=314040 no_progress_loops=0 did_some_progress=49
[  154.793388] zone=DMA32 reclaimable=311300 available=314040 no_progress_loops=0 did_some_progress=49
[  154.795802] zone=DMA32 reclaimable=311153 available=314006 no_progress_loops=0 did_some_progress=49
[  154.804685] zone=DMA32 reclaimable=309950 available=313140 no_progress_loops=0 did_some_progress=49
[  154.807039] zone=DMA32 reclaimable=309867 available=313138 no_progress_loops=0 did_some_progress=49
[  154.809440] zone=DMA32 reclaimable=309761 available=313080 no_progress_loops=0 did_some_progress=49
[  154.811583] zone=DMA32 reclaimable=309735 available=313120 no_progress_loops=0 did_some_progress=49
[  154.814090] zone=DMA32 reclaimable=309561 available=313068 no_progress_loops=0 did_some_progress=49
[  154.817381] zone=DMA32 reclaimable=309463 available=313030 no_progress_loops=0 did_some_progress=49
[  154.824387] zone=DMA32 reclaimable=309414 available=313030 no_progress_loops=0 did_some_progress=49
[  154.829582] zone=DMA32 reclaimable=308907 available=312734 no_progress_loops=0 did_some_progress=50
[  154.831562] zone=DMA reclaimable=2 available=1728 no_progress_loops=0 did_some_progress=50
[  154.838499] fork invoked oom-killer: order=2, oom_score_adj=0, gfp_mask=0x27000c0(GFP_KERNEL|GFP_NOTRACK|0x100000)
[  154.841167] fork cpuset=/ mems_allowed=0
[  154.842348] CPU: 1 PID: 9599 Comm: fork Tainted: G        W       4.4.0-rc7-next-20151231+ #273
[  154.844308] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  154.846654]  0000000000000000 0000000045061c6b ffff88007a5dbb00 ffffffff81398b83
[  154.848559]  0000000000000000 ffff88007a5dbba0 ffffffff811bc81c 0000000000000206
[  154.850488]  ffffffff818104b0 ffff88007a5dbb40 ffffffff810bdd79 0000000000000206
[  154.852386] Call Trace:
[  154.853350]  [<ffffffff81398b83>] dump_stack+0x4b/0x68
[  154.854731]  [<ffffffff811bc81c>] dump_header+0x5b/0x3b0
[  154.856309]  [<ffffffff810bdd79>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[  154.858046]  [<ffffffff810bde4d>] ? trace_hardirqs_on+0xd/0x10
[  154.859593]  [<ffffffff81143d36>] oom_kill_process+0x366/0x540
[  154.861142]  [<ffffffff8114414f>] out_of_memory+0x1ef/0x5a0
[  154.862655]  [<ffffffff8114420d>] ? out_of_memory+0x2ad/0x5a0
[  154.864194]  [<ffffffff81149c72>] __alloc_pages_nodemask+0xda2/0xde0
[  154.865852]  [<ffffffff810bdd00>] ? trace_hardirqs_on_caller+0x80/0x1c0
[  154.867844]  [<ffffffff81149e6c>] alloc_kmem_pages_node+0x4c/0xc0
[  154.868726] zone=DMA32 reclaimable=309003 available=312677 no_progress_loops=0 did_some_progress=48
[  154.868727] zone=DMA reclaimable=2 available=1728 no_progress_loops=0 did_some_progress=48
[  154.875357]  [<ffffffff8106d441>] copy_process.part.31+0x131/0x1b40
[  154.877845]  [<ffffffff8111d8da>] ? __audit_syscall_entry+0xaa/0xf0
[  154.880397]  [<ffffffff8106f01b>] _do_fork+0xdb/0x5d0
[  154.882259]  [<ffffffff8111d8da>] ? __audit_syscall_entry+0xaa/0xf0
[  154.884722]  [<ffffffff810030c1>] ? do_audit_syscall_entry+0x61/0x70
[  154.887201]  [<ffffffff81003254>] ? syscall_trace_enter_phase1+0x134/0x150
[  154.889666]  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[  154.891519]  [<ffffffff8106f594>] SyS_clone+0x14/0x20
[  154.893059]  [<ffffffff816feeb2>] entry_SYSCALL_64_fastpath+0x12/0x76
[  154.894859] Mem-Info:
[  154.895851] active_anon:31807 inactive_anon:2093 isolated_anon:0
[  154.895851]  active_file:242656 inactive_file:67266 isolated_file:0
[  154.895851]  unevictable:0 dirty:8 writeback:0 unstable:0
[  154.895851]  slab_reclaimable:15100 slab_unreclaimable:20839
[  154.895851]  mapped:1681 shmem:2162 pagetables:18491 bounce:0
[  154.895851]  free:4243 free_pcp:343 free_cma:0
[  154.905459] Node 0 DMA free:6908kB min:44kB low:52kB high:64kB active_anon:3408kB inactive_anon:120kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:64kB shmem:124kB slab_reclaimable:872kB slab_unreclaimable:3032kB kernel_stack:176kB pagetables:328kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  154.916097] lowmem_reserve[]: 0 1714 1714 1714
[  154.917857] Node 0 DMA32 free:17996kB min:5172kB low:6464kB high:7756kB active_anon:121688kB inactive_anon:8252kB active_file:970620kB inactive_file:269060kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1758944kB mlocked:0kB dirty:32kB writeback:0kB mapped:6660kB shmem:8524kB slab_reclaimable:59528kB slab_unreclaimable:80460kB kernel_stack:47312kB pagetables:70972kB unstable:0kB bounce:0kB free_pcp:1356kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  154.929908] lowmem_reserve[]: 0 0 0 0
[  154.931918] Node 0 DMA: 107*4kB (UME) 72*8kB (ME) 47*16kB (UME) 19*32kB (UME) 9*64kB (ME) 1*128kB (M) 3*256kB (M) 2*512kB (E) 2*1024kB (UM) 0*2048kB 0*4096kB = 6908kB
[  154.937453] Node 0 DMA32: 1113*4kB (UME) 1400*8kB (UME) 116*16kB (UM) 15*32kB (UM) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 18052kB
[  154.941617] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  154.944167] 312171 total pagecache pages
[  154.945926] 0 pages in swap cache
[  154.947521] Swap cache stats: add 0, delete 0, find 0/0
[  154.949436] Free swap  = 0kB
[  154.950920] Total swap = 0kB
[  154.952531] 524157 pages RAM
[  154.954063] 0 pages HighMem/MovableOnly
[  154.955785] 80445 pages reserved
[  154.957362] 0 pages hwpoisoned
----------
