Date:	Fri, 12 Aug 2016 09:43:40 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	arekm@...en.pl
Cc:	linux-ext4@...r.kernel.org, linux-mm@...r.kernel.org
Subject: Re: 4.7.0, cp -al causes OOM

Hi,

On Fri 12-08-16 09:01:41, Arkadiusz Miskiewicz wrote:
> 
> Hello.
> 
> I have a system with 4x2TB SATA disks, split into a few partitions. Celeron G530,
> 8GB of RAM, 20GB of swap. It's just a basic system (so syslog,
> cron, udevd, irqbalance) + my cp tests and nothing more. Kernel 4.7.0.
> 
> There is a software RAID 5 partition on sd[abcd]4, with ext4 created with the
> -T news option.
> 
> Using the deadline I/O scheduler.
> 
> For testing I have 400GB of tiny files on it (about 6.4 million inodes) in mydir.
> I ran "cp -al mydir copy{1,2,...,10}" 10x in parallel, and that ended up
> with 5 of the cp processes being killed by OOM while the other 5 finished.
> 
> Even two in parallel seem to be enough for OOM to kick in:
> rm -rf copy1; cp -al mydir copy1
> rm -rf copy2; cp -al mydir copy2

Ouch

> I would expect 8GB of RAM to be enough for just rm/cp. Ideas?
> 
> Note that I first tested the same thing with xfs (hence you can see
> "task xfsaild/md2:661 blocked for more than 120 seconds." and xfs-related
> stacktraces in dmesg) and 10x cp managed to finish without OOM. Later I
> tested with ext4, which caused OOMs. I guess it is probably not a generic
> memory management problem, but that's only my guess.

I suspect compaction is not able to migrate the FS buffers to form
higher-order pages.
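
For reference, a quick way to see whether compaction is being attempted
and how often it fails (assuming CONFIG_COMPACTION is enabled; the exact
set of counters varies a bit between kernel versions):

  grep '^compact_' /proc/vmstat
  # compact_stall   - direct compaction attempts
  # compact_fail    - attempts that did not manage to form a high-order page
  # compact_success - attempts that did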

[...]
> [87259.568301] bash invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0

This is a kernel stack allocation (so an order-2 request).
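
For clarity, order-2 just means 2^2 physically contiguous 4KB pages,
which matches the 16KB kernel stack on x86_64:

  echo $(( (1 << 2) * 4096 ))   # 16384 bytes = 16KB, four contiguous 4KB pages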

[...]
> [87259.568369] active_anon:439065 inactive_anon:146385 isolated_anon:0
>                 active_file:201920 inactive_file:122369 isolated_file:0

This is around 3.5GB of memory for file/anonymous pages, which is ~43% of
RAM. Considering that free memory is quite low, this means that the
majority of the memory is consumed by something else.
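
The arithmetic, just to show where the ~43% comes from (all the counters
above are 4KB pages):

  # active_anon + inactive_anon + active_file + inactive_file, in MB
  echo $(( (439065 + 146385 + 201920 + 122369) * 4 / 1024 ))                       # ~3553 MB
  # the same as a percentage of 8GB of RAM
  echo $(( (439065 + 146385 + 201920 + 122369) * 4 * 100 / (8 * 1024 * 1024) ))    # ~43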

>                 unevictable:0 dirty:26675 writeback:0 unstable:0
>                 slab_reclaimable:966564 slab_unreclaimable:79528

OK, so slab objects eat 50% of memory. I would check /proc/slabinfo to see
what has eaten that memory. A large portion of the slab is reclaimable, but
I suspect it can still easily prevent memory compaction from succeeding.
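
Something along these lines should show the top consumers (a rough sketch;
reading /proc/slabinfo needs root, and "slabtop -o -s c" gives a similar view):

  # columns: name active_objs num_objs objsize objperslab pagesperslab ...
  awk 'NR > 2 { printf "%-28s %8.1f MB\n", $1, $3 * $4 / 1024 / 1024 }' /proc/slabinfo \
      | sort -k2 -rn | head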

>                 mapped:2236 shmem:1 pagetables:1759 bounce:0
>                 free:30651 free_pcp:0 free_cma:0
[...]
> [87259.568395] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
> [87259.568403] Node 0 DMA32: 11467*4kB (UME) 1525*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 58068kB
> [87259.568411] Node 0 Normal: 9927*4kB (UMEH) 1119*8kB (UMH) 19*16kB (H) 8*32kB (H) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49348kB

As you can see, there are barely any high-order pages available. There
are a few in the atomic reserves, which is a bit surprising because I would
expect them to be released under heavy memory pressure. I will
double-check that part.
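
For the record, the (H) blocks in the dump above are the HighAtomic reserve;
the per-migratetype breakdown can be watched directly to see how much stays
stuck there:

  # free pages per migrate type and order; look at the HighAtomic rows
  cat /proc/pagetypeinfo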

Anyway, I suspect the primary reason is that compaction cannot make
forward progress. Before 4.7, the OOM detection didn't take compaction
feedback into account and just blindly retried as long as there was
reclaim progress. This was basically unbounded in time and without any
guarantee of success... /proc/vmstat snapshots taken before you start your
load and after the OOM killer fires might tell us more.
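
Something along these lines would be enough to capture what I'm after
(file names are just examples):

  cp /proc/vmstat vmstat.before
  # ... start the parallel cp -al workload, wait for the OOM killer ...
  cp /proc/vmstat vmstat.after
  diff vmstat.before vmstat.after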

Anyway, filling up memory with so many slab objects sounds suspicious on
its own. I guess the fact that you have a huge number of files plays an
important role. This is something for the ext4 people to answer.
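
My guess would be that dentries and ext4 inodes dominate the slab; something
like this would confirm it (again, /proc/slabinfo needs root; the cache names
are the usual ones but treat them as an assumption):

  cat /proc/sys/fs/dentry-state                        # nr_dentry nr_unused ...
  grep -E '^(dentry|ext4_inode_cache) ' /proc/slabinfo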

[...]
> [99888.398968] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
> [99888.399036] Mem-Info:
> [99888.399040] active_anon:195818 inactive_anon:195891 isolated_anon:0
>                 active_file:294335 inactive_file:23747 isolated_file:0

LRU pages got down to 34%...

>                 unevictable:0 dirty:38741 writeback:2 unstable:0
>                 slab_reclaimable:1079860 slab_unreclaimable:157162

while slab memory increased to 59%

[...]

> [99888.399066] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
> [99888.399075] Node 0 DMA32: 14370*4kB (UME) 1809*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71952kB
> [99888.399082] Node 0 Normal: 12172*4kB (UMEH) 165*8kB (UMEH) 23*16kB (H) 9*32kB (H) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50792kB

The high-order atomic reserves still hold the few remaining order-2+ blocks.

[...]

> [103315.505488] kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
> [103315.505554] Mem-Info:
> [103315.505559] active_anon:154510 inactive_anon:154514 isolated_anon:0
>                  active_file:317774 inactive_file:43364 isolated_file:0

and the LRU pages drop further, to 32%,

>                  unevictable:0 dirty:11801 writeback:5212 unstable:0
>                  slab_reclaimable:1112194 slab_unreclaimable:166028

while slab grows above 60%

[...]
> [104400.507680] Mem-Info:
> [104400.507684] active_anon:129371 inactive_anon:129450 isolated_anon:0
>                  active_file:316704 inactive_file:55666 isolated_file:0

LRU 30%

>                  unevictable:0 dirty:29991 writeback:0 unstable:0
>                  slab_reclaimable:1145618 slab_unreclaimable:171545

slab 63%

[...]

> [114824.060378] Mem-Info:
> [114824.060403] active_anon:170168 inactive_anon:170168 isolated_anon:0
>                  active_file:192892 inactive_file:133384 isolated_file:0

LRU 32%

>                  unevictable:0 dirty:37109 writeback:1 unstable:0
>                  slab_reclaimable:1176088 slab_unreclaimable:109598

slab 61%

[...]

That being said, it is really unusual to see such a large kernel memory
footprint. The slab memory consumption grows, but it doesn't seem to be
a memory leak at first glance. Anyway, such a large in-kernel consumption
can severely hinder forming higher-order memory blocks. I believe we can
do slightly better wrt the high-atomic reserves, but that doesn't sound
like the core problem here. I believe the ext4 people should look at what
is going on there as well.
-- 
Michal Hocko
SUSE Labs
