linux-kernel - Re: OOM detection regressions since 4.7

Message-ID: <20160829150703.GH2968@dhcp22.suse.cz>
Date:   Mon, 29 Aug 2016 17:07:04 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Olaf Hering <olaf@...fle.de>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Markus Trippelsdorf <markus@...ppelsdorf.de>,
        Arkadiusz Miskiewicz <a.miskiewicz@...il.com>,
        Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@...ntum.com>,
        Jiri Slaby <jslaby@...e.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Joonsoo Kim <js1304@...il.com>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: OOM detection regressions since 4.7

On Mon 29-08-16 16:52:03, Olaf Hering wrote:
> On Thu, Aug 25, Olaf Hering wrote:
> 
> > On Thu, Aug 25, Michal Hocko wrote:
> > 
> > > Any luck with the testing of this patch?
> 
> I ran rc3 for a few hours on Friday and FireFox was not killed.
> Now rc3 is running for a day with the usual workload and FireFox is
> still running.

Is the patch
(http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz) applied?

> Today I noticed the nfsserver was disabled, probably for months already.
> Starting it gives an OOM, not sure if this is new with 4.7+.
> Full dmesg attached.
> [93348.306369] modprobe: page allocation failure: order:4, mode:0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK)

OK, so the order-4 (COSTLY) allocation has failed because

[...]
> [93348.313778] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
> [93348.313803] Node 0 DMA32: 13633*4kB (UME) 8035*8kB (UME) 890*16kB (UME) 10*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133372kB
> [93348.313822] Node 0 Normal: 14003*4kB (UME) 25*8kB (UME) 2*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56244kB

the memory is too fragmented for such a large allocation. A failing
order-4 request is not so severe because we do not invoke the oom
killer when it fails. Especially without GFP_REPEAT we do not even try
very hard. Recent oom detection changes shouldn't change this behavior.
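
To make the fragmentation argument concrete, here is a minimal sketch (not
kernel code) that parses the per-zone free-list lines quoted above and counts
free blocks large enough for an order-4 request. It assumes 4kB base pages,
so order-4 means 64kB blocks; the helper names free_blocks_by_order and
blocks_at_or_above are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4kB base pages, so order-4 is a 64kB block


def free_blocks_by_order(line):
    """Parse 'COUNT*SIZEkB' entries of a free-list line into {order: count}."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # SIZE is PAGE_KB * 2**order, so the order is log2(SIZE / PAGE_KB).
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def blocks_at_or_above(line, order):
    """Count free blocks that could satisfy a request of the given order."""
    return sum(n for o, n in free_blocks_by_order(line).items() if o >= order)


# Free-list lines copied from the dmesg excerpt in this message.
dma = ("Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) "
       "1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB")
dma32 = ("Node 0 DMA32: 13633*4kB (UME) 8035*8kB (UME) 890*16kB (UME) "
         "10*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
         "0*4096kB = 133372kB")
normal = ("Node 0 Normal: 14003*4kB (UME) 25*8kB (UME) 2*16kB (UM) 0*32kB "
          "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56244kB")

for name, line in (("DMA", dma), ("DMA32", dma32), ("Normal", normal)):
    print(name, blocks_at_or_above(line, 4))
```

DMA32 and Normal have no free block of 64kB or larger, which is the
fragmentation described above. The small DMA zone still lists a few, but a
GFP_KERNEL request is normally kept away from it by the lowmem reserve, so
they do not help here.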

-- 
Michal Hocko
SUSE Labs