Message-ID: <20171130142352.3iva2g5ygn4byh7r@dhcp22.suse.cz>
Date:   Thu, 30 Nov 2017 15:23:52 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Fengguang Wu <fengguang.wu@...el.com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel@...r.kernel.org, lkp@...org
Subject: Re: dd: page allocation failure: order:0,
 mode:0x1080020(GFP_ATOMIC), nodemask=(null)

On Thu 30-11-17 22:01:03, Wu Fengguang wrote:
> On Thu, Nov 30, 2017 at 02:50:16PM +0100, Michal Hocko wrote:
> > On Thu 30-11-17 21:38:40, Wu Fengguang wrote:
> > > Hello,
> > > 
> > > It looks like a regression in 4.15.0-rc1 -- the test case simply runs a
> > > set of parallel dd's and there seems to be no reason to run into a memory problem.
> > > 
> > > It occurs in 1 out of 4 tests.
> > 
> > This is an atomic allocation. So the failure really depends on the
> > state of the free memory and that can vary between runs depending on
> > timing I guess. So I am not really sure this is a regression. But maybe
> > there is something reclaim related going on here.
> 
> Yes, it does depend on how the drivers rely on atomic allocations.
> I just wonder if any changes make the pressure tighter than before.
> It may not even be an MM change -- in theory drivers might also use atomic
> allocations more aggressively than before.
> 
[...]
> Attached are the per-second vmstat records in JSON format.
> They feel more readable than the raw dumps.

Well, from a quick check it seems that there is just legit memory
pressure where kswapd doesn't keep up with the allocation pace. If we
just check the overall kswapd reclaim efficiency
(proc-vmstat.pgsteal_kswapd/proc-vmstat.pgscan_kswapd)
13311631/1331957
then we are at ~99%, which means that kswapd did a good job reclaiming and
didn't stumble over anything. We reclaim _a lot_ from the direct reclaim
context, which means that kswapd doesn't keep up at all
(proc-vmstat.pgsteal_direct/proc-vmstat.pgscan_direct)
107767391/108058968 - again ~99%, but 8x what kswapd does.
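
For reference, a rough sketch of how such ratios can be computed from
the cumulative /proc/vmstat counters (the counter names are the
standard ones; the script itself is just an illustration, not part of
the test harness):

#!/usr/bin/env python3
# Sketch: kswapd vs direct reclaim efficiency (pgsteal/pgscan) from /proc/vmstat.

def read_vmstat(path="/proc/vmstat"):
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

vm = read_vmstat()

kswapd_eff = vm["pgsteal_kswapd"] / max(vm["pgscan_kswapd"], 1)
direct_eff = vm["pgsteal_direct"] / max(vm["pgscan_direct"], 1)

print(f"kswapd reclaim efficiency: {kswapd_eff:.2%}")
print(f"direct reclaim efficiency: {direct_eff:.2%}")
print(f"direct/kswapd pgsteal ratio: "
      f"{vm['pgsteal_direct'] / max(vm['pgsteal_kswapd'], 1):.1f}x")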

If you look at the diffs in the pgsteal numbers, we are at ~2M pages
reclaimed per second from the direct reclaim context and ~200k pages/s
from kswapd. kswapd is naturally slower, as it is a single thread
compared to the many reclaimers hitting the direct reclaim path at once.
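
Those per-second rates fall out of the attached records like this
(a minimal sketch; "vmstat.json" is a placeholder name and I assume the
attachment loads as a list of {counter: cumulative_value} dicts, one
sample per second):

#!/usr/bin/env python3
# Sketch: per-second reclaim rates from consecutive per-second vmstat samples.
import json

with open("vmstat.json") as f:
    samples = json.load(f)

for prev, cur in zip(samples, samples[1:]):
    direct_rate = cur["pgsteal_direct"] - prev["pgsteal_direct"]
    kswapd_rate = cur["pgsteal_kswapd"] - prev["pgsteal_kswapd"]
    print(f"direct: {direct_rate:>9} pages/s   kswapd: {kswapd_rate:>9} pages/s")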

The allocstall numbers show that this was not a single peak but rather a
continual direct reclaim storm, starting with dozens of direct reclaim
invocations per second and reaching hundreds on the Normal zone and even
more on the Movable zone, where we are in the thousands (just have a look
at the diffs between the respective numbers).
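
The same kind of diffing shows whether the stalls are a one-off spike or
sustained (again just a sketch, same placeholder input format as above;
the per-zone allocstall_* counters are the standard ones):

#!/usr/bin/env python3
# Sketch: per-second diffs of per-zone allocstall counters, plus a count of
# how many seconds saw any direct reclaim stall at all.
import json

with open("vmstat.json") as f:
    samples = json.load(f)

sustained = 0
for prev, cur in zip(samples, samples[1:]):
    normal = cur["allocstall_normal"] - prev["allocstall_normal"]
    movable = cur["allocstall_movable"] - prev["allocstall_movable"]
    if normal + movable > 0:
        sustained += 1
    print(f"allocstall/s  Normal: {normal:>6}   Movable: {movable:>6}")

print(f"{sustained} of {len(samples) - 1} seconds saw direct reclaim stalls")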

So from a quick look it really seems like heavy memory pressure rather
than some reclaim deficiency to me.
-- 
Michal Hocko
SUSE Labs
