Date:   Mon, 16 Jan 2017 11:09:34 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Trevor Cordes <trevor@...nopolis.ca>
Cc:     Michal Hocko <mhocko@...nel.org>, linux-kernel@...r.kernel.org,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

On Sun, Jan 15, 2017 at 12:27:52AM -0600, Trevor Cordes wrote:
> On 2017-01-12 Michal Hocko wrote:
> > On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> > [...]
> > > I'm not sure how I can tell if my bug is because of memcgs so here
> > > is a full first oom example (attached).  
> > 
> > 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> > stat") so the OOM report will not tell us whether the Normal zone
> > doesn't age active lists, unfortunately.
> 
> I compiled the patch Mel provided into the stock F23 kernel
> 4.8.13-100.fc23.i686+PAE and it ran for 2 nights.  It didn't oom the
> first night, but did the second night.  So the bug persists even with
> that patch.  However, it does *seem* a bit "better" since it took 2
> nights (usually takes only one, but maybe 10% of the time it does take
> two) before oom'ing, *and* it allowed my reboot script to reboot it
> cleanly when it saw the oom (which happens only 25% of the time).
> 
> I'm attaching the 4.8.13 oom message which should have the memcg info
> (71c799f4982d) you are asking for above?  Hopefully?
> 

It shows an extremely large number of reclaimable slab pages in the
lower zones. Other pages have been reclaimed as normal, but the failure
to reclaim those slab pages causes a high-order allocation to fail.

> > You can easily check whether this is memcg related by trying to run
> > the same workload with cgroup_disable=memory kernel command line
> > parameter. This will put all the memcg specifics out of the way.
> 
> I will try booting now into cgroup_disable=memory to see if that helps
> at all.  I'll reply back in 48 hours, or when it oom's, whichever comes
> first.
> 

Thanks.

> Also, should I bother trying the latest git HEAD to see if that solves
> anything?  Thanks!

That's worth trying. If that also fails then could you try the following
hack to encourage direct reclaim to reclaim slab when buffers are over
the limit please?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 532a2a750952..46aac487b89a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 				continue;
 
 			if (sc->priority != DEF_PRIORITY &&
+			    !buffer_heads_over_limit &&
 			    !pgdat_reclaimable(zone->zone_pgdat))
 				continue;	/* Let kswapd poll it */
 

-- 
Mel Gorman
SUSE Labs
