linux-kernel - Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)


Message-ID: <20170119034850.0b7d504c@pog.tecnopolis.ca>
Date:   Thu, 19 Jan 2017 03:48:50 -0600
From:   Trevor Cordes <trevor@...nopolis.ca>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, Joonsoo Kim <iamjoonsoo.kim@....com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

On 2017-01-17 Michal Hocko wrote:
> On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:  
> > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > [...]  
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 532a2a750952..46aac487b89a 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > > >  				continue;
> > > >  
> > > >  			if (sc->priority != DEF_PRIORITY &&
> > > > +			    !buffer_heads_over_limit &&
> > > >  			    !pgdat_reclaimable(zone->zone_pgdat))
> > > >  				continue;	/* Let kswapd poll it */
> > > 
> > > I think we should rather remove pgdat_reclaimable here. This
> > > sounds like a wrong layer to decide whether we want to reclaim
> > > and how much. 
> > 
> > I had considered that but it'd also be important to add the other
> > 32-bit patches you have posted to see the impact. Because of the
> > ratio of LRU pages to slab pages, it may not have an impact but
> > it'd need to be eliminated.  
> 
> OK, Trevor you can pull from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> fixes/highmem-node-fixes branch. This contains the current mmotm tree
> + the latest highmem fixes. I also do not expect this would help much
> in your case but as Mel've said we should rule that out at least.

Hi!  A kernel built from the git tree above OOM'd after less than 24
hours (at 3:02am), so it doesn't solve the bug.  If you need a dump of
the OOM messages, let me know.

Let me know what to try next, guys, and I'll test it out.

> > Before prototyping such a thing, I'd like to hear the outcome of
> > this heavy hack and then add your 32-bit patches onto the list. If
> > the problem is still there then I'd next look at taking slab pages
> > into account in pgdat_reclaimable() instead of an outright removal
> > that has a much wider impact. If that doesn't work then I'll
> > prototype a heavy-handed forced slab reclaim when lower zones are
> > almost all slab pages.

I don't think I've tried the "heavy hack" patch yet?  It's not in the
mhocko tree I just tried, is it?  Should I apply the heavy hack on top
of the mhocko git tree, on vanilla, or on something else?

I also want to mention that these PAE boxes suffer from another
problem/bug that I've worked around for almost a year now.  For some
reason it keeps gnawing at me that it might be related.  Disk I/O
degrades badly on these PAE boxes after a certain amount of disk
writes (some unknown number of GB, maybe around 10).  Writes go from
500MB/s to 10MB/s!  Reboot and it's magically 500MB/s again.  I detail
this here:
https://muug.ca/pipermail/roundtable/2016-June/004669.html
My fix was to boot with mem=XG where X is <8 (like 4 or 6) to force the
PAE kernel to be more sane about highmem choices.  I never filed a bug
because I read a ton of stuff saying Linus hates PAE, don't use it with
over 4G, and so on.  But the other fix is to:
set /proc/sys/vm/highmem_is_dirtyable to 1
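
For reference, here is a minimal sketch of checking and setting that knob
from a program rather than by hand.  The helper names read_knob() and
write_knob() are hypothetical and exist only for this illustration.  The
path is the one named above; it is only present on kernels built with
CONFIG_HIGHMEM, and writing to it needs root.

```c
#include <stdio.h>

/* Knob named in this message; only present on CONFIG_HIGHMEM kernels. */
#define HIGHMEM_IS_DIRTYABLE "/proc/sys/vm/highmem_is_dirtyable"

/*
 * Hypothetical helper: read an integer value from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
int read_knob(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: write an integer value to a sysctl-style file.
 * Returns 0 on success, -1 on any error.
 */
int write_knob(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fprintf(f, "%d\n", value) > 0);
    /* stdio buffers the write, so a rejected write may only show up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_knob(HIGHMEM_IS_DIRTYABLE, 1) as root would apply the
workaround described above, and read_knob(HIGHMEM_IS_DIRTYABLE, &v)
reports the current setting.  A return of -1 from read_knob() on a given
box most likely means the kernel was built without highmem support.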

I'm not bringing this up to get attention to a new bug, I bring this up
because it smells like it might be related.  If something slowly eats
away at the box's vm to the point that I/O gets horribly slow, perhaps
it's related to the slab and high/lowmem issue we have here?  And if
related, it may help to solve the oom bug.  If I'm way off base here,
just ignore my tangent!

The funny thing is I thought mem=XG where X<8 solved the problem, but
it doesn't!  It greatly mitigates it, but I still get a subtle slowdown
that gets worse over time (over weeks instead of days).  I now set
highmem_is_dirtyable=1 on most boxes and that seems to solve it for good
in combo with mem=XG.  Let me note, however, that I have NOT set
highmem_is_dirtyable=1 on the test box I am using for all of this
building/testing, as I wanted the config to stay static while I work
through this oom bug.  (I'm real curious to see if
highmem_is_dirtyable=1 would have any impact on the oom though!)
Thanks!