Date:   Tue, 24 Jan 2017 13:51:30 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Trevor Cordes <trevor@...nopolis.ca>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, Joonsoo Kim <iamjoonsoo.kim@....com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

On Fri 20-01-17 00:35:44, Trevor Cordes wrote:
> On 2017-01-19 Michal Hocko wrote:
> > On Thu 19-01-17 03:48:50, Trevor Cordes wrote:
> > > On 2017-01-17 Michal Hocko wrote:  
> > > > On Tue 17-01-17 14:21:14, Mel Gorman wrote:  
> > > > > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko
> > > > > wrote:    
> > > > > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > > > > [...]    
> > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > > index 532a2a750952..46aac487b89a 100644
> > > > > > > --- a/mm/vmscan.c
> > > > > > > +++ b/mm/vmscan.c
> > > > > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct
> > > > > > > zonelist *zonelist, struct scan_control *sc) continue;
> > > > > > >  
> > > > > > >  			if (sc->priority != DEF_PRIORITY &&
> > > > > > > +			    !buffer_heads_over_limit &&
> > > > > > >  			    !pgdat_reclaimable(zone->zone_pgdat))
> > > > > > >  				continue;	/* Let
> > > > > > > kswapd poll it */    
> > > > > > 
> > > > > > I think we should rather remove pgdat_reclaimable here. This
> > > > > > sounds like a wrong layer to decide whether we want to reclaim
> > > > > > and how much.   
> > > > > 
> > > > > I had considered that but it'd also be important to add the
> > > > > other 32-bit patches you have posted to see the impact. Because
> > > > > of the ratio of LRU pages to slab pages, it may not have an
> > > > > impact but it'd need to be eliminated.    
> > > > 
> > > > OK, Trevor you can pull from
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree
> > > > fixes/highmem-node-fixes branch. This contains the current mmotm
> > > > tree
> > > > + the latest highmem fixes. I also do not expect this would help
> > > > much in your case but as Mel've said we should rule that out at
> > > > least.  
> > > 
> > > Hi!  The git tree above version oom'd after < 24 hours (3:02am) so
> > > it doesn't solve the bug.  If you need a oom messages dump let me
> > > know.  
> > 
> > Yes please.
> 
> The first oom from that night attached.  Note, the oom wasn't as dire
> with your mhocko/4.9.0+ as it usually is with stock 4.8.x: my oom
> detector and reboot script was able to do its thing cleanly before the
> system became unusable.

Just for reference: this oom was due to a bug in the active LRU aging,
which is fixed in Linus' tree by b4536f0c829c ("mm, memcg: fix the active
list aging for lowmem requests when memcg is enabled"), 4.10-rc4.

Jan 19 03:02:19 firewallfsi kernel: [85602.858232] Normal free:3436kB min:3532kB low:4412kB high:5292kB active_anon:4kB inactive_anon:8kB active_file:193340kB inactive_file:120kB unevictable:0kB writepending:2516kB present:892920kB managed:816932kB mlocked:0kB slab_reclaimable:522292kB slab_unreclaimable:46724kB kernel_stack:2560kB pagetables:0kB bounce:0kB free_pcp:3468kB local_pcp:176kB free_cma:0kB

Look at how all the reclaimable file memory is on the active_file list
(193340kB, against only 120kB on inactive_file)...
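
For anyone who wants to check the numbers in that zone line, here is a
minimal sketch in Python. It is not part of the kernel or of any existing
tool; the helper names parse_zone_line() and summarize() are made up for
illustration, and it assumes the counters always appear as name:<n>kB
pairs, as they do in the line above. It compares free against the min
watermark and active_file against inactive_file.

```python
import re

# Normal zone line from the oom report above, with the syslog prefix dropped.
ZONE_LINE = (
    "Normal free:3436kB min:3532kB low:4412kB high:5292kB "
    "active_anon:4kB inactive_anon:8kB active_file:193340kB "
    "inactive_file:120kB unevictable:0kB writepending:2516kB "
    "present:892920kB managed:816932kB mlocked:0kB "
    "slab_reclaimable:522292kB slab_unreclaimable:46724kB "
    "kernel_stack:2560kB pagetables:0kB bounce:0kB free_pcp:3468kB "
    "local_pcp:176kB free_cma:0kB"
)


def parse_zone_line(line):
    """Return {counter_name: value_in_kB} for every name:<n>kB pair."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)kB", line)}


def summarize(stats):
    """Derive a few ratios that make the imbalance easy to see."""
    file_total = stats["active_file"] + stats["inactive_file"]
    return {
        # True when the zone has dropped below its min watermark.
        "below_min": stats["free"] < stats["min"],
        # Fraction of file LRU pages sitting on the active list.
        "active_file_share": (
            stats["active_file"] / file_total if file_total else 0.0
        ),
        # Fraction of managed zone memory held by reclaimable slab.
        "slab_reclaimable_share": stats["slab_reclaimable"] / stats["managed"],
    }


stats = parse_zone_line(ZONE_LINE)
summary = summarize(stats)
for key, value in summary.items():
    print(key, value)
```

With the line above, free is below min, and almost the entire file LRU is
on the active list, so there is next to nothing on the inactive list for
reclaim to pick up.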
-- 
Michal Hocko
SUSE Labs
