Message-ID: <20170203183654.1821addc@pog.tecnopolis.ca>
Date:   Fri, 3 Feb 2017 18:36:54 -0600
From:   Trevor Cordes <trevor@...nopolis.ca>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, Joonsoo Kim <iamjoonsoo.kim@....com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

On 2017-02-01 Michal Hocko wrote:
> On Wed 01-02-17 03:29:28, Trevor Cordes wrote:
> > On 2017-01-30 Michal Hocko wrote:  
> [...]
> > > > Testing with vanilla rc6 released just yesterday would be a good
> > > > fit. There are some more fixes sitting on mmotm on top and maybe
> > > > we want some of them in final 4.10. Anyway all those pending
> > > changes should be merged in the next merge window - aka 4.11  
> > 
> > After 30 hours of running vanilla 4.10.0-rc6, the box started to go
> > bonkers at 3am, so vanilla does not fix the bug :-(  But, the bug
> > hit differently this time, the box just bogged down like crazy and
> > gave really weird top output.  Starting nano would take 10s, then
> > would run full speed, then when saving a file would take 5s.
> > Starting any prog not in cache took equally as long.  
> 
> Could you try with to_test/linus-tree/oom_hickups branch on the same
> git tree? I have cherry-picked "mm, vmscan: consider eligible zones in
> get_scan_count" which might be the missing part.

I ran the to_test/linus-tree/oom_hickups branch (4.10.0-rc6+) for 50
hours and it does NOT have the bug!  No problems at all so far.

So I think whatever to_test/linus-tree/oom_hickups and since-4.9 both
have that vanilla 4.10-rc6 does *not* have is indeed the fix.

For my reference, and I know you guys aren't distro-specific, what is
the best way to get this fix into Fedora 24 (currently 4.9)?  Can it be
backported or made as a patch they can apply to 4.9?  Or 4.10?  If this
fix only goes into 4.11 then I fear we'll never see it in Fedora and we
rhbz guys will not have a stock-Fedora fix for this until F25 or F26.
Again, I'm not trying to force this out of scope, I'm just wondering
about the logistics in these situations.

Once again, thanks to all for your great work and help!  P.S. I'll try
a couple of the other ideas Mel had about ramping the RAM back up, etc.
