lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170115002752.4559f668@pog.tecnopolis.ca>
Date:   Sun, 15 Jan 2017 00:27:52 -0600
From:   Trevor Cordes <trevor@...nopolis.ca>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org, Joonsoo Kim <iamjoonsoo.kim@....com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

On 2017-01-12 Michal Hocko wrote:
> On Wed 11-01-17 16:52:32, Trevor Cordes wrote:
> [...]
> > I'm not sure how I can tell if my bug is because of memcgs so here
> > is a full first oom example (attached).  
> 
> 4.7 kernel doesn't contain 71c799f4982d ("mm: add per-zone lru list
> stat") so the OOM report will not tell us whether the Normal zone
> doesn't age active lists, unfortunatelly.

I compiled the patch Mel provided into the stock F23 kernel
4.8.13-100.fc23.i686+PAE and it ran for 2 nights.  It didn't oom the
first night, but did the second night.  So the bug persists even with
that patch.  However, it does *seem* a bit "better" since it took 2
nights (usually takes only one, but maybe 10% of the time it does take
two) before oom'ing, *and* it allowed my reboot script to reboot it
cleanly when it saw the oom (which happens only 25% of the time).

I'm attaching the 4.8.13 oom message which should have the memcg info
(71c799f4982d) you are asking for above?  Hopefully?

> You can easily check whether this is memcg related by trying to run
> the same workload with cgroup_disable=memory kernel command line
> parameter. This will put all the memcg specifics out of the way.

I will try booting now into cgroup_disable=memory to see if that helps
at all.  I'll reply back in 48 hours, or when it oom's, whichever comes
first.

Also, should I bother trying the latest git HEAD to see if that solves
anything?  Thanks!

Download attachment "oom2" of type "application/octet-stream" (23540 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ