Date:   Tue, 27 Dec 2016 12:27:53 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Nils Holland <nholland@...ys.org>
Cc:     Mel Gorman <mgorman@...e.de>, Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Chris Mason <clm@...com>, David Sterba <dsterba@...e.cz>,
        linux-btrfs@...r.kernel.org
Subject: Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

On Tue 27-12-16 12:23:13, Nils Holland wrote:
> On Tue, Dec 27, 2016 at 09:08:38AM +0100, Michal Hocko wrote:
> > On Mon 26-12-16 19:57:03, Nils Holland wrote:
> > > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > > > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > > > 
> > > > > > Nils, even though this is still highly experimental, could you give it a
> > > > > > try please?
> > > > > 
> > > > > Yes, no problem! So I kept the very first patch you sent but had to
> > > > > revert the latest version of the debugging patch (the one in
> > > > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > > > memory cgroups enabled again, and the first thing that strikes the eye
> > > > > is that I get this during boot:
> > > > > 
> > > > > [    1.568174] ------------[ cut here ]------------
> > > > > [    1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > > > [    1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > > > 
> > > > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > > > my patch (I double account) and b) the detection for the empty list
> > > > cannot work after my change because per node zone will not match per
> > > > zone statistics. The updated patch is below. So I hope my brain already
> > > > works after it's been mostly off for the last few days...
> > > 
> > > I tried the updated patch, and I can confirm that the warning during
> > > boot is gone. Also, I've tried my ordinary procedure to reproduce my
> > > testcase, and I can say that a kernel with this new patch also works
> > > fine and doesn't produce OOMs or similar issues.
> > > 
> > > I had the previous version of the patch in use on a machine non-stop
> > > for the last few days during normal day-to-day workloads and didn't
> > > notice any issues. Now I'll keep a machine running during the next few
> > > days with this patch, and in case I notice something that doesn't look
> > > normal, I'll of course report back!
> > 
> > Thanks for your testing! Can I add your
> > Tested-by: Nils Holland <nholland@...ys.org>
> 
> Yes, I think so! The patch has now been running for 16 hours on my two
> machines, and that's an uptime that was hard to achieve since 4.8 for
> me. ;-) So my tests clearly suggest that the patch is good! :-)

OK, thanks a lot for your testing! I will wait a few more days before I
send it to Andrew.

-- 
Michal Hocko
SUSE Labs
