lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170406085529.GF5497@dhcp22.suse.cz>
Date:   Thu, 6 Apr 2017 10:55:30 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: Is it safe for kthreadd to drain_all_pages?

On Wed 05-04-17 13:59:49, Hugh Dickins wrote:
> Hi Mel,
> 
> I suspect that it's not safe for kthreadd to drain_all_pages();
> but I haven't studied flush_work() etc, so don't really know what
> I'm talking about: hoping that you will jump to a realization.
> 
> 4.11-rc has been giving me hangs after hours of swapping load.  At
> first they looked like memory leaks ("fork: Cannot allocate memory");
> but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
> before looking at /proc/meminfo one time, and the stat_refresh stuck
> in D state, waiting for completion of flush_work like many kworkers.
> kthreadd waiting for completion of flush_work in drain_all_pages().
> 
> But I only noticed that pattern later: originally tried to bisect
> rc1 before rc2 came out, but underestimated how long to wait before
> deciding a stage good - I thought 12 hours, but would now say 2 days.
> Too late for bisection, I suspect your drain_all_pages() changes.

Yes, this is a fallout from Mel's changes. I was about to say that
my follow up fixes which made this flushing to the single WQ with rescuer
fixed that but it seems that
http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-move-pcp-and-lru-pcp-drainging-into-single-wq.patch
didn't make it to the Linus tree. Could you re-test with this one?
While your change is obviously correct I think the above should address
it as well and it is more generic. If it works then I will ask Andrew to
send the above to Linus (along with its follow up
mm-move-pcp-and-lru-pcp-drainging-into-single-wq-fix.patch)
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ