linux-kernel - Re: [PATCH] mm, vmscan: Do not wait for page writeback for GFP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <alpine.LSU.2.11.1508121833220.3648@eggly.anvils>
Date:	Wed, 12 Aug 2015 19:13:08 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Nikolay Borisov <kernel@...p.com>
cc:	Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Theodore Ts'o <tytso@....edu>,
	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Chinner <david@...morbit.com>, Marian Marinov <mm@...com>,
	linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] mm, vmscan: Do not wait for page writeback for GFP_NOFS
 allocations

On Fri, 7 Aug 2015, Nikolay Borisov wrote:
> On 08/04/2015 09:32 AM, Hugh Dickins wrote:
> > 
> > And I've done quite a bit of testing.  The loads that hung at the
> > weekend have been running nicely for 24 hours now, no problem with the
> > writeback hang and no problem with the dcache ENOTDIR issue.  Though
> > I've no idea of what recent VM change turned this into a hot issue.
> > 
> 
> Are these production loads you are referring to that have been able to
> reproduce the issue or are they some synthetic ones which? So far I
> haven't been able to reproduce the issue using artifical loads so I'm
> interested in incorporating this into my test set setup if it's available?

Not production loads, no, just an artificial load.  But not very good
at reproducing the hang: variable, but took hours, and only showed up
on one faster machine; I had to run the load for 2 days, then again 2
days, to feel confident that this hang was fixed.

And I'm sorry, but describing it in full detail is not something I find
time to do, in days or in years - partly because once I try to detail it,
I need to simplify this and streamline that, and it turns into something
else.  As happened when I sent it, offlist, to someone in 2009: I looked
back at that with a view to forwarding to you, but a lot of the details
don't match what I reverted to or advanced to doing since.

Broadly, it's a pair of repeated make -j20 kernel builds, one in tmpfs,
one in ext4 over loop over tmpfs, in limited memory 700M with 1.5G swap.
And to test this particular hang, it needed to use memcg (of what's now
branded an "insane" variety, CONFIG_CGROUP_WRITEBACK=n): I was using 1G
not 700M ram for this, but 300M memcg limit and 250M soft limit on each
memcg that was hosting one of the pair of repeated builds.  It can be
difficult to tune the right balance, swapping heavily but not OOMing:
it's a 2.6.24 tree I've gone back to building, because that's so much
smaller than current, with a greater proportion active in the build.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/