Message-ID: <20170126185027.GB30636@cmpxchg.org>
Date:   Thu, 26 Jan 2017 13:50:27 -0500
From:   Johannes Weiner <hannes@...xchg.org>
To:     Mel Gorman <mgorman@...e.de>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH 3/5] mm: vmscan: remove old flusher wakeup from direct
 reclaim path

On Thu, Jan 26, 2017 at 10:05:09AM +0000, Mel Gorman wrote:
> On Mon, Jan 23, 2017 at 01:16:39PM -0500, Johannes Weiner wrote:
> > Direct reclaim has been replaced by kswapd reclaim in pretty much all
> > common memory pressure situations, so this code most likely doesn't
> > accomplish the described effect anymore. The previous patch wakes up
> > flushers for all reclaimers when we encounter dirty pages at the tail
> > end of the LRU. Remove the crufty old direct reclaim invocation.
> > 
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> 
> In general I like this. I worried at first that if kswapd is blocked
> writing pages it won't reach the wakeup_flusher_threads call, but the
> previous patch handles that.
> 
> Now, though, it occurs to me that with the last patch we always write
> out the world when waking the flusher threads. This may not be a great
> idea. Consider, for example, a heavy writer of short-lived tmp files.
> In such a case, it is possible for the files to be truncated before
> they even hit the disk. However, if there are multiple "write out the
> world" calls, that data may now be hitting the disk. Furthermore,
> multiple kswapd and direct reclaimers could all be asked to write out
> the world, and each request unplugs.
> 
> Is it possible to maintain the property of writing back pages relative
> to the number of pages scanned, or have you already determined that
> it's not necessary?

That's what I started out with: waking the flushers for nr_taken. I
was using a silly test case that wrote less than the dirty background
limit and then allocated a burst of anon memory. When the dirty data
is linear, the bigger IO requests are beneficial. They don't exhaust
struct request (the way kswapd's 4k IO routinely does, and
SWAP_CLUSTER_MAX is only 32), and they require less frequent plugging.
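
To make the two options concrete, here is a rough sketch against the
4.10-era wakeup_flusher_threads(long nr_pages, enum wb_reason reason)
interface, assuming the shrink_inactive_list() hook from the previous
patch (stat.nr_unqueued_dirty == nr_taken); treat the exact form as
illustrative rather than the final code:

	/*
	 * Behavior after the previous patch: if none of the dirty
	 * pages we isolated are queued for IO, kick the flushers with
	 * nr_pages == 0, i.e. write back everything that is dirty.
	 */
	if (stat.nr_unqueued_dirty == nr_taken)
		wakeup_flusher_threads(0, WB_REASON_VMSCAN);

	/*
	 * Alternative under discussion: only ask for writeback of
	 * about as many pages as we just took off the LRU, which
	 * caps each request at SWAP_CLUSTER_MAX (32) pages.
	 */
	if (stat.nr_unqueued_dirty == nr_taken)
		wakeup_flusher_threads(nr_taken, WB_REASON_VMSCAN);

With nr_pages == 0 the request is sized to all currently dirty pages
("writeout the world"); with nr_taken it is capped at the handful of
pages we just isolated, which is exactly why the per-wakeup IO ends up
so small.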

Force-flushing temporary files under memory pressure is a concern,
although the most recently dirtied files would get queued last, which
still gives them some time to get truncated. But I'm wary of splitting
the flush requests too aggressively when we DO sustain throngs of
dirty pages hitting the reclaim scanners.

I haven't yet tested this with the real workload that gave us
problems, though, because deploying enough machines to get a good
sample size takes 1-2 days, and running through the full load spectrum
takes another 4-5. That makes it harder to fine-tune these patches.

But this is a legit concern. I'll try to find out what happens when we
reduce the wakeups to nr_taken.

Given the problem these patches address, though, would you be okay
with keeping this patch in -mm? We're too far into 4.10 to merge it
upstream now, and I should have data on more precise wakeups before
the next merge window.

Thanks
