lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170126101916.tmqa3hswtxfa6nsj@suse.de>
Date:   Thu, 26 Jan 2017 10:19:16 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH 5/5] mm: vmscan: move dirty pages out of the way until
 they're flushed

On Mon, Jan 23, 2017 at 01:16:41PM -0500, Johannes Weiner wrote:
> We noticed a performance regression when moving hadoop workloads from
> 3.10 kernels to 4.0 and 4.6. This is accompanied by increased pageout
> activity initiated by kswapd as well as frequent bursts of allocation
> stalls and direct reclaim scans. Even lowering the dirty ratios to the
> equivalent of less than 1% of memory would not eliminate the issue,
> suggesting that dirty pages concentrate where the scanner is looking.
> 

Note that some of this is also impacted by
bbddabe2e436aa7869b3ac5248df5c14ddde0cbf because it can have the effect
of dirty pages reaching the end of the LRU sooner if they are being
written. It's not impossible that hadoop is rewriting the same files,
hitting the end of the LRU due to no reads and then throwing reclaim
into a hole.

I've seen a few cases where random write only workloads regressed and it
was based on whether the random number generator was selecting the same
pages. With that commit, the LRU was effectively LIFO.

Similarly, I'd seen a case where a databases whose working set was
larger than the shared memory area regressed because the spill-over from
the database buffer to RAM was not being preserved because it was all
rights. That said, the same patch prevents the database being swapped so
it's not all bad but there have been consequences.

I don't have a problem with the patch although would prefer to have seen
more data for the series. However, I'm not entirely convinced that
thrash detection was the only problem. I think not activating pages on
write was a contributing factor although this patch looks better than
considering reverting bbddabe2e436aa7869b3ac5248df5c14ddde0cbf.

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ