Date:   Tue, 4 Apr 2017 18:29:52 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com
Subject: Re: [PATCH] mm: vmscan: fix IO/refault regression in cache
 workingset transition

On Tue, Apr 04, 2017 at 03:07:03PM -0700, Andrew Morton wrote:
> On Tue,  4 Apr 2017 18:00:52 -0400 Johannes Weiner <hannes@...xchg.org> wrote:
> 
> > Since 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > we noticed bigger IO spikes during changes in cache access patterns.
> > 
> > The patch in question shrunk the inactive list size to leave more room
> > for the current workingset in the presence of streaming IO. However,
> > workingset transitions that previously happened on the inactive list
> > are now pushed out of memory and incur more refaults to complete.
> > 
> > This patch disables active list protection when refaults are being
> > observed. This accelerates workingset transitions, and allows more of
> > the new set to establish itself from memory, without eating into the
> > ability to protect the established workingset during stable periods.
> > 
> > Fixes: 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> > Cc: <stable@...r.kernel.org> # 4.7+
> 
> That's a pretty large patch and the problem has been there for a year. 
> I'm not sure that it's 4.11 material, let alone -stable.  Care to
> explain further?

The problem statement is a little terse, my apologies.

The workloads that were measurably affected for us were hit pretty badly
by it, with refault/majfault rates doubling and tripling during cache
transitions, and the machines sustaining half-hour periods of 100% IO
utilization, where they'd previously have sub-minute peaks at 60-90%.

Stateful services that handle user data tend to be more conservative
with kernel upgrades. As a result we hit most page cache issues with
some delay, as was the case here.

The severity seemed to warrant a stable tag, but I agree that holding
out until 4.11.1 is probably better, given the invasiveness of this.
