Date: Tue, 4 Apr 2017 18:29:52 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michal Hocko <mhocko@...e.com>, Vladimir Davydov <vdavydov.dev@...il.com>,
	linux-mm@...ck.org, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH] mm: vmscan: fix IO/refault regression in cache workingset transition

On Tue, Apr 04, 2017 at 03:07:03PM -0700, Andrew Morton wrote:
> On Tue, 4 Apr 2017 18:00:52 -0400 Johannes Weiner <hannes@...xchg.org> wrote:
>
> > Since 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > we noticed bigger IO spikes during changes in cache access patterns.
> >
> > The patch in question shrunk the inactive list size to leave more room
> > for the current workingset in the presence of streaming IO. However,
> > workingset transitions that previously happened on the inactive list
> > are now pushed out of memory and incur more refaults to complete.
> >
> > This patch disables active list protection when refaults are being
> > observed. This accelerates workingset transitions, and allows more of
> > the new set to establish itself from memory, without eating into the
> > ability to protect the established workingset during stable periods.
> >
> > Fixes: 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list")
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> > Cc: <stable@...r.kernel.org> # 4.7+
>
> That's a pretty large patch and the problem has been there for a year.
> I'm not sure that it's 4.11 material, let alone -stable. Care to
> explain further?

The problem statement is a little terse, my apologies.

The workloads that were measurably affected for us were hit pretty bad
by it, with refault/majfault rates doubling and tripling during cache
transitions, and the machines sustaining half-hour periods of 100% IO
utilization, where they'd previously have sub-minute peaks at 60-90%.

Stateful services that handle user data tend to be more conservative
with kernel upgrades. As a result we hit most page cache issues with
some delay, as was the case here.

The severity seemed to warrant a stable tag, but I agree that holding
out until 4.11.1 is probably better, given the invasiveness of this.
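
[Editor's illustration] The quoted commit message only briefly states the
mechanism: keep the active file list protected during stable periods, but
drop that protection while refaults are being observed so the incoming
workingset can displace the old one. The sketch below is a minimal,
self-contained model of that idea only; it is not the actual kernel patch,
and every name in it (struct lru_state, should_deactivate_active,
snapshot_refaults) is hypothetical.

/*
 * Illustrative sketch only -- NOT the real mm/vmscan code. It models the
 * decision described above: normally reclaim leaves the active file list
 * alone while it is larger than the inactive list; if refaults were seen
 * since the last reclaim cycle, the inactive list is evidently too small
 * for the transitioning workingset, so the protection is dropped.
 */
#include <stdbool.h>
#include <stdio.h>

struct lru_state {
	unsigned long nr_active;        /* pages on the active file list   */
	unsigned long nr_inactive;      /* pages on the inactive file list */
	unsigned long refaults;         /* refaults counted so far         */
	unsigned long refaults_at_snap; /* snapshot from last reclaim pass */
};

/* Decide whether reclaim may deactivate pages from the active list. */
static bool should_deactivate_active(const struct lru_state *s)
{
	bool refaulting = s->refaults != s->refaults_at_snap;

	/*
	 * While refaults are observed, allow deactivation regardless of
	 * the list ratio, so the new workingset can establish itself.
	 */
	if (refaulting)
		return true;

	/* Stable period: protect the active list until inactive catches up. */
	return s->nr_inactive >= s->nr_active;
}

/* Record the refault count at the end of a reclaim cycle. */
static void snapshot_refaults(struct lru_state *s)
{
	s->refaults_at_snap = s->refaults;
}

int main(void)
{
	struct lru_state s = {
		.nr_active = 1000, .nr_inactive = 200,
		.refaults = 0, .refaults_at_snap = 0,
	};

	printf("stable, no refaults:    deactivate? %d\n",
	       should_deactivate_active(&s)); /* 0: active list protected */

	s.refaults += 50;                      /* cache transition begins */
	printf("refaults observed:      deactivate? %d\n",
	       should_deactivate_active(&s)); /* 1: protection dropped    */

	snapshot_refaults(&s);                 /* next cycle, no new refaults */
	printf("after snapshot, quiet:  deactivate? %d\n",
	       should_deactivate_active(&s)); /* 0: protection restored   */

	return 0;
}

The point of the comparison against a per-cycle snapshot, rather than a
raw counter, is that protection is only relaxed while refaults are
actively occurring and returns as soon as the transition completes.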