Message-ID: <20160607162311.GG9978@cmpxchg.org>
Date:	Tue, 7 Jun 2016 12:23:11 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Tim Chen <tim.c.chen@...ux.intel.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andi Kleen <andi@...stfloor.org>,
	Michal Hocko <mhocko@...e.cz>, kernel-team@...com
Subject: Re: [PATCH 10/10] mm: balance LRU lists based on relative thrashing

Hi Tim,

On Mon, Jun 06, 2016 at 04:50:23PM -0700, Tim Chen wrote:
> On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote:
> > To tell inactive from active refaults, a page flag is introduced that
> > marks pages that have been on the active list in their lifetime. This
> > flag is remembered in the shadow page entry on reclaim, and restored
> > when the page refaults. It is also set on anonymous pages during
> > swapin. When a page with that flag set is added to the LRU, the LRU
> > balance is adjusted for the IO cost of reclaiming the thrashing list.
> 
> Johannes,
> 
> It seems like you are saying that the shadow entry is also present
> for anonymous pages that are swapped out.  But once a page is swapped
> out, its entry is removed from the radix tree and we won't be able
> to store the shadow page entry as for file mapped page 
> in __remove_mapping.  Or are you thinking of modifying
> the current code to keep the radix tree entry? I may be missing something
> so will appreciate if you can clarify.

Sorry if this was ambiguously phrased.

You are correct: there are no shadow entries for anonymous evictions,
only page cache evictions. All swap-ins are treated as "eligible"
refaults and push back against cache, whereas cache only pushes
against anon if the cache workingset is determined to fit into memory.

That implies a fixed hierarchy where the VM always tries to fit the
anonymous workingset into memory first and the page cache second. If
the anonymous set is bigger than memory, the algorithm will keep
counting IO cost from anonymous refaults and keep pressuring page cache.

[ Although you can set the effective cost of these refaults to 0
  (swappiness = 200) and reduce effective cache to a minimum -
  possibly to a level where LRU rotations consume most of it.
  But yeah. ]

So the current code works well when we assume that cache workingsets
might exceed memory, but anonymous workingsets don't.
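
The asymmetry described above can be sketched as a toy model. This is
not the code from the series: `LruCosts`, `note_refault`,
`file_scan_share`, and the `refault_distance` / `memory_pages`
parameters are hypothetical names invented for illustration, and the
share formula is an assumed stand-in for "more cost on one list means
more pressure on the other", not the kernel's actual calculation.

```python
from dataclasses import dataclass


@dataclass
class LruCosts:
    """Accumulated IO cost per LRU type (toy model, not kernel state)."""
    anon: int = 0  # swap-ins that were counted as cost
    file: int = 0  # page cache refaults that were counted as cost


def note_refault(costs, kind, refault_distance=None, memory_pages=0):
    """Record one refault; return True if it was counted as IO cost.

    kind is "anon" for a swap-in or "file" for a page cache refault.
    refault_distance is an assumed stand-in for what a shadow entry
    lets the VM estimate; it only exists for page cache evictions.
    """
    if kind == "anon":
        # No shadow entries for anon: every swap-in is "eligible".
        costs.anon += 1
        return True
    if kind == "file":
        # Cache only counts when its workingset is judged to fit in memory.
        if refault_distance is not None and refault_distance <= memory_pages:
            costs.file += 1
            return True
        return False
    raise ValueError(f"unknown LRU kind: {kind!r}")


def file_scan_share(costs):
    """Assumed formula: share of reclaim pressure aimed at page cache.

    The more it has cost to reclaim anon, the more pressure goes to file.
    The +1 / +2 terms only keep the toy ratio defined when costs are zero.
    """
    return (costs.anon + 1) / (costs.anon + costs.file + 2)
```

In this sketch, an anonymous set larger than memory keeps incrementing
`costs.anon` on every swap-in, so `file_scan_share` keeps drifting
toward the cache side, which is the fixed hierarchy described above.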

For SSDs and non-DIMM pmem devices this assumption is fine, because
nobody wants half their frequent anonymous memory accesses to be major
faults. Anonymous workingsets will continue to target RAM size there.

Secondary memory types, which userspace can continue to map directly
after "swap out", are a different story. That might need workingset
estimation for anonymous pages. But it would have to build on top of
this series here. These patches are about eliminating or mitigating IO
by swapping idle or colder anon pages when the cache is thrashing.
