Message-ID: <CAGsJ_4xKuY4P2e4KFJ4pA0Q53b+tOn5ki3An0ZiciH08ZBhr+w@mail.gmail.com>
Date: Tue, 24 Sep 2024 10:38:37 +1200
From: Barry Song <21cnbao@...il.com>
To: Minchan Kim <minchan@...nel.org>
Cc: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org, linux-mm@...ck.org, 
	mhocko@...e.com, fengbaopeng@...or.com, gaoxu2@...or.com, 
	hailong.liu@...o.com, kaleshsingh@...gle.com, linux-kernel@...r.kernel.org, 
	lokeshgidra@...gle.com, ngeoffray@...gle.com, shli@...com, surenb@...gle.com, 
	yipengxiang@...or.com, yuzhao@...gle.com, Barry Song <v-songbaohua@...o.com>
Subject: Re: [PATCH RFC] mm: mglru: provide a separate list for lazyfree anon folios

On Tue, Sep 24, 2024 at 10:19 AM Minchan Kim <minchan@...nel.org> wrote:
>
> On Fri, Sep 20, 2024 at 01:23:57PM +1200, Barry Song wrote:
> > On Wed, Sep 18, 2024 at 12:02 AM David Hildenbrand <david@...hat.com> wrote:
> > >
> > > On 14.09.24 08:37, Barry Song wrote:
> > > > From: Barry Song <v-songbaohua@...o.com>
> > > >
> > > > This follows up on the discussion regarding Gaoxu's work[1]. It's
> > > > unclear if there's still interest in implementing a separate LRU
> > > > list for lazyfree folios, but I decided to explore it out of
> > > > curiosity.
> > > >
> > > > According to Lokesh, MADV_FREE'd anon folios are expected to be
> > > > released earlier than file folios. One option, as implemented
> > > > by Gao Xu, is to place lazyfree anon folios at the tail of the
> > > > file's `min_seq` generation. However, this approach results in
> > > > lazyfree folios being released in a LIFO manner, which conflicts
> > > > with LRU behavior, as noted by Michal.
> > > >
> > > > To address this, this patch proposes maintaining a separate list
> > > > for lazyfree anon folios while keeping them classified under the
> > > > "file" LRU type to minimize code changes. These lazyfree anon
> > > > folios will still be counted as file folios and share the same
> > > > generation with regular files. In the eviction path, the lazyfree
> > > > list will be prioritized for scanning before the actual file
> > > > LRU list.
> > > >
> > >
> > > What's the downside of another LRU list? Do we have any experience on that?
> >
> > Essentially, the goal is to address the downside of using a single LRU list for
> > files and lazyfree anonymous pages: significantly more file refaults.
> >
> > I'm not entirely clear on the downsides of having an additional LRU list. While it
> > does increase complexity, the increase doesn't seem to be significant.
>
> It's not catastrophic[1]. I prefer the idea of an additional LRU
> because it offers flexibility for various potential use cases[2].
>
> Orthogonal topic (but may be of interest to someone):
>
> My main interest in a new LRU list is to enable the system to maintain a
> quickly reclaimable memory pool and expose the size to the admin with
> a knob to decide how large a pool they want.
>
> This pool would consist of clean, unmapped pages from both the page cache
> and/or the swap cache. This would allow the system to reclaim memory quickly
> when free memory is low, at the cost of minor fault overhead.

My current implementation only handles the MADV_FREE anonymous case. If these
folios are placed in a single LRU, they can be reclaimed very quickly: they are
simply discarded without needing to be swapped out.
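
As a rough illustration of the scan order described in the patch (lazyfree
list first, then the regular file list), here is a toy user-space model. It is
only a sketch: the `evict` function and the folio names are hypothetical, and
nothing here corresponds to actual kernel code.

```python
from collections import deque


def evict(lazyfree, file_lru, nr_to_reclaim):
    """Pick victims from the lazyfree list first, then from the file list.

    Both lists are deques with the oldest entry on the left, so each list
    is drained in FIFO order rather than LIFO.
    """
    victims = []
    for lru in (lazyfree, file_lru):
        while lru and len(victims) < nr_to_reclaim:
            victims.append(lru.popleft())
    return victims


# Hypothetical folios; names are for illustration only.
lazyfree = deque(["lazy0", "lazy1"])
file_lru = deque(["file0", "file1", "file2"])
victims = evict(lazyfree, file_lru, 3)
print(victims)
```

In this model the file list is only touched once the lazyfree list is empty.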

I've been thinking about the issue of unmapped pagecache recently. Unmapped
pagecache can be reclaimed much faster than mapped pagecache, especially when
the latter has a high mapcount and incurs significant rmap costs. However, a
lot of pagecache is inherently unmapped (e.g., from the read syscall). If it is
placed in a single LRU, the challenge would be comparing the age of unmapped
pagecache with that of mapped pagecache. Currently, with the mglru tier
mechanism, frequently accessed unmapped pagecache has a chance to be placed in
a spot where it is harder to reclaim.

Personally, I am quite interested in grouping unmapped pagecache together, as
right now reclamation could look like this:

lru list:
unmapped pagecache(A) - mapped pagecache(B) - unmapped pagecache(C) -
mapped pagecache with huge mapcount(D)

A and C can be reclaimed with zero cost but they have to wait for D and B.
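
To make the waiting cost concrete, here is a toy model. It is a sketch under
loud assumptions: reclaim is assumed to scan from the D end of the list, the
rmap cost is assumed to be one unit per mapping, and the mapcount values are
invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Folio:
    name: str
    mapcount: int  # 0 means unmapped pagecache


def rmap_cost(folio):
    # Hypothetical cost model: one unit of work per mapping to tear down.
    return folio.mapcount


def cost_before_reclaim(lru, name):
    """Total rmap cost paid before `name` is reached, scanning from the tail."""
    cost = 0
    for folio in reversed(lru):
        if folio.name == name:
            return cost
        cost += rmap_cost(folio)
    raise KeyError(name)


# Made-up mapcounts: B is lightly mapped, D has a huge mapcount.
lru = [Folio("A", 0), Folio("B", 2), Folio("C", 0), Folio("D", 50)]
wait_a = cost_before_reclaim(lru, "A")
wait_c = cost_before_reclaim(lru, "C")
print(wait_a, wait_c)
```

With these numbers, C waits behind D's rmap walk and A waits behind both D's
and B's, even though A and C themselves cost nothing to reclaim.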

But the question is, if we make two lists:

list1: A - C
list2: B - D

How can we ensure that A and C won't experience many refaults, even though
reclaiming them would be cost-free? And what if B and D are actually colder
than A and C?
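
The concern can be shown with a small simulation. This is only a sketch: the
`last_access` clock, the "always reclaim the unmapped list first" policy, and
all values are assumptions made for illustration, not a proposal for the
actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Folio:
    name: str
    mapcount: int
    last_access: int  # hypothetical clock; larger means more recently used


def split_by_mapping(lru):
    """Split one list into (unmapped, mapped), i.e. list1 and list2."""
    unmapped = [f for f in lru if f.mapcount == 0]
    mapped = [f for f in lru if f.mapcount > 0]
    return unmapped, mapped


def reclaim_unmapped_first(lru, nr_to_reclaim):
    """Assumed policy: drain list1 (oldest first) before touching list2."""
    unmapped, mapped = split_by_mapping(lru)
    order = sorted(unmapped, key=lambda f: f.last_access)
    order += sorted(mapped, key=lambda f: f.last_access)
    return order[:nr_to_reclaim]


def hotter_than_survivors(lru, victims):
    """Names of victims used more recently than the coldest surviving folio."""
    survivors = [f for f in lru if f not in victims]
    if not survivors:
        return []
    coldest = min(f.last_access for f in survivors)
    return [f.name for f in victims if f.last_access > coldest]


# Made-up ages: A and C are hot, B and D are cold.
lru = [
    Folio("A", 0, 90),
    Folio("B", 2, 10),
    Folio("C", 0, 80),
    Folio("D", 50, 5),
]
victims = reclaim_unmapped_first(lru, 2)
likely_refaults = hotter_than_survivors(lru, victims)
print([f.name for f in victims], likely_refaults)
```

Here both victims are hotter than the folios left behind, so the cheap
reclaim would likely be paid back as refaults.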

If this isn't an issue, I'd be very interested in implementing it. Any thoughts?

>
> [1] https://lore.kernel.org/linux-kernel//1448006568-16031-15-git-send-email-minchan@kernel.org/
> [2] https://lkml.org/lkml/2012/6/19/24

Thanks
Barry
