Message-ID: <ZxgHzUHcWvSNqXo2@google.com>
Date: Tue, 22 Oct 2024 13:15:09 -0700
From: Minchan Kim <minchan.kim@...il.com>
To: Barry Song <21cnbao@...il.com>
Cc: Minchan Kim <minchan@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, yuzhao@...gle.com,
linux-mm@...ck.org, david@...hat.com, fengbaopeng@...or.com,
gaoxu2@...or.com, hailong.liu@...o.com, kaleshsingh@...gle.com,
linux-kernel@...r.kernel.org, lokeshgidra@...gle.com,
mhocko@...e.com, ngeoffray@...gle.com, shli@...com,
surenb@...gle.com, v-songbaohua@...o.com, yipengxiang@...or.com,
Gao Xu <gaoxu2@...onor.com>
Subject: Re: [PATCH v2] mm: mglru: provide a separate list for lazyfree anon
folios
Hi Barry,
Sorry for the slow response.
On Fri, Oct 18, 2024 at 06:12:01PM +1300, Barry Song wrote:
> On Fri, Oct 18, 2024 at 6:58 AM Minchan Kim <minchan@...nel.org> wrote:
> >
> > On Thu, Oct 17, 2024 at 06:59:09PM +1300, Barry Song wrote:
> > > On Thu, Oct 17, 2024 at 11:58 AM Andrew Morton
> > > <akpm@...ux-foundation.org> wrote:
> > > >
> > > > On Wed, 16 Oct 2024 16:30:30 +1300 Barry Song <21cnbao@...il.com> wrote:
> > > >
> > > > > To address this, this patch proposes maintaining a separate list
> > > > > for lazyfree anon folios while keeping them classified under the
> > > > > "file" LRU type to minimize code changes.
> > > >
> > > > Thanks. I'll await input from other MGLRU developers before adding
> > > > this for testing.
> > >
> > > Thanks!
> > >
> > > Hi Minchan, Yu,
> > >
> > > Any comments? I understand that Minchan may have a broader plan
> > > to "enable the system to maintain a quickly reclaimable memory
> > > pool and provide a knob for admins to control its size." While I
> > > have no objection to that plan, I believe improving MADV_FREE
> > > performance is a more urgent priority and a low-hanging fruit at this
> > > stage.
> >
> > Hi Barry,
> >
> > I have no idea why my email wasn't delivered before. I sent the following
> > reply on Sep 24. Hope it works this time.
>
> Hi Minchan,
>
> I guess not. Your *this* email ended up in my spam folder of gmail, and
> my oppo.com account still hasn’t received it. Any idea why?
In the end, the problem is on my side, and I don't know when it can be
fixed. Anyway, I hope it works this time.
>
> >
> > ====== &< ======
> >
> > My proposal involves the following:
> >
> > 1. Introduce an "easily reclaimable" LRU list. This list would hold pages
> > that can be quickly freed without significant overhead.
>
> I assume you plan to keep both lazyfree anon pages and 'reclaimed'
> file folios (reclaimed in the normal LRU lists but still in the easily-
> reclaimable list) in this 'easily reclaimable' LRU list. However, I'm
> not sure this will work, as this patch aims to help reclaim lazyfree
> anon pages before file folios to reduce both file and anon refaults.
> If we place 'reclaimed' file folios and lazyfree anon folios in the
> same list, we may need to revisit how to reclaim lazyfree anon folios
> before reclaiming the 'reclaimed' file folios.
For those reclaimed folios, the *decision* to reclaim was already made; they
just couldn't be freed because of an *implementation issue*. So they are
stronger candidates for reclaim than other folios, as long as they have not
been accessed since then.
>
> >
> > 2. Implement a parameter to control the size of this list. This allows for
> > system tuning based on available memory and performance requirements.
>
> If we include only 'reclaimed' file folios in this 'easily reclaimable'
> LRU list, the parameter makes sense. However, if we also add lazyfree
> folios to the list, the parameter becomes less meaningful since we can't
> predict how many lazyfree anon folios user space might have. I still feel
> lazyfree anon folios are different from "reclaimed" file folios (I mean
> reclaimed from normal lists but still in 'easily-reclaimable' list).
I thought the ez-reclaimable LRU doesn't need to be accurate, since we can
put other folios there later (e.g., fadvise_dontneed folios that couldn't
be dropped at that time).
>
> >
> > 3. Modify kswapd behavior to utilize this list. When kswapd is awakened due
> > to memory pressure, it should drop those pages first to refill free pages
> > up to the high watermark.
> >
> > 4. Before kswapd goes to sleep, it should scan the tail of the LRU list and
> > move cold pages to the easily reclaimable list, unmapping them from the
> > page table.
> >
> > 5. On a page cache hit, move the page into the evictable LRU.
> >
> > This approach allows the system to maintain a pool of readily available
> > memory, mitigating the "aging" problem. The trade-off is the potential for
> > minor page faults and LRU movement overheads if these pages in ez_reclaimable
> > LRU are accessed again.
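To make the five steps above a bit more concrete, here is a toy user-space
model in Python. It is only a sketch of the ordering, not kernel code: the
class ToyLRU, its method names, and the ez_limit knob are all made up for
illustration, and a real implementation would deal with folios, page tables,
and watermarks rather than dictionary keys.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy model: an evictable LRU plus an 'easily reclaimable' list."""

    def __init__(self, ez_limit):
        self.evictable = OrderedDict()  # coldest page first
        self.ez = OrderedDict()         # oldest ez page first
        self.ez_limit = ez_limit        # hypothetical size knob (step 2)

    def add(self, page):
        """A new page enters the hot end of the evictable LRU."""
        self.evictable[page] = True

    def access(self, page):
        """Step 5: a hit on an ez page moves it back to the evictable LRU."""
        if page in self.ez:
            del self.ez[page]           # stands in for minor fault + LRU move
            self.evictable[page] = True
            return "minor-fault"
        if page in self.evictable:
            self.evictable.move_to_end(page)
            return "hit"
        return "miss"

    def refill_ez(self):
        """Step 4: before sleeping, move cold tail pages to the ez list."""
        moved = []
        while len(self.ez) < self.ez_limit and self.evictable:
            page, _ = self.evictable.popitem(last=False)
            self.ez[page] = True
            moved.append(page)
        return moved

    def reclaim(self, nr):
        """Step 3: under pressure, drop ez pages first, then the cold tail."""
        freed = []
        while len(freed) < nr and self.ez:
            freed.append(self.ez.popitem(last=False)[0])
        while len(freed) < nr and self.evictable:
            freed.append(self.evictable.popitem(last=False)[0])
        return freed


if __name__ == "__main__":
    lru = ToyLRU(ez_limit=2)
    for p in ["a", "b", "c", "d"]:
        lru.add(p)
    lru.refill_ez()        # "a" and "b" become easily reclaimable
    lru.access("a")        # "a" is touched again and leaves the ez list
    print(lru.reclaim(2))  # "b" goes first, then the coldest evictable page
```

The point of the model is only the priority: pages already chosen for
reclaim are dropped before anything else, and a page that is touched again
pays one list move to get back onto the evictable LRU.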
>
> I believe you're aware of an implementation from Samsung that uses
> cleancache. Although it was dropped from the mainline kernel, it still
> exists in the Android kernel. Samsung's rbincache, based on cleancache,
> maintains a reserved memory region for holding reclaimed file folios.
> Instead of LRU movement, rbincache uses memcpy to transfer data between
> the pool and the page cache.
>
> >
> > Furthermore, we could put some asynchronous writeback pages (e.g., swap
> > out or writeback of fs pages) into the list, too.
> > Currently, we rotate those pages back to the head of the LRU and,
> > once writeback is done, move the page to the tail of the LRU again.
> > We can simply put the page into ez_reclaimable LRU without rotating
> > back and forth.
>
> If this is about establishing a pool of easily reclaimable file folios, I
> fully support the idea and am eager to try it, especially for Android,
> where there are certainly strong use cases. However, I suspect it may
> be controversial and could take months to gain acceptance. Therefore,
> I’d prefer we first focus on landing a smaller change to address the
> madv_free performance issue and treat that idea as a separate
> incremental patch set.
I don't want to block the improvement, Barry.
The reason I suggested another LRU was actually to prevent divergence
between MGLRU and split-LRU, and to get the same behavior from both by
introducing the additional logic in one central place.
I don't think it's desirable for a userspace hint to get a different
priority depending on the admin's config.
Personally, I believe it would be better to introduce a knob that
changes MADV_FREE's behavior for both LRU algorithms at the same time
instead of only one, even though we will see the LRU inversion issue.
>
> My current patch specifically targets the issue of reclaiming lazyfree
> anon folios before reclaiming file folios. It appears your proposal is
> independent (though related) work, and I don't believe it should delay
> resolving the madv_free issue. Additionally, that pool doesn’t effectively
> address the reclamation priority between files and lazyfree anon folios.
>
> In conclusion:
>
> 1. I agree that the pool is valuable, and I’d like to develop it as an
> incremental patch set. However, this is a significant step that will
> require considerable time.
> 2. It could be quite tricky to include both lazyfree anon folios and
> reclaimed file folios (which are reclaimed in normal lists but not in
> the 'easily-reclaimable' list) in the same LRU list. I’d prefer to
> start by replacing Samsung's rbincache to reduce file folio I/O if we
> decide to implement the pool.
> 3. I believe we should first focus on landing this fix patch for the
> madv_free performance issue.
>
> What are your thoughts? I spoke with Yu, and he would like to hear
> your opinion.
Sure, I don't want to block any improvement, but please think once more
about my concern, and just go with your idea if nobody except me is
concerned about it.
Thank you.