Message-ID: <20231222154037.62823-1-henry.hj@antgroup.com>
Date: Fri, 22 Dec 2023 23:40:33 +0800
From: "Henry Huang" <henry.hj@...group.com>
To: rientjes@...gle.com
Cc: <akpm@...ux-foundation.org>,
"Henry Huang" <henry.hj@...group.com>,
"谈鉴锋" <henry.tjf@...group.com>,
<linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>,
"朱辉(茶水)" <teawater@...group.com>,
<yuanchu@...gle.com>,
<yuzhao@...gle.com>
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
Thanks for replying.
On Fri, Dec 22, 2023 at 13:14 PM David Rientjes wrote:
> - is the lack of predeterministic charging a problem for you? Are you
> initially faulting it in a manner that charges it to the "right" memcg
> and the refault of it after periodic reclaim can causing the charge to
> appear "randomly," i.e. to whichever process happened to access it
> next?
Initially, all pages are charged to cgroup A. Under memory pressure or after
proactive reclaim, however, some pages are dropped or swapped out.
If a task in cgroup B then accesses this shared memory before a task in
cgroup A does, those pages are charged to cgroup B.
This is common in our environment.
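To make the sequence above concrete, here is a toy Python model of first-touch
charging. It is only an illustration, not kernel code: the class SharedPage and
its methods are hypothetical names, and the model assumes a page is charged to
whichever cgroup faults it in and loses that charge when it is reclaimed.

```python
class SharedPage:
    """Toy model of a shared page under first-touch memcg charging."""

    def __init__(self):
        # cgroup currently charged for the page; None while it is not resident
        self.charged_to = None

    def access(self, cgroup):
        # A fault on a non-resident page charges the faulting cgroup.
        # An access to a resident page leaves the existing charge alone.
        if self.charged_to is None:
            self.charged_to = cgroup
        return self.charged_to

    def reclaim(self):
        # Dropping or swapping out the page releases the charge.
        self.charged_to = None


page = SharedPage()
page.access("A")   # initial fault: charged to A
page.access("B")   # page is resident: charge stays with A
page.reclaim()     # memory pressure or proactive reclaim
page.access("B")   # B touches it first after reclaim: charged to B
page.access("A")   # A's later access does not move the charge
```

In this model the final owner depends only on who touches the page first after
each reclaim, which is the behaviour described above.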
> - are pages ever shared between different memcg hierarchies? You
> mentioned sharing between processes in A and A/B, but I'm wondering
> if there is sharing between two different memcg hierarchies where root
> is the only common ancestor?
Yes, there is another really common case:
if the Docker graph driver is overlayfs, containers that use the same image
or share the same lower layers also share the file cache of common binaries
and libraries (e.g. libc.so).
> - do you anticipate a shorter scan period at some point? Proactively
> reclaiming all memory colder than one hour is a long time :) Are you
> concerned at all about the cost of doing your current idle bit
> harvesting approach becoming too expensive if you significantly reduce
> the scan period?
We don't want application owners to see a significant performance
degradation when swap is in use. Reclaiming pages whose idle age is less
than 1 hour carries a high risk of that. We have internal tests and
data analysis to support this.
We have disabled both global swappiness and memcg swappiness, so
only proactive reclaim can swap out anon pages.
What's more, we see that MGLRU has a more efficient way to scan the PTE accessed bit.
We would prefer to use the MGLRU scan to help us scan for and select idle pages.
> - is proactive reclaim being driven by writing to memory.reclaim, by
> enforcing a smaller memory.high, or something else?
Because all page information and idle ages are stored in userspace, the kernel
can't get this information directly. We have a private patch that includes a new
reclaim interface to support reclaiming pages with specific PFNs.
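As a rough illustration of the userspace side, the sketch below keeps an idle
age per PFN and picks the PFNs that have been idle for at least one hour. It is
not the actual tool: update_idle_ages, select_cold_pfns, the sample PFNs and the
60-second scan period are all made up for the example. How the selected PFNs are
handed to the kernel is left out, since that interface is a private patch.

```python
ONE_HOUR = 3600  # idle-age threshold in seconds, from the discussion above


def update_idle_ages(ages, still_idle, scan_period):
    """Return the idle ages after one scan.

    ages:        dict mapping pfn -> idle age in seconds
    still_idle:  set of pfns that were not accessed since the previous scan
    scan_period: seconds between two scans (hypothetical value)
    """
    return {
        # An untouched page grows older; an accessed page starts over at 0.
        pfn: (age + scan_period if pfn in still_idle else 0)
        for pfn, age in ages.items()
    }


def select_cold_pfns(ages, threshold=ONE_HOUR):
    """Return the sorted pfns whose idle age has reached the threshold."""
    return sorted(pfn for pfn, age in ages.items() if age >= threshold)


# Example: three tracked pages, one scan 60 seconds after the previous one.
ages = {0x1000: 3540, 0x1001: 3540, 0x1002: 600}
ages = update_idle_ages(ages, still_idle={0x1000, 0x1002}, scan_period=60)
cold = select_cold_pfns(ages)  # only 0x1000 has been idle for a full hour
```

Resetting the age to zero on any access keeps a page out of the reclaim set
until it has been idle for a full hour again.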
--
2.43.0