Message-ID: <20231222154037.62823-1-henry.hj@antgroup.com>
Date: Fri, 22 Dec 2023 23:40:33 +0800
From: "Henry Huang" <henry.hj@...group.com>
To: rientjes@...gle.com
Cc:  <akpm@...ux-foundation.org>,
  "Henry Huang" <henry.hj@...group.com>,
  "谈鉴锋" <henry.tjf@...group.com>,
   <linux-kernel@...r.kernel.org>,
   <linux-mm@...ck.org>,
  "朱辉(茶水)" <teawater@...group.com>,
   <yuanchu@...gle.com>,
   <yuzhao@...gle.com>
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

Thanks for replying.

On Fri, Dec 22, 2023 at 13:14 PM David Rientjes wrote:
> - is the lack of predeterministic charging a problem for you?  Are you
>   initially faulting it in a manner that charges it to the "right" memcg
>   and the refault of it after periodic reclaim can cause the charge to
>   appear "randomly," i.e. to whichever process happened to access it 
>   next?

Initially, all pages are charged to cgroup A. Under memory pressure or
after proactive reclaim, some of those pages are dropped or swapped out.
If a task in cgroup B then accesses this shared memory before a task in
cgroup A does, the pages are charged to cgroup B.

This is common in our environment.

> - are pages ever shared between different memcg hierarchies?  You 
>   mentioned sharing between processes in A and A/B, but I'm wondering
>   if there is sharing between two different memcg hierarchies where root
>   is the only common ancestor?

Yes, there is another really common case:
If the docker graph driver is overlayfs, different docker containers that
use the same image, or share the same lower layers, share the file cache of
common binaries and libraries (e.g. libc.so).

> - do you anticipate a shorter scan period at some point?  Proactively
>   reclaiming all memory colder than one hour is a long time :)  Are you
>   concerned at all about the cost of doing your current idle bit 
>   harvesting approach becoming too expensive if you significantly reduce
>   the scan period?

We don't want application owners to see a significant performance
degradation when swap is in use. Reclaiming pages whose idle age is less
than 1 hour carries a high risk of that. We have internal tests and
data analysis to support this.

We disabled global swappiness and memcg swappiness.
Only proactive reclaim can swap out anon pages.

What's more, we see that mglru has a more efficient way to scan the pte
accessed bit. We would prefer to have the mglru scan help us scan and
select idle pages.
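
For readers unfamiliar with the interface named in the subject line, here is
a minimal sketch of how a userspace scanner could locate and test one page's
bit in /sys/kernel/mm/page_idle/bitmap. It assumes the layout given in the
kernel's idle page tracking documentation (an array of 8-byte words in native
byte order, with pfn i at bit i%64 of word i/64). The helper names are
hypothetical, and this is not the tooling discussed in this thread.

```python
import struct

# Needs CONFIG_IDLE_PAGE_TRACKING and root privileges to open.
PAGE_IDLE_BITMAP = "/sys/kernel/mm/page_idle/bitmap"
WORD_BYTES = 8       # the file is an array of 8-byte words
BITS_PER_WORD = 64


def bitmap_position(pfn):
    """Return (byte offset of the word, bit index in that word) for a pfn."""
    return (pfn // BITS_PER_WORD) * WORD_BYTES, pfn % BITS_PER_WORD


def is_idle(word_bytes, bit):
    """Decode one 8-byte word from the bitmap and test a single bit."""
    (word,) = struct.unpack("=Q", word_bytes)  # "=" selects native byte order
    return bool((word >> bit) & 1)


def read_idle(pfn, path=PAGE_IDLE_BITMAP):
    """Read the idle bit for one pfn (illustrative: one read per page)."""
    offset, bit = bitmap_position(pfn)
    # Unbuffered, so the read is exactly one aligned 8-byte word.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        return is_idle(f.read(WORD_BYTES), bit)
```

A real scanner would read the bitmap in large aligned chunks rather than one
word per page. The per-page cost of this kind of harvesting is what makes
shorter scan periods expensive.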

> - is proactive reclaim being driven by writing to memory.reclaim, by
>   enforcing a smaller memory.high, or something else?

Because all page info and idle ages are stored in userspace, the kernel
can't get this information directly. We have a private patch that includes a
new reclaim interface to support reclaiming pages with specific pfns.
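
As a rough illustration of the userspace bookkeeping described here, the
sketch below keeps a per-pfn idle age and picks the pfns that pass the 1 hour
threshold. The scan period, the function names and the dict-based storage are
assumptions made for the example. The private pfn-based reclaim interface is
not shown, since its details are not given in this thread.

```python
SCAN_PERIOD_S = 60       # hypothetical interval between idle-bit scans
MIN_IDLE_AGE_S = 3600    # the 1 hour threshold mentioned earlier


def update_idle_ages(ages, idle_bits, period_s=SCAN_PERIOD_S):
    """Advance the age of pages still idle, reset pages that were accessed.

    ages:      dict mapping pfn -> seconds the page has been idle
    idle_bits: dict mapping pfn -> True if the latest scan found it idle
    """
    for pfn, idle in idle_bits.items():
        ages[pfn] = (ages.get(pfn, 0) + period_s) if idle else 0
    return ages


def reclaim_candidates(ages, min_age_s=MIN_IDLE_AGE_S):
    """Return the pfns whose idle age has reached the threshold, sorted."""
    return sorted(pfn for pfn, age in ages.items() if age >= min_age_s)
```

The resulting pfn list is what would be handed to a reclaim interface that
accepts specific pfns.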

-- 
2.43.0