lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJj2-QG3jJcA=71n5imx+OjhMapPMN-1bfT5XQRjswxOPG9MvA@mail.gmail.com>
Date: Wed, 10 Jan 2024 11:24:02 -0800
From: Yuanchu Xie <yuanchu@...gle.com>
To: Henry Huang <henry.hj@...group.com>
Cc: rientjes@...gle.com, akpm@...ux-foundation.org, 
	谈鉴锋 <henry.tjf@...group.com>, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	朱辉(茶水) <teawater@...group.com>, yuzhao@...gle.com
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

On Fri, Dec 22, 2023 at 7:40 AM Henry Huang <henry.hj@...group.com> wrote:
> > - are pages ever shared between different memcg hierarchies?  You
> >   mentioned sharing between processes in A and A/B, but I'm wondering
> >   if there is sharing between two different memcg hierarchies where root
> >   is the only common ancestor?
>
> Yes, there is a another really common case:
> If docker graph driver is overlayfs, different docker containers use the
> same image, or share same low layers, would share file cache of public bin or
> lib(i.e libc.so).
Does this present a problem with setting memcg limits or OOMs? It
seems like deterministically charging shared pages would be highly
desirable. Mina Almasry previously proposed a memcg= mount option to
implement deterministic charging[1], but it wasn't a generic sharing
mechanism. Nonetheless, the problem remains, and it would be
interesting to learn if this presents any issues for you.

[1] https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/
>
> > - do you anticipate a shorter scan period at some point?  Proactively
> >   reclaiming all memory colder than one hour is a long time :)  Are you
> >   concerned at all about the cost of doing your current idle bit
> >   harvesting approach becoming too expensive if you significantly reduce
> >   the scan period?
>
> We don't want the owner of the application to feel a significant
> performance downgrade when using swap. There is a high risk to reclaim pages
> which idle age are less than 1 hour. We have internal test and
> data analysis to support it.
>
> We disabled global swappiness and memcg swapinness.
> Only proactive reclaim can swap anon pages.
>
> What's more, we see that mglru has a more efficient way to scan pte access bit.
> We perferred to use mglru scan help us scan and select idle pages.
I'm working on a kernel driver/per-memcg interface to perform aging
with MGLRU, including configuration for the MGLRU page scanning
optimizations. I suspect scanning the PTE accessed bits for pages
charged to a foreign memcg ad-hoc has some performance implications,
and the more general solution is to charge in a predetermined way,
which makes the scanning on behalf of the foreign memcg a bit cleaner.
This is possible nonetheless, but a bit hacky. Let me know you have
any ideas.
>
> > - is proactive reclaim being driven by writing to memory.reclaim, by
> >   enforcing a smaller memory.high, or something else?
>
> Because all pages info and idle age are stored in userspace, kernel can't get
> these information directly. We have a private patch include a new reclaim interface
> to support reclaim pages with specific pfns.
Thanks for sharing! It's been enlightening to learn about different
prod environments.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ