linux-kernel - Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios


Message-ID: <CAKEwX=P8o+hLsdQw_xKymgteLXsBPfDf4kGVKdgE=PNj=b0cMw@mail.gmail.com>
Date: Tue, 20 Aug 2024 17:13:39 -0400
From: Nhat Pham <nphamcs@...il.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: "Huang, Ying" <ying.huang@...el.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>, 
	"hannes@...xchg.org" <hannes@...xchg.org>, "yosryahmed@...gle.com" <yosryahmed@...gle.com>, 
	"ryan.roberts@....com" <ryan.roberts@....com>, "21cnbao@...il.com" <21cnbao@...il.com>, 
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "Zou, Nanhai" <nanhai.zou@...el.com>, 
	"Feghali, Wajdi K" <wajdi.k.feghali@...el.com>, "Gopal, Vinodh" <vinodh.gopal@...el.com>
Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios

On Mon, Aug 19, 2024 at 11:01 PM Sridhar, Kanchana P
<kanchana.p.sridhar@...el.com> wrote:
>
> Hi Ying,
>
> I confirmed that in the case of usemem, all calls to [1] occur from the code path in [3].
> However, my takeaway from this is that the more reclaim that results in zswap_store(),
> for e.g., from mTHP folios, there is higher likelihood of overage recorded per-process in
> current->memcg_nr_pages_over_high, which could potentially be causing each
> process to reclaim memory, even if it is possible that the swapout from a few of
> the 70 processes could have brought the parent cgroup under the limit.

Yeah IIUC, the memory increase from a zswap store happens
immediately/synchronously (swap_writepage() -> zswap_store() ->
obj_cgroup_charge_zswap()), before the memory saving kicks in. This is
a non-issue for regular (non-zswap) swap - the memory saving doesn't
happen right away there either, but it also doesn't increase memory
usage (well, as you pointed out, obj_cgroup_charge_zswap() doesn't
even happen).

And yes, this is compounded a) if you're in a high-concurrency regime,
where all tasks in the same cgroup are under memory pressure and all go
into reclaim, and b) for larger folios, where we compress multiple
pages before the saving happens. I wonder how bad the effect is, though -
could you quantify the amount of reclaim that happens per zswap store
somehow with tracing magic?
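
As a rough back-of-envelope for a) and b), here is a toy userspace
model (not kernel code). It assumes every task has one folio in flight,
that each page's compressed object is charged as soon as it is stored,
and that the uncompressed folio stays charged until the whole folio is
done. The function name, the 4K page size, and the 2:1 compression
figure are all illustrative assumptions, not measurements.

```python
PAGE_SIZE = 4096  # bytes; assumed base page size


def transient_overcharge(nr_tasks, pages_per_folio, compressed_bytes_per_page):
    """Extra bytes charged to the cgroup before any saving lands.

    Toy model: each of nr_tasks has one folio in flight, and every page
    of that folio has already been compressed and charged, while the
    original folio has not been uncharged yet.
    """
    return nr_tasks * pages_per_folio * compressed_bytes_per_page


# 70 concurrent reclaimers, assumed 2:1 compression (2048 bytes per page)
small = transient_overcharge(70, 1, PAGE_SIZE // 2)    # 4K folios
large = transient_overcharge(70, 16, PAGE_SIZE // 2)   # 64K mTHP folios
print(small, large)
```

In this model the transient overage scales linearly with both the
number of concurrent reclaimers and the folio size, so 64K folios would
show 16x the overage of 4K folios at the same compression ratio.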

Also, I wonder if there could be a "charge delta" mechanism, where we
directly uncharge by (page size - zswap object size), to avoid the
temporary double charging... Sort of like what folio migration is
doing now vs. what it used to do. Seems complicated - not even sure if
it's possible TBH.
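
To make the "charge delta" idea concrete, here is a toy comparison of
the two accounting orders (again plain Python, not kernel code). Both
helpers are hypothetical and only track a single usage counter; whether
the kernel could actually do the second variant is exactly the open
question above.

```python
def charge_then_uncharge(usage, page_size, zswap_obj_size):
    """Two-step order: charge the compressed object, uncharge the page later.

    Returns (peak_usage, final_usage).
    """
    peak = usage + zswap_obj_size      # temporary double charge
    final = peak - page_size           # saving only lands here
    return peak, final


def charge_delta(usage, page_size, zswap_obj_size):
    """Hypothetical one-step order: uncharge only the difference.

    Assumes the compressed object is no larger than the page.
    Returns (peak_usage, final_usage).
    """
    final = usage - (page_size - zswap_obj_size)
    return max(usage, final), final


print(charge_then_uncharge(1_000_000, 4096, 1024))
print(charge_delta(1_000_000, 4096, 1024))
```

Both orders end at the same final usage; the difference is only in the
peak, which is what a limit check made in between would observe.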

>
> Please do let me know if you have any other questions. Appreciate your feedback
> and comments.
>
> Thanks,
> Kanchana