Message-ID: <CAJD7tkbO0SVUfhHQ46rONy45e8FmoWESegtTLz561aPy2N-Uhw@mail.gmail.com>
Date: Wed, 30 Oct 2024 14:10:17 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Usama Arif <usamaarif642@...il.com>
Cc: Barry Song <21cnbao@...il.com>, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
Kanchana P Sridhar <kanchana.p.sridhar@...el.com>, David Hildenbrand <david@...hat.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Chris Li <chrisl@...nel.org>,
"Huang, Ying" <ying.huang@...el.com>, Kairui Song <kasong@...cent.com>,
Ryan Roberts <ryan.roberts@....com>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing
for nearly full memcg
[..]
> >>> A crucial component is still missing—managing the compression and decompression
> >>> of multiple pages as a larger block. This could significantly reduce
> >>> system time and
> >>> potentially resolve the kernel build issue within a small memory
> >>> cgroup, even with
> >>> swap thrashing.
> >>>
> >>> I’ll send an update ASAP so you can rebase for zswap.
> >>
> >> Did you mean https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@gmail.com/?
> >> That won't benefit zswap, right?
> >
> > That's right. I assume we can also make it work with zswap?
>
> Hopefully yes. That's mainly why I was looking at that series, to try and find
> a way to do something similar for zswap.

I would prefer that these things be done separately. We still need to
evaluate the compression and decompression of large blocks. I am mainly
concerned about having to decompress a large chunk just to fault in a
single page.

The obvious problems are fault latency and the wasted work of
repeatedly decompressing the large chunk to take a single page from it.
We also need to decide whether we'd rather split the block after
decompression and recompress the parts we didn't swap in separately.
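
To make the concern concrete, here is a minimal sketch of what such a
fault path might look like. struct compressed_block, decompress_block(),
and the surrounding helper are all hypothetical, not from the posted
series:

	/*
	 * Hypothetical fault path: decompress an entire multi-page
	 * block to extract the single 4K page the fault needs.
	 */
	static int swapin_page_from_large_block(struct compressed_block *blk,
						pgoff_t idx, struct page *page)
	{
		void *buf;
		int err;

		/* Temporary buffer for the *whole* decompressed block. */
		buf = kmalloc(blk->nr_pages * PAGE_SIZE, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;

		/* Decompress all nr_pages even though we need only one. */
		err = decompress_block(blk, buf);
		if (!err)
			memcpy(page_address(page),
			       buf + idx * PAGE_SIZE, PAGE_SIZE);

		/*
		 * Open question: recompress the remaining pages
		 * individually here, or keep the block intact and redo
		 * this work on the next fault into it?
		 */
		kfree(buf);
		return err;
	}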

This can cause problems beyond the fault latency. Imagine the case
where the system is under memory pressure, so we fall back to order-0
swapin to avoid reclaim. Now we want to decompress a chunk that used
to be 64K.

We need a temporary 64K contiguous allocation just to fault in a 4K
page. Now we either need to (see the sketch below):
- Go into reclaim, which we were trying to avoid to begin with.
- Dip into reserves to allocate the 64K, as it's a temporary
allocation. This is probably risky because under memory pressure, many
CPUs may be doing this concurrently.
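
Roughly, assuming a kmalloc'ed bounce buffer (the GFP choices below
are illustrative, not from any posted code), the two options look
like:

	/*
	 * Option 1: a regular allocation. This can enter direct
	 * reclaim, which is exactly what the order-0 fallback was
	 * trying to avoid.
	 */
	buf = kmalloc(SZ_64K, GFP_KERNEL);

	/*
	 * Option 2: dip into reserves because the buffer is
	 * short-lived. Risky under pressure: many CPUs faulting
	 * concurrently each pin 64K of reserves at the same time.
	 */
	buf = kmalloc(SZ_64K, GFP_NOWAIT | __GFP_MEMALLOC);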