Message-ID: <CAJD7tkZzFRm4qBGJi+EBNqG-3btS9azJaCxdqsQvTnEaPhZzBQ@mail.gmail.com>
Date: Wed, 30 Oct 2024 14:31:51 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Barry Song <21cnbao@...il.com>
Cc: Usama Arif <usamaarif642@...il.com>, akpm@...ux-foundation.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>, 
	Kanchana P Sridhar <kanchana.p.sridhar@...el.com>, David Hildenbrand <david@...hat.com>, 
	Baolin Wang <baolin.wang@...ux.alibaba.com>, Chris Li <chrisl@...nel.org>, 
	"Huang, Ying" <ying.huang@...el.com>, Kairui Song <kasong@...cent.com>, 
	Ryan Roberts <ryan.roberts@....com>, Johannes Weiner <hannes@...xchg.org>, 
	Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>, 
	Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>
Subject: Re: [PATCH RFC] mm: mitigate large folios usage and swap thrashing
 for nearly full memcg

On Wed, Oct 30, 2024 at 2:21 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Thu, Oct 31, 2024 at 10:10 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> >
> > [..]
> > > >>> A crucial component is still missing: managing the compression and
> > > >>> decompression of multiple pages as a larger block. This could
> > > >>> significantly reduce system time and potentially resolve the kernel
> > > >>> build issue within a small memory cgroup, even with swap thrashing.
> > > >>>
> > > >>> I’ll send an update ASAP so you can rebase for zswap.
> > > >>
> > > >> Did you mean https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@gmail.com/?
> > > >> That won't benefit zswap, right?
> > > >
> > > > That's right. I assume we can also make it work with zswap?
> > >
> > > Hopefully yes. That's mainly why I was looking at that series, to try and find
> > > a way to do something similar for zswap.
> >
> > I would prefer for these things to be done separately. We still need
> > to evaluate the compression/decompression of large blocks. I am mainly
> > concerned about having to decompress a large chunk to fault in one
> > page.
> >
> > The obvious problems are fault latency, and the wasted work of
> > repeatedly decompressing the large chunk to take one page from it. We
> > also need to decide whether we'd rather split it after decompression
> > and compress the parts that we didn't swap in separately.
> >
> > This can cause problems beyond the fault latency. Imagine the case
> > where the system is under memory pressure, so we fallback to order-0
> > swapin to avoid reclaim. Now we want to decompress a chunk that used
> > to be 64K.
>
> Yes, this could be an issue.
>
> We actually tried using several buffers for those partial swap-in cases,
> holding the decompressed data in anticipation of the upcoming swap-ins.
> This approach could cover the majority of partial swap-ins in the
> fallback scenario.
>
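Just to check that I understand the buffering idea, below is roughly what I
imagine (untested sketch, purely illustrative; swapin_chunk_cache,
swp_chunk_start() and the fixed 64K chunk size are made-up names, none of
this is from the series):

#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/string.h>

#define CHUNK_PAGES 16	/* one 64K chunk = 16 order-0 pages */

/* Most recently decompressed chunk, kept around for the next faults. */
struct swapin_chunk_cache {
	swp_entry_t first;		/* first swap entry covered by buf[] */
	bool valid;
	u8 buf[CHUNK_PAGES * PAGE_SIZE];
};

/* One cache per CPU, allocated at init (not shown). */
static DEFINE_PER_CPU(struct swapin_chunk_cache *, chunk_cache);

/*
 * Serve an order-0 swapin from the cached decompressed chunk if the entry
 * falls inside it. Caller runs with preemption disabled; populating and
 * invalidating the cache is not shown.
 */
static bool swapin_from_cached_chunk(swp_entry_t entry, struct page *page)
{
	struct swapin_chunk_cache *c = *this_cpu_ptr(&chunk_cache);
	unsigned long idx;

	if (!c || !c->valid ||
	    swp_chunk_start(entry).val != c->first.val)	/* hypothetical helper */
		return false;

	idx = swp_offset(entry) - swp_offset(c->first);
	memcpy(page_address(page), c->buf + idx * PAGE_SIZE, PAGE_SIZE);
	return true;
}

The obvious trade-off is pinning up to 64K of decompressed data per CPU
while the system is already under memory pressure.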
> >
> > We need a temporary 64K contiguous allocation just to be able to fault
> > in a 4K page. Now we either need to:
> > - Go into reclaim, which we were trying to avoid to begin with.
> > - Dip into reserves to allocate the 64K as it's a temporary
> > allocation. This is probably risky because under memory pressure, many
> > CPUs may be doing this concurrently.
>
> This has been addressed by using contiguous memory prepared on a per-CPU
> basis; search for "alloc_pages() might fail, so we don't depend on
> allocation:" in
> https://lore.kernel.org/all/20241021232852.4061-1-21cnbao@gmail.com/

Thanks. I think this is reasonable, but it makes it difficult to
increase the size of the chunk.
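For example, with a per-CPU reservation along the lines I understand the
series to take (rough sketch below, not the actual code; chunk_scratch and
chunk_scratch_init() are illustrative names), the standing footprint is
num_possible_cpus() << (PAGE_SHIFT + order), so going from 64K to 256K
chunks quadruples a permanent allocation on every CPU:

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/percpu.h>

/* One pre-reserved contiguous scratch buffer per CPU for decompression. */
static DEFINE_PER_CPU(void *, chunk_scratch);

static int __init chunk_scratch_init(unsigned int order)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		/*
		 * 2^order contiguous pages, reserved up front so the swapin
		 * fault path never has to allocate under pressure.
		 */
		struct page *page = alloc_pages(GFP_KERNEL, order);

		if (!page)
			return -ENOMEM;	/* freeing earlier CPUs' buffers omitted */
		per_cpu(chunk_scratch, cpu) = page_address(page);
	}
	return 0;
}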

I would still prefer that both series remain separate. If we want to
hold off on large folio zswap loads until your series goes in to offset
the thrashing, that's fine, but I really think we should try to address
the thrashing on its own.
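
For concreteness, this is roughly what the order-0 fault path described in
the quoted text above ends up doing when the entry was stored as one
compressed 64K chunk (illustrative sketch, not code from either series;
stored_chunk, chunk_lookup() and chunk_decompress() are stand-ins for the
real zram/zswap internals, and chunk_scratch is the per-CPU buffer from the
sketch further up):

static int swapin_one_page_from_chunk(swp_entry_t entry, struct page *page)
{
	/* Pre-reserved 64K per-CPU scratch buffer; preemption disabled. */
	void *scratch = *this_cpu_ptr(&chunk_scratch);
	struct stored_chunk *chunk = chunk_lookup(entry);	/* hypothetical */
	unsigned long idx = swp_offset(entry) - swp_offset(chunk->first);
	int err;

	/* Pay for decompressing the whole 64K even though only 4K is needed. */
	err = chunk_decompress(chunk, scratch);			/* hypothetical */
	if (err)
		return err;

	/* Copy out the single page we actually faulted on. */
	memcpy(page_address(page), scratch + idx * PAGE_SIZE, PAGE_SIZE);

	/*
	 * The remaining 60K either stays compressed, and is decompressed
	 * again in full on the next fault into the chunk, or has to be
	 * split and re-compressed as smaller objects (the open question
	 * above).
	 */
	return 0;
}

A per-CPU cache of the decompressed chunk, as sketched earlier in the
thread, helps the follow-on faults but not the first one.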
