Message-ID: <SJ0PR11MB5678AEDB9E47BB6267D5885CC9962@SJ0PR11MB5678.namprd11.prod.outlook.com>
Date: Thu, 29 Aug 2024 19:38:05 +0000
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
To: Nhat Pham <nphamcs@...il.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "hannes@...xchg.org"
	<hannes@...xchg.org>, "yosryahmed@...gle.com" <yosryahmed@...gle.com>,
	"ryan.roberts@....com" <ryan.roberts@....com>, "Huang, Ying"
	<ying.huang@...el.com>, "21cnbao@...il.com" <21cnbao@...il.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "Zou, Nanhai"
	<nanhai.zou@...el.com>, "Feghali, Wajdi K" <wajdi.k.feghali@...el.com>,
	"Gopal, Vinodh" <vinodh.gopal@...el.com>, Usama Arif
	<usamaarif642@...il.com>, Chengming Zhou <chengming.zhou@...ux.dev>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Subject: RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Hi Nhat,

> -----Original Message-----
> From: Nhat Pham <nphamcs@...il.com>
> Sent: Thursday, August 29, 2024 10:11 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> hannes@...xchg.org; yosryahmed@...gle.com; ryan.roberts@....com;
> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>;
> Feghali, Wajdi K <wajdi.k.feghali@...el.com>;
> Gopal, Vinodh <vinodh.gopal@...el.com>;
> Usama Arif <usamaarif642@...il.com>;
> Chengming Zhou <chengming.zhou@...ux.dev>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@...el.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Nhat Pham <nphamcs@...il.com>
> > > Sent: Wednesday, August 28, 2024 2:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> > > Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> > > hannes@...xchg.org; yosryahmed@...gle.com; ryan.roberts@....com;
> > > Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> > > akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>;
> > > Feghali, Wajdi K <wajdi.k.feghali@...el.com>;
> > > Gopal, Vinodh <vinodh.gopal@...el.com>
> > > Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> > >
> > > On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> > > <kanchana.p.sridhar@...el.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This patch-series enables zswap_store() to accept and store mTHP
> > > > folios. The most significant contribution in this series is from the
> > > > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > > > migrated to v6.11-rc3 in patch 2/4 of this series.
> > > >
> > > > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> > > >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@....com/T/#u
> > > >
> > > > Additionally, there is an attempt to modularize some of the functionality
> > > > in zswap_store(), to make it more amenable to supporting any-order
> > > > mTHPs. For instance, the function zswap_store_entry() stores a
> > > > zswap_entry in the xarray. Likewise, zswap_delete_stored_offsets() can
> > > > be used to delete all offsets corresponding to a higher order folio
> > > > stored in zswap.
> > > >
> > >
> > > Will this have any conflict with mTHP swap work? Especially with mTHP
> > > swap-in and zswap writeback.
> > >
> > > My understanding is that, from zswap's perspective, the large folio is
> > > broken apart into independent subpages, correct? What happens when we
> > > have a partially written back mTHP (i.e. some subpages are still in
> > > zswap, whereas others are written back to swap)? Would this
> > > automatically prevent mTHP swapin?
> >
> > That is a good point. To begin with, this patch-series would make the
> > default behavior for mTHP swapout/storage and swapin for ZSWAP to be on
> > par with ZRAM. From zswap's perspective, imo this is a significant step
> > forward towards realizing cold memory storage with mTHP folios. However,
> > it is only a starting point that makes the behavior uniform across
> > zswap/zram. Initially, workloads would see a one-time benefit with
> > reclaim being able to swapout mTHP folios without splitting, to zswap.
> > If the mTHPs were cold memory, then we would have derived latency gains
> > towards memory savings (with zswap).
> >
> > However, if the mTHP were part of "not so cold" memory, this would
> > result in a one-way mTHP conversion to 4K folios. Depending on workloads
> > and their access patterns, we could either see individual 4K folios
> > being swapped in, or entire chunks if not the entire (original) mTHP
> > needing to be swapped in.
> >
> > It should be noted that this is more of a performance vs. cold memory
> > preservation trade-off that needs to drive mTHP reclaim, storage, swapin
> > and writeback policy. Different workloads could require different
> > policies. However, even though this patch is only a starting point, it
> > is still functionally correct by being equivalent to zram-mTHP, and
> > compatible with the rest of mm and swap as far as mTHP. Another
> > important functionality/data consistency decision I made in this patch
> > series is error handling during zswap_store() of mTHP: in case of any
> > errors, all swap offsets for the mTHP are deleted from the zswap
> > xarray/zpool, since we know that the mTHP will now have to be stored in
> > the backing swap device. IOW, an mTHP is either entirely stored in
> > zswap, or entirely not stored in zswap.
> >
> > To answer your question, we would need to come up with what the
> > semantics would need to be for zswap zpool storage granularity, swapin
> > granularity, readahead granularity and writeback wrt mTHP and how the
> > overall swap sub-system needs to "preserve" mTHP vs. splitting mTHP into
> > 4K/lower-order folios during swapout. Once we have a good understanding
> > of these policies, we could implement them in zswap. Alternately,
> > develop an abstraction that is one level above zswap/zram and makes
> > things easier and shareable between zswap and zram. By this, I mean
> > fundamental assumptions such as consecutive swap offsets (for instance).
> > To some extent, this implies that an mTHP as a swap entity is defined by
> > consecutiveness of swap offsets. Maybe the policy to keep mTHPs in the
> > system over extended duration might be to assemble them dynamically
> > based on swapin_readahead() decisions (which is based on workload access
> > patterns). In other words, mTHPs could be a useful abstraction that can
> > be static or even dynamic based on working set characteristics, and cold
> > memory preservation. This is quite a complex topic imho.
> >
> > As we know, Barry Song and Chuanhua Han have started the discussion on
> > this in their zram mTHP swapin series [1].
> 
> Yeah I'm a bit more concerned with the correctness aspect. As long as
> it's not buggy, then we can implement mTHP zswapout first, and force
> individual subpage (z)swapin for now (since we cannot control
> writeback from writing individual subpages).

Absolutely, this sounds like the way to go!

> 
> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> we go along.

Sounds great :)

> 
> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> not working properly... Let me manually add him in - please include
> him in future submission and responses, as he is also a zswap reviewer
> :)

I think when I ran get_maintainer.pl, I was on v6.10. For sure, will include
Chengming in future submissions and responses :)

> 
> Also cc-ing Usama who is interested in this work.

Sounds great.

Thanks,
Kanchana

> 
> >
> > [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@...o.com/T/#u
> >
> > Thanks,
> > Kanchana
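
The error-handling rule quoted above (on any error during zswap_store() of
an mTHP, all of its swap offsets are deleted, so the mTHP is either entirely
in zswap or entirely not) can be illustrated with a small userspace sketch.
This is not the patch's code: the bool array standing in for the zswap
xarray, the failure injection, and every model_* name are assumptions made
up for illustration only.

```c
/*
 * Userspace model only: a fixed-size bool array stands in for the zswap
 * xarray. All names here are hypothetical, not kernel symbols.
 */
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_SLOTS 64
#define MODEL_NO_FAILURE ((size_t)-1)

static bool model_slot_used[MODEL_NR_SLOTS];

/* Store one subpage; 'inject_failure' simulates e.g. a compression error. */
static bool model_store_page(size_t offset, bool inject_failure)
{
	if (inject_failure || offset >= MODEL_NR_SLOTS)
		return false;
	model_slot_used[offset] = true;
	return true;
}

/* Drop every offset in [start, start + nr), whether or not it was stored. */
static void model_delete_offsets(size_t start, size_t nr)
{
	for (size_t i = 0; i < nr && start + i < MODEL_NR_SLOTS; i++)
		model_slot_used[start + i] = false;
}

/* Count offsets in [start, start + nr) currently present in the model. */
size_t model_count_stored(size_t start, size_t nr)
{
	size_t count = 0;

	for (size_t i = 0; i < nr && start + i < MODEL_NR_SLOTS; i++)
		count += model_slot_used[start + i];
	return count;
}

/*
 * Store a folio of nr_pages subpages at consecutive offsets. On the first
 * failure, delete all offsets of the folio so it is never partially stored.
 * 'fail_at' is the subpage index whose store fails, or MODEL_NO_FAILURE.
 */
bool model_store_folio(size_t start, size_t nr_pages, size_t fail_at)
{
	for (size_t i = 0; i < nr_pages; i++) {
		if (!model_store_page(start + i, i == fail_at)) {
			model_delete_offsets(start, nr_pages);
			return false;
		}
	}
	return true;
}
```

In this model, a caller that gets false back from model_store_folio() can
assume none of the folio's offsets remain, which mirrors the "will now have
to be stored in the backing swap device" fallback described in the thread.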
