Message-ID: <SJ0PR11MB567870784D380DE5EDB29AEBC9762@SJ0PR11MB5678.namprd11.prod.outlook.com>
Date: Mon, 30 Sep 2024 17:55:44 +0000
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
To: Yosry Ahmed <yosryahmed@...gle.com>, Johannes Weiner <hannes@...xchg.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "nphamcs@...il.com"
	<nphamcs@...il.com>, "chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
	"usamaarif642@...il.com" <usamaarif642@...il.com>, "shakeel.butt@...ux.dev"
	<shakeel.butt@...ux.dev>, "ryan.roberts@....com" <ryan.roberts@....com>,
	"Huang, Ying" <ying.huang@...el.com>, "21cnbao@...il.com"
	<21cnbao@...il.com>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"Zou, Nanhai" <nanhai.zou@...el.com>, "Feghali, Wajdi K"
	<wajdi.k.feghali@...el.com>, "Gopal, Vinodh" <vinodh.gopal@...el.com>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Subject: RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().

> -----Original Message-----
> From: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> Sent: Sunday, September 29, 2024 2:15 PM
> To: Yosry Ahmed <yosryahmed@...gle.com>; Johannes Weiner
> <hannes@...xchg.org>
> Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> nphamcs@...il.com; chengming.zhou@...ux.dev;
> usamaarif642@...il.com; shakeel.butt@...ux.dev; ryan.roberts@....com;
> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com; akpm@...ux-
> foundation.org; Zou, Nanhai <nanhai.zou@...el.com>; Feghali, Wajdi K
> <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>;
> Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> Subject: RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().
> 
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@...gle.com>
> > Sent: Saturday, September 28, 2024 11:11 AM
> > To: Johannes Weiner <hannes@...xchg.org>
> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>; linux-
> > kernel@...r.kernel.org; linux-mm@...ck.org; nphamcs@...il.com;
> > chengming.zhou@...ux.dev; usamaarif642@...il.com;
> > shakeel.butt@...ux.dev; ryan.roberts@....com; Huang, Ying
> > <ying.huang@...el.com>; 21cnbao@...il.com; akpm@...ux-
> foundation.org;
> > Zou, Nanhai <nanhai.zou@...el.com>; Feghali, Wajdi K
> > <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>
> > Subject: Re: [PATCH v8 6/8] mm: zswap: Support large folios in
> zswap_store().
> >
> > On Sat, Sep 28, 2024 at 7:15 AM Johannes Weiner <hannes@...xchg.org>
> > wrote:
> > >
> > > On Fri, Sep 27, 2024 at 08:42:16PM -0700, Yosry Ahmed wrote:
> > > > On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> > > > >  {
> > > > > +       struct page *page = folio_page(folio, index);
> > > > >         swp_entry_t swp = folio->swap;
> > > > > -       pgoff_t offset = swp_offset(swp);
> > > > >         struct xarray *tree = swap_zswap_tree(swp);
> > > > > +       pgoff_t offset = swp_offset(swp) + index;
> > > > >         struct zswap_entry *entry, *old;
> > > > > -       struct obj_cgroup *objcg = NULL;
> > > > > -       struct mem_cgroup *memcg = NULL;
> > > > > -
> > > > > -       VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > > > -       VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > > > > +       int type = swp_type(swp);
> > > >
> > > > Why do we need type? We use it when initializing entry->swpentry to
> > > > reconstruct the swp_entry_t we already have.
> > >
> > > It's not the same entry. folio->swap points to the head entry, this
> > > function has to store swap entries with the offsets of each subpage.
> >
> > Duh, yeah, thanks.
> >
> > >
> > > Given the name of this function, it might be better to actually pass a
> > > page pointer to it; do the folio_page() inside zswap_store().
> > >
> > > Then do
> > >
> > >                 entry->swpentry = page_swap_entry(page);
> > >
> > > below.
> >
> > That is indeed clearer.
> >
> > Although this will be adding yet another caller of page_swap_entry()
> > that already has the folio, yet it calls page_swap_entry() for each
> > page in the folio, which calls page_folio() inside.
> >
> > I wonder if we should add (or replace page_swap_entry()) with a
> > folio_swap_entry(folio, index) helper. This can also be done as a
> > follow up.
> 
> Thanks Johannes and Yosry for these comments. I was thinking about
> this some more. In its current form, zswap_store_page() is called in
> the context of the folio by passing in a [folio, index] pair. This
> encodes a key assumption of the existing zswap_store() large-folio
> functionality: the per-page store is done for the page at offset
> "index * PAGE_SIZE" within the folio, not for an arbitrary page.
> Further, we need the folio for folio_nid(), though this can also be
> derived from the page. Another reason I thought the existing signature
> might be preferable is that it appears to obtain the entry's
> swp_entry_t with fewer computations. Calling page_swap_entry() could
> add computation per page, which could add up over a large folio
> (say, 512 times).

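To make the comparison concrete, the two alternatives compute each
subpage's swap entry roughly as follows (a sketch using the names from
the v8 diff above, not the literal patch):

	/* v8: derive the per-page swap entry from folio->swap plus index */
	swp_entry_t swp = folio->swap;
	pgoff_t offset = swp_offset(swp) + index;

	entry->swpentry = swp_entry(swp_type(swp), offset);

	/* suggested: have page_swap_entry() reconstruct it from the page;
	 * internally it does page_folio(page) and adds the page's index
	 * within that folio to folio->swap.
	 */
	entry->swpentry = page_swap_entry(page);
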
I went ahead and quantified this by comparing the v8 signature of
zswap_store_page() against the suggested change of having this function
take a page and use page_swap_entry(). I ran usemem with 2M PMD-mappable
folios enabled. The results indicate that the page_swap_entry()
implementation is slightly better in both throughput and latency:

v8:                             run1       run2       run3    average
---------------------------------------------------------------------
Total throughput (KB/s):   6,483,835  6,396,760  6,349,532  6,410,042
Average throughput (KB/s):   216,127    213,225    211,651    213,889
elapsed time (sec):           107.75     107.06     109.99     108.87
sys time (sec):             2,476.43   2,453.99   2,551.52   2,513.98
---------------------------------------------------------------------


page_swap_entry():              run1       run2       run3    average
---------------------------------------------------------------------
Total throughput (KB/s):   6,462,954  6,396,134  6,418,076  6,425,721
Average throughput (KB/s):   215,431    213,204    213,935    214,683
elapsed time (sec):           108.67     109.46     107.91     108.29
sys time (sec):             2,473.65   2,493.33   2,507.82   2,490.74
---------------------------------------------------------------------

Based on this, I will go ahead and implement the change suggested
by Johannes and submit a v9.
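
For reference, the direction this would take is roughly as follows (a
sketch only: the compression, objcg charging, and error/cleanup paths
are elided, and the exact v9 code may differ):

static bool zswap_store_page(struct page *page,
			     struct obj_cgroup *objcg,
			     struct zswap_pool *pool)
{
	/*
	 * page_swap_entry() computes this subpage's swap entry by
	 * adding the page's index within its folio to folio->swap.
	 */
	swp_entry_t page_swpentry = page_swap_entry(page);
	struct zswap_entry *entry;

	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
	if (!entry)
		return false;

	/* ... compress the page and charge objcg, as in v8 ... */

	entry->swpentry = page_swpentry;

	/* ... insert into swap_zswap_tree(page_swpentry) ... */

	return true;
}

with zswap_store() doing the folio_page() lookup, per Johannes's
suggestion:

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		if (!zswap_store_page(page, objcg, pool))
			goto store_failed;	/* unwind as in v8 */
	}

This does mean each call goes through page_folio() inside
page_swap_entry(); if that ever shows up in profiles, the
folio_swap_entry(folio, index) helper Yosry suggested could fold
that back out as a follow-up.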

Thanks,
Kanchana

> 
> I would appreciate your thoughts on whether these are valid considerations,
> and can proceed accordingly.
> 
> >
> > >
> > > > >         obj_cgroup_put(objcg);
> > > > > -       if (zswap_pool_reached_full)
> > > > > -               queue_work(shrink_wq, &zswap_shrink_work);
> > > > > -check_old:
> > > > > +       return false;
> > > > > +}
> > > > > +
> > > > > +bool zswap_store(struct folio *folio)
> > > > > +{
> > > > > +       long nr_pages = folio_nr_pages(folio);
> > > > > +       swp_entry_t swp = folio->swap;
> > > > > +       struct xarray *tree = swap_zswap_tree(swp);
> > > > > +       pgoff_t offset = swp_offset(swp);
> > > > > +       struct obj_cgroup *objcg = NULL;
> > > > > +       struct mem_cgroup *memcg = NULL;
> > > > > +       struct zswap_pool *pool;
> > > > > +       size_t compressed_bytes = 0;
> > > >
> > > > Why size_t? entry->length is int.
> > >
> > > In light of Willy's comment, I think size_t is a good idea.
> >
> > Agreed.
> 
> Thanks Yosry, Matthew and Johannes for the resolution on this!
> 
> Thanks,
> Kanchana
