Message-ID: <SJ0PR11MB56785027ED6FCF673A84CEE6C96A2@SJ0PR11MB5678.namprd11.prod.outlook.com>
Date: Thu, 26 Sep 2024 16:40:06 +0000
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
CC: Johannes Weiner <hannes@...xchg.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev"
<chengming.zhou@...ux.dev>, "usamaarif642@...il.com"
<usamaarif642@...il.com>, "shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
"ryan.roberts@....com" <ryan.roberts@....com>, "Huang, Ying"
<ying.huang@...el.com>, "21cnbao@...il.com" <21cnbao@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "Zou, Nanhai"
<nanhai.zou@...el.com>, "Feghali, Wajdi K" <wajdi.k.feghali@...el.com>,
"Gopal, Vinodh" <vinodh.gopal@...el.com>, "Sridhar, Kanchana P"
<kanchana.p.sridhar@...el.com>
Subject: RE: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@...gle.com>
> Sent: Wednesday, September 25, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> Cc: Johannes Weiner <hannes@...xchg.org>; linux-kernel@...r.kernel.org;
> linux-mm@...ck.org; nphamcs@...il.com; chengming.zhou@...ux.dev;
> usamaarif642@...il.com; shakeel.butt@...ux.dev; ryan.roberts@....com;
> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>; Feghali, Wajdi K
> <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>
> Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in
> zswap_store().
>
> [..]
> >
> > One thing I realized while reworking the patches for the batched checks is:
> > within zswap_store_page(), we set the entry->objcg and entry->pool before
> > adding it to the xarray. Given this, wouldn't it be safer to get the objcg
> > and pool reference per sub-page, locally in zswap_store_page(), rather than
> > obtaining batched references at the end if the store is successful? If we
> > want zswap_store_page() to be self-contained and correct as far as the entry
> > being created and added to the xarray, it seems like the right thing to do?
> > I am a bit apprehensive about the entry being added to the xarray without
> > a reference obtained on the objcg and pool, because any
> > page-faults/writeback that occur on sub-pages added to the xarray before the
> > entire folio has been stored, would run into issues.
>
> We definitely should not obtain references to the pool and objcg after
> initializing the entries with them. We can obtain all references in
> zswap_store() before zswap_store_page(). IOW, the batching in this
> case should be done before the per-page operations, not after.

Thanks Yosry. IIUC, we should obtain all references to the objcg and to the
zswap_pool at the start of zswap_store().

In the case of an error on any sub-page, we will unwind state for potentially
only the stored pages, or for the entire folio if it happened to already be in
zswap and is being re-written. We might need some additional book-keeping to
keep track of the number of sub-pages that were found in the xarray and for
which zswap_entry_free() got called (nr_sb). Assuming I define a new
"obj_cgroup_put_many()", I would need to call this with
(folio_nr_pages() - nr_sb).
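
To make the accounting above concrete, here is a minimal user-space sketch. It is not kernel code: "struct model_objcg" and every "model_*" function are hypothetical stand-ins, and it assumes that each zswap_entry_free() counted in nr_sb releases exactly one of the references taken up front.

```c
#include <assert.h>

/* Hypothetical stand-in for an objcg reference count (not the kernel type). */
struct model_objcg {
    long refs;
};

/* Hypothetical batched get: take nr references in one step. */
void model_objcg_get_many(struct model_objcg *o, long nr)
{
    o->refs += nr;
}

/* Hypothetical batched put: drop nr references in one step. */
void model_objcg_put_many(struct model_objcg *o, long nr)
{
    assert(nr >= 0 && o->refs >= nr);   /* never drop more than is held */
    o->refs -= nr;
}

/*
 * Models the error path: nr_pages references were taken at the start,
 * nr_sb of them are assumed to have been dropped already by per-entry
 * frees, so only the remainder is put here. Returns the final count.
 */
long model_demo_unwind(long nr_pages, long nr_sb)
{
    struct model_objcg o = { .refs = 1 };   /* one pre-existing reference */
    long i;

    model_objcg_get_many(&o, nr_pages);     /* batched get up front */

    for (i = 0; i < nr_sb; i++)             /* each free drops one ref */
        model_objcg_put_many(&o, 1);

    model_objcg_put_many(&o, nr_pages - nr_sb); /* drop what is left */
    return o.refs;
}
```

Under that assumption, model_demo_unwind(16, 5) ends with the same count it started from; if the frees did not each drop one of the batched references, the (folio_nr_pages() - nr_sb) arithmetic would not balance.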

As far as zswap_pool_get(), there is some added complexity if we want to
keep the existing implementation that calls "percpu_ref_tryget()", assuming
this is extended to provide a new "zswap_pool_get_many()" that calls
"percpu_ref_tryget_many()". Is there a reason we use percpu_ref_tryget() instead
of percpu_ref_get()? The reason I ask is that, with tryget(), if for some reason
the pool->ref is 0, no further increments will be made. If so, upon unwinding
state in zswap_store(), I would need to special-case this before calling a new
"zswap_pool_put_many()".

Things could be a little simpler if zswap_pool_get() can use "percpu_ref_get()",
which will always increment the refcount. Since the zswap pool->ref is initialized
to "1", this seems Ok, but I don't know if there will be unintended consequences.
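
The difference between the two flavors can be illustrated with a small user-space model built on C11 atomics. This is only a sketch of the semantics described above, not the percpu_ref implementation: "struct model_ref" and all "model_*" names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a pool refcount; not struct percpu_ref. */
struct model_ref {
    atomic_ulong count;
};

/* Models "tryget": refuses to take references once the count is zero. */
bool model_tryget_many(struct model_ref *r, unsigned long nr)
{
    unsigned long old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + nr))
            return true;
    }
    return false;
}

/* Models unconditional "get": always increments. */
void model_get_many(struct model_ref *r, unsigned long nr)
{
    atomic_fetch_add(&r->count, nr);
}

/* Drops nr references; returns true if the count reached zero. */
bool model_put_many(struct model_ref *r, unsigned long nr)
{
    unsigned long old = atomic_fetch_sub(&r->count, nr);

    assert(old >= nr);          /* an unbalanced put would underflow */
    return old == nr;
}

/*
 * Error-path sketch for the tryget flavor: put only what was obtained.
 * Returns the number of references dropped during unwinding.
 */
unsigned long model_store_unwind(struct model_ref *r, unsigned long nr)
{
    bool got = model_tryget_many(r, nr);

    /* ... per-page work would go here and is assumed to fail ... */

    if (!got)
        return 0;               /* special case: nothing to put */
    model_put_many(r, nr);
    return nr;
}
```

In this model the "got" flag is the special-casing in question: with the tryget flavor the unwind path has to remember whether the batched get succeeded, whereas with model_get_many() a matching put is always balanced.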

Can you please advise on which is the simplest/cleanest approach:

1) Proceed with the above changes without changing percpu_ref_tryget() in
   zswap_pool_get(). This needs special-casing in zswap_store() to detect
   pool->ref being "0" before calling zswap_pool_put[_many]().
2) Modify zswap_pool_get()/zswap_pool_get_many() to use percpu_ref_get[_many]()
   and avoid special-casing to detect pool->ref being "0" before calling
   zswap_pool_put[_many]().
3) Keep the approach in v7 where obj_cgroup_get/put is localized to
   zswap_store_page() for both success and error conditions, and any unwinding
   of state in zswap_store() will take care of dropping references obtained
   from prior successful writes (from this or prior invocations of
   zswap_store()).

Thanks,
Kanchana

>
> >
> > Just wanted to run this by you. The rest of the batched charging, atomic
> > and stat updates should be Ok.
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > Thanks,
> > > Kanchana