lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <SJ0PR11MB5678DCB6B67A454C83A1501DC96A2@SJ0PR11MB5678.namprd11.prod.outlook.com>
Date: Thu, 26 Sep 2024 19:36:18 +0000
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
CC: Johannes Weiner <hannes@...xchg.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev"
	<chengming.zhou@...ux.dev>, "usamaarif642@...il.com"
	<usamaarif642@...il.com>, "shakeel.butt@...ux.dev" <shakeel.butt@...ux.dev>,
	"ryan.roberts@....com" <ryan.roberts@....com>, "Huang, Ying"
	<ying.huang@...el.com>, "21cnbao@...il.com" <21cnbao@...il.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "Zou, Nanhai"
	<nanhai.zou@...el.com>, "Feghali, Wajdi K" <wajdi.k.feghali@...el.com>,
	"Gopal, Vinodh" <vinodh.gopal@...el.com>, "Sridhar, Kanchana P"
	<kanchana.p.sridhar@...el.com>
Subject: RE: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@...gle.com>
> Sent: Thursday, September 26, 2024 10:35 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> Cc: Johannes Weiner <hannes@...xchg.org>; linux-kernel@...r.kernel.org;
> linux-mm@...ck.org; nphamcs@...il.com; chengming.zhou@...ux.dev;
> usamaarif642@...il.com; shakeel.butt@...ux.dev; ryan.roberts@....com;
> Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>;
> Feghali, Wajdi K <wajdi.k.feghali@...el.com>; Gopal, Vinodh
> <vinodh.gopal@...el.com>
> Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
> 
> On Thu, Sep 26, 2024 at 10:29 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@...el.com> wrote:
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@...gle.com>
> > > Sent: Thursday, September 26, 2024 10:20 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> > > Cc: Johannes Weiner <hannes@...xchg.org>; linux-kernel@...r.kernel.org;
> > > linux-mm@...ck.org; nphamcs@...il.com; chengming.zhou@...ux.dev;
> > > usamaarif642@...il.com; shakeel.butt@...ux.dev; ryan.roberts@....com;
> > > Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> > > akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>;
> > > Feghali, Wajdi K <wajdi.k.feghali@...el.com>; Gopal, Vinodh
> > > <vinodh.gopal@...el.com>
> > > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
> > >
> > > On Thu, Sep 26, 2024 at 9:40 AM Sridhar, Kanchana P
> > > <kanchana.p.sridhar@...el.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Yosry Ahmed <yosryahmed@...gle.com>
> > > > > Sent: Wednesday, September 25, 2024 9:52 PM
> > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> > > > > Cc: Johannes Weiner <hannes@...xchg.org>; linux-kernel@...r.kernel.org;
> > > > > linux-mm@...ck.org; nphamcs@...il.com; chengming.zhou@...ux.dev;
> > > > > usamaarif642@...il.com; shakeel.butt@...ux.dev; ryan.roberts@....com;
> > > > > Huang, Ying <ying.huang@...el.com>; 21cnbao@...il.com;
> > > > > akpm@...ux-foundation.org; Zou, Nanhai <nanhai.zou@...el.com>;
> > > > > Feghali, Wajdi K <wajdi.k.feghali@...el.com>; Gopal, Vinodh
> > > > > <vinodh.gopal@...el.com>
> > > > > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
> > > > >
> > > > > [..]
> > > > > >
> > > > > > One thing I realized while reworking the patches for the batched checks
> > > > > > is: within zswap_store_page(), we set entry->objcg and entry->pool
> > > > > > before adding the entry to the xarray. Given this, wouldn't it be safer
> > > > > > to get the objcg and pool references per sub-page, locally in
> > > > > > zswap_store_page(), rather than obtaining batched references at the end
> > > > > > if the store is successful? If we want zswap_store_page() to be
> > > > > > self-contained and correct as far as the entry being created and added
> > > > > > to the xarray, isn't that the right thing to do?
> > > > > > I am a bit apprehensive about the entry being added to the xarray
> > > > > > without a reference obtained on the objcg and pool, because any page
> > > > > > faults or writeback that occur on sub-pages added to the xarray before
> > > > > > the entire folio has been stored would run into issues.
> > > > >
> > > > > We definitely should not obtain references to the pool and objcg after
> > > > > initializing the entries with them. We can obtain all references in
> > > > > zswap_store() before zswap_store_page(). IOW, the batching in this
> > > > > case should be done before the per-page operations, not after.
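A minimal userspace sketch of the ordering suggested above, assuming a toy refcount in place of the real objcg/pool references: all per-page references are taken in one batched call before any entry is initialized with the pool pointer, and the error path drops every batched reference. The names toy_ref, toy_tryget_many(), toy_store_page() and toy_store_folio() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a pool/objcg refcount. This is a
 * model for illustration, not the kernel's percpu_ref or obj_cgroup API. */
struct toy_ref {
	long count;
	bool dead; /* models a ref that can no longer be acquired */
};

static bool toy_tryget_many(struct toy_ref *r, long nr)
{
	if (r->dead)
		return false;
	r->count += nr;
	return true;
}

static void toy_put_many(struct toy_ref *r, long nr)
{
	r->count -= nr;
}

struct toy_entry {
	struct toy_ref *pool; /* set only once the reference is already held */
	bool stored;
};

/* Hypothetical per-page store; fails at index fail_at to exercise unwinding. */
static bool toy_store_page(struct toy_entry *e, struct toy_ref *pool,
			   size_t i, size_t fail_at)
{
	if (i == fail_at)
		return false;
	e->pool = pool; /* safe: the caller already holds this page's reference */
	e->stored = true;
	return true;
}

/* Take all nr_pages references before any entry is initialized with them. */
static bool toy_store_folio(struct toy_entry *entries, size_t nr_pages,
			    struct toy_ref *pool, size_t fail_at)
{
	size_t i;

	if (!toy_tryget_many(pool, (long)nr_pages))
		return false; /* nothing acquired, nothing to drop */

	for (i = 0; i < nr_pages; i++)
		if (!toy_store_page(&entries[i], pool, i, fail_at))
			goto unwind;
	return true; /* each stored entry now owns one reference */

unwind:
	/* Invalidate entries [0, i); together with the unused references for
	 * pages [i, nr_pages), all nr_pages batched references are dropped. */
	for (size_t j = 0; j < i; j++) {
		entries[j].pool = NULL;
		entries[j].stored = false;
	}
	toy_put_many(pool, (long)nr_pages);
	return false;
}
```

In this model no entry ever points at the pool before its reference exists, which is the property the per-page approach was trying to protect.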
> > > >
> > > > Thanks Yosry. IIUC, we should obtain all references to the objcg and to
> > > > the zswap_pool at the start of zswap_store().
> > > >
> > > > In case of an error on any sub-page, we will unwind state for potentially
> > > > only the stored pages, or for the entire folio if it happened to already
> > > > be in zswap and is being re-written. We might need some additional
> > > > book-keeping to track which sub-pages were found in the xarray and had
> > > > zswap_entry_free() called on them (nr_sb). Assuming I define a new
> > > > "obj_cgroup_put_many()", I would need to call it with
> > > > (folio_nr_pages() - nr_sb).
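The reference arithmetic for that error path can be modeled as below, assuming each sub-page freed through zswap_entry_free() drops one of the batched objcg references on its own. toy_unwind(), toy_entry_free() and toy_objcg_put_many() are hypothetical stand-ins (obj_cgroup_put_many() is only proposed here); the point is that nr_sb puts via the free path plus one put of (nr_pages - nr_sb) balances the nr_pages references taken up front.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the error-path bookkeeping; the helpers below are
 * illustrative stand-ins, not kernel functions. */
static long objcg_refs; /* models the objcg refcount */

static void toy_objcg_put_many(long nr)
{
	objcg_refs -= nr;
}

/* Models zswap_entry_free(): freeing an entry drops that entry's reference. */
static void toy_entry_free(void)
{
	toy_objcg_put_many(1);
}

/* found_in_xarray[i] is true if sub-page i had an entry that is freed while
 * unwinding. Returns nr_sb, the number of such sub-pages. */
static long toy_unwind(const bool *found_in_xarray, long nr_pages)
{
	long nr_sb = 0;

	for (long i = 0; i < nr_pages; i++) {
		if (found_in_xarray[i]) {
			toy_entry_free(); /* drops one batched reference */
			nr_sb++;
		}
	}
	/* The remaining batched references were not dropped by any free. */
	toy_objcg_put_many(nr_pages - nr_sb);
	return nr_sb;
}
```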
> > > >
> > > > As far as zswap_pool_get() is concerned, there is some added complexity
> > > > if we want to keep the existing implementation that calls
> > > > "percpu_ref_tryget()", assuming it is extended to provide a new
> > > > "zswap_pool_get_many()" that calls "percpu_ref_tryget_many()". Is there a
> > > > reason we use percpu_ref_tryget() instead of percpu_ref_get()? The reason
> > > > I ask is that with tryget(), if for some reason pool->ref is 0, no further
> > > > increments will be made. If so, upon unwinding state in zswap_store(), I
> > > > would need to special-case this before calling a new
> > > > "zswap_pool_put_many()".
> > > >
> > > > Things could be a little simpler if zswap_pool_get() could use
> > > > "percpu_ref_get()", which always increments the refcount. Since the zswap
> > > > pool->ref is initialized to "1", this seems OK, but I don't know if there
> > > > would be unintended consequences.
> > > >
> > > > Can you please advise on the simplest/cleanest approach:
> > > >
> > > > 1) Proceed with the above changes without changing percpu_ref_tryget in
> > > >    zswap_pool_get. This needs special-casing in zswap_store to detect
> > > >    pool->ref being "0" before calling zswap_pool_put[_many].
> > >
> > > My assumption is that we can reorder the code such that if
> > > zswap_pool_get_many() fails we don't call zswap_pool_put_many() to
> > > begin with (e.g. jump to a label after zswap_pool_put_many()).
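A sketch of that label ordering, assuming a toy pool refcount: when the batched get fails, the code jumps to a label placed after the put, so the put is never reached without references held. pool_tryget_many(), pool_put_many() and store_folio() are hypothetical stand-ins for the helpers discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model. pool_tryget_many() and pool_put_many() are hypothetical
 * stand-ins for the proposed zswap_pool_get_many()/zswap_pool_put_many(). */
struct toy_pool {
	long ref;
	bool dead; /* tryget fails once the pool can no longer be acquired */
};

static bool pool_tryget_many(struct toy_pool *p, long nr)
{
	if (p->dead)
		return false;
	p->ref += nr;
	return true;
}

static void pool_put_many(struct toy_pool *p, long nr)
{
	p->ref -= nr;
}

/* fail_store simulates a failure after the references were taken. */
static bool store_folio(struct toy_pool *p, long nr_pages, bool fail_store)
{
	if (!pool_tryget_many(p, nr_pages))
		goto out;       /* no references held: skip the put entirely */

	if (fail_store)
		goto put_pool;  /* references held: they must be dropped */

	return true;            /* success: stored entries keep the references */

put_pool:
	pool_put_many(p, nr_pages);
out:
	return false;
}
```

Because the caller holds nr_pages references between the get and the put, the pool's count cannot reach zero in between in this model.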
> >
> > However, the pool refcount could change between the start and end of
> > zswap_store.
> 
> I am not sure what you mean. If zswap_pool_get_many() fails then we
> just do not call zswap_pool_put_many() at all and abort.

I guess I was thinking of a scenario where zswap_pool_get_many() returns
true and the pool refcount subsequently reaches 0 before zswap_pool_put_many().
I just realized this shouldn’t happen, so I think we are OK. I will think about
this some more while creating the follow-up patch.

> 
> >
> > >
> > > > 2) Modify zswap_pool_get/zswap_pool_get_many to use percpu_ref_get_many
> > > >    and avoid special-casing to detect pool->ref being "0" before calling
> > > >    zswap_pool_put[_many].
> > >
> > > I don't think we can simply switch the tryget to a get, as I believe
> > > we can race with the pool being destroyed.
> >
> > That was my initial thought as well, but I figured this couldn't happen
> > since the pool->ref is initialized to "1", and based on the existing
> > implementation. In any case, I can understand the intent of the use
> > of "tryget"; it is just that it adds to the considerations for reference
> > batching.
> 
> The initial ref can be dropped in __zswap_param_set() if a new pool is
> created (see the call to percpu_ref_kill()).
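A single-threaded model of why the tryget matters, assuming simplified semantics: kill drops the initial reference, tryget succeeds only while the count is non-zero, and an unconditional get has no such check. The toy_* names are hypothetical; the real percpu_ref switches from per-CPU to atomic counting on kill, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of a kill-able refcount; it is NOT the real per-CPU
 * percpu_ref implementation, and the names are hypothetical. */
struct toy_percpu_ref {
	long count;
	bool dying;    /* set by toy_kill() */
	bool released; /* set when the count drops to zero */
};

static void toy_put(struct toy_percpu_ref *r)
{
	if (--r->count == 0)
		r->released = true; /* models the release callback freeing the pool */
}

/* Models percpu_ref_kill(): mark the ref dying and drop the initial reference. */
static void toy_kill(struct toy_percpu_ref *r)
{
	r->dying = true;
	toy_put(r);
}

/* Models percpu_ref_tryget(): succeeds only while the count is non-zero. */
static bool toy_tryget(struct toy_percpu_ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Models an unconditional get: only safe if the caller already holds a ref. */
static void toy_get(struct toy_percpu_ref *r)
{
	r->count++;
}
```

Run in sequence (initial ref, an in-flight tryget, kill, the in-flight put), the last reference disappears; after that tryget refuses, while a plain get takes a reference on a pool that has already been released.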

I see, this makes sense. Thanks, Yosry!

> 
> >
> > >
> > > > 3) Keep the approach in v7 where obj_cgroup_get/put is localized to
> > > >    zswap_store_page for both success and error conditions, and any
> > > >    unwinding of state in zswap_store will take care of dropping references
> > > >    obtained from prior successful writes (from this or prior invocations
> > > >    of zswap_store).
> > >
> > > I am also fine with doing that and doing the reference batching as a
> > > follow-up.
> >
> > I think so too! We could try and improve upon (3) with reference batching
> > in a follow-up patch.
> 
> SGTM.

Thanks, will proceed!
