lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 19 Jan 2024 13:31:46 -0800
From: Chris Li <chrisl@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, Wei Xu <weixugc@...gle.com>, 
	Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>, 
	Chun-Tse Shao <ctshao@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, 
	Brain Geffon <bgeffon@...gle.com>, Minchan Kim <minchan@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Mel Gorman <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, 
	Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song <kasong@...cent.com>, 
	Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Barry Song <v-songbaohua@...o.com>, "Matthew Wilcox (Oracle)" <willy@...radead.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Joel Fernandes <joel@...lfernandes.org>, 
	Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH 2/2] mm: zswap.c: remove RB tree

On Fri, Jan 19, 2024 at 11:37 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > > I think using the xas_* APIs can be avoided here. The only reason we
> > > need it is that we want to check if there's an existing entry first,
> > > and return -EEXIST. However, in that case, the caller will replace it
> > > anyway (and do some operations on the dupentry):
> >
> > We might be able to for the insert case if we don't mind changing the
> > code behavior a bit. My original intent is to keep close to the
> > original zswap code and not stir the pot too much for the xarray
> > replacement. We can always make more adjustment once the RB tree is
> > gone.
>
> I don't see how this changes code behavior though. The current code in
> zswap_store() will do the following:

I am referring to the log and update counter happening after the zswap
mapping was updated. Maybe nobody actually cares about that behavior
difference. In my mind, there is a difference.

>
> - Hold the tree lock to make sure no one else modifies it.
> - Try to insert, check if there is already a dupentry at the index and
> return -EEXIST.
> - Warn, increment zswap_duplicate_entry, and invalidate the dupentry.
> - Try to insert again (this should always succeed since we are holding
> the lock).
>
> What I am proposing is:
> - zswap_xa_insert() is a thin wrapper around xa_store() (or we can
> remove it completely).
> - zswap_store() does the following:
>   - Use zswap_xa_insert() and check if there is a returned dupentry.
>   - Warn, increment zswap_duplicate_entry, and invalidate the dupentry.
>
> Either way, we always place the entry we have in the tree, and if
> there is a dupentry we warn and invalidate it. If anything, the latter
> is more straightforward.
>
> Am I missing something?

No, that is what I would do if I have to use the xa_* API.


>
> > >
> > > >  }
> > > >
> > > >  static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
> > > >  {
> > > > +       struct zswap_entry *e;
> > > >         pgoff_t offset = swp_offset(entry->swpentry);
> > > > -       if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > > > -               struct zswap_entry *old;
> > > > -               old = xa_erase(&tree->xarray, offset);
> > > > -               BUG_ON(old != entry);
> > > > -               rb_erase(&entry->rbnode, &tree->rbroot);
> > > > -               RB_CLEAR_NODE(&entry->rbnode);
> > > > -               return true;
> > > > -       }
> > > > -       return false;
> > > > +       XA_STATE(xas, &tree->xarray, offset);
> > > > +
> > > > +       do {
> > > > +               xas_lock_irq(&xas);
> > > > +               do {
> > > > +                       e = xas_load(&xas);
> > > > +               } while (xas_retry(&xas, e));
> > > > +               if (xas_valid(&xas) && e != entry) {
> > > > +                       xas_unlock_irq(&xas);
> > > > +                       return false;
> > > > +               }
> > > > +               xas_store(&xas, NULL);
> > > > +               xas_unlock_irq(&xas);
> > > > +       } while (xas_nomem(&xas, GFP_KERNEL));
> > > > +       return !xas_error(&xas);
> > > >  }
> > >
> > > Same here, I think we just want:
> > >
> > > return !!xa_erase(..);
> >
> > For the erase case it is tricky.
> > The current zswap code does not erase an entry if the tree entry at
> > the same offset has been changed. It should be fine if the new entry
> > is NULL. Basically some race to remove the entry already. However, if
> > the entry is not NULL, then force resetting it to NULL will change
> > behavior compared to the current.
>
> I see, very good point. I think we can use xa_cmpxchg() and pass in NULL?
>
That is certainly possible. Thanks for bringing it up.
Let me try to combine the tree->lock with xarray lock first. If
xa_cmpxchg() can simplify the result there, I will use it.


> Handling large folios in zswap is a much larger topic that involves a
> lot more than this xa_* vs. xas_* apis dispute. Let's not worry about
> this for now.

Ack. One more reason to use the XAS interface is that zswap currently
does multiple lookups on typical zswap_load(). It finds entries by
offset, for the entry (lookup one). Then after folio install to swap
cache, it deletes the entry, it will performan another lookup to
delete the entry (look up two). Using XAS might be able to cache the
node location for the second lookup to avoid the full node walk. That
is not in my current patch and can be a later improvement patch as
well.

Chris

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ