lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF8kJuMwF=s-28OPFbCfJf-f3jsfYmyiP6pSBjj8ZgkGmbT9ZA@mail.gmail.com>
Date: Thu, 18 Jan 2024 21:43:39 -0800
From: Chris Li <chrisl@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, Wei Xu <weixugc@...gle.com>, 
	Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>, 
	Chun-Tse Shao <ctshao@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, 
	Brain Geffon <bgeffon@...gle.com>, Minchan Kim <minchan@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Mel Gorman <mgorman@...hsingularity.net>, Huang Ying <ying.huang@...el.com>, 
	Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, Kairui Song <kasong@...cent.com>, 
	Zhongkun He <hezhongkun.hzk@...edance.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Barry Song <v-songbaohua@...o.com>, "Matthew Wilcox (Oracle)" <willy@...radead.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Joel Fernandes <joel@...lfernandes.org>, 
	Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH 2/2] mm: zswap.c: remove RB tree

Hi Yosry,

On Wed, Jan 17, 2024 at 10:35 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> > @@ -493,45 +471,47 @@ static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
> >  static int zswap_insert(struct zswap_tree *tree, struct zswap_entry *entry,
> >                         struct zswap_entry **dupentry)
> >  {
> > -       struct rb_root *root = &tree->rbroot;
> > -       struct rb_node **link = &root->rb_node, *parent = NULL;
> > -       struct zswap_entry *myentry, *old;
> > -       pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
> > -
> > -
> > -       while (*link) {
> > -               parent = *link;
> > -               myentry = rb_entry(parent, struct zswap_entry, rbnode);
> > -               myentry_offset = swp_offset(myentry->swpentry);
> > -               if (myentry_offset > entry_offset)
> > -                       link = &(*link)->rb_left;
> > -               else if (myentry_offset < entry_offset)
> > -                       link = &(*link)->rb_right;
> > -               else {
> > -                       old = xa_load(&tree->xarray, entry_offset);
> > -                       BUG_ON(old != myentry);
> > -                       *dupentry = myentry;
> > +       struct zswap_entry *e;
> > +       pgoff_t offset = swp_offset(entry->swpentry);
> > +       XA_STATE(xas, &tree->xarray, offset);
> > +
> > +       do {
> > +               xas_lock_irq(&xas);
> > +               do {
> > +                       e = xas_load(&xas);
> > +                       if (xa_is_zero(e))
> > +                               e = NULL;
> > +               } while (xas_retry(&xas, e));
> > +               if (xas_valid(&xas) && e) {
> > +                       xas_unlock_irq(&xas);
> > +                       *dupentry = e;
> >                         return -EEXIST;
> >                 }
> > -       }
> > -       rb_link_node(&entry->rbnode, parent, link);
> > -       rb_insert_color(&entry->rbnode, root);
> > -       old = xa_store(&tree->xarray, entry_offset, entry, GFP_KERNEL);
> > -       return 0;
> > +               xas_store(&xas, entry);
> > +               xas_unlock_irq(&xas);
> > +       } while (xas_nomem(&xas, GFP_KERNEL));
> > +       return xas_error(&xas);
>
> I think using the xas_* APIs can be avoided here. The only reason we
> need it is that we want to check if there's an existing entry first,
> and return -EEXIST. However, in that case, the caller will replace it
> anyway (and do some operations on the dupentry):

We might be able to for the insert case if we don't mind changing the
code behavior a bit. My original intent is to keep close to the
original zswap code and not stir the pot too much for the xarray
replacement. We can always make more adjustment once the RB tree is
gone.

> while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
>         WARN_ON(1);
>         zswap_duplicate_entry++;
>         zswap_invalidate_entry(tree, dupentry);
> }
>
> So I think we can do something like this in zswap_insert() instead:
>
> dupentry = xa_store(..);
> if (WARN_ON(dupentry)) {
>         zswap_duplicate_entry++;
>         zswap_invalidate_entry(tree, dupentry);
> }
>
> WDYT?
>
> >  }
> >
> >  static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
> >  {
> > +       struct zswap_entry *e;
> >         pgoff_t offset = swp_offset(entry->swpentry);
> > -       if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > -               struct zswap_entry *old;
> > -               old = xa_erase(&tree->xarray, offset);
> > -               BUG_ON(old != entry);
> > -               rb_erase(&entry->rbnode, &tree->rbroot);
> > -               RB_CLEAR_NODE(&entry->rbnode);
> > -               return true;
> > -       }
> > -       return false;
> > +       XA_STATE(xas, &tree->xarray, offset);
> > +
> > +       do {
> > +               xas_lock_irq(&xas);
> > +               do {
> > +                       e = xas_load(&xas);
> > +               } while (xas_retry(&xas, e));
> > +               if (xas_valid(&xas) && e != entry) {
> > +                       xas_unlock_irq(&xas);
> > +                       return false;
> > +               }
> > +               xas_store(&xas, NULL);
> > +               xas_unlock_irq(&xas);
> > +       } while (xas_nomem(&xas, GFP_KERNEL));
> > +       return !xas_error(&xas);
> >  }
>
> Same here, I think we just want:
>
> return !!xa_erase(..);

For the erase case it is tricky.
The current zswap code does not erase an entry if the tree entry at
the same offset has been changed. It should be fine if the new entry
is NULL. Basically some race to remove the entry already. However, if
the entry is not NULL, then force resetting it to NULL will change
behavior compared to the current.

>From reading the zswap code, I am not sure this is in theory
impossible to happen: Some one races to remove the zswap entry then
reclaim converts that page back to the zswap, with a new entry. By the
time the zswap_erase() tries to erase the current entry, the entry has
changed to a new entry. It seems the right thing to do in this
unlikely case is that we should skip the force erase and drop the
current entry, assuming someone else has already invalidated it. Don't
touch the entry in the tree, we assume it is a newer generation.

If we can reason the above is impossible to happen, then the force
erase and reset the entry to NULL should be fine(Famous last word).
Noted that this is a behavior change, I would like to seperate it out
from the drop in replacement patch(keep the original behavior)

Chris




>
> >
> >  static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
> > @@ -583,7 +563,6 @@ static void zswap_entry_put(struct zswap_tree *tree,
> >
> >         WARN_ON_ONCE(refcount < 0);
> >         if (refcount == 0) {
> > -               WARN_ON_ONCE(!RB_EMPTY_NODE(&entry->rbnode));
> >                 zswap_free_entry(entry);
> >         }
>
> nit: the braces are no longer needed here
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ