Message-ID: <CA+CLi1jZAqfjvDiWcQKJ_R02110Zyk=t2nyov2BCZnVm0B3muQ@mail.gmail.com>
Date:   Thu, 22 Jun 2023 08:32:46 +0200
From:   Domenico Cerasuolo <cerasuolodomenico@...il.com>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        Seth Jennings <sjenning@...hat.com>,
        Dan Streetman <ddstreet@...e.org>,
        Vitaly Wool <vitaly.wool@...sulko.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Nhat Pham <nphamcs@...il.com>, Yu Zhao <yuzhao@...gle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: zswap: fix double invalidate with exclusive loads

On Wed, Jun 21, 2023 at 11:23 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
>
> On Wed, Jun 21, 2023 at 12:36 PM Domenico Cerasuolo
> <cerasuolodomenico@...il.com> wrote:
> >
> > On Wed, Jun 21, 2023 at 7:26 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > >
> > > On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo
> > > <cerasuolodomenico@...il.com> wrote:
> > > >
> > > > On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > > > >
> > > > > If exclusive loads are enabled for zswap, we invalidate the entry before
> > > > > returning from zswap_frontswap_load(), after dropping the local
> > > > > reference. However, the tree lock is dropped during decompression after
> > > > > the local reference is acquired, so the entry could be invalidated
> > > > > before we drop the local ref. If this happens, the entry is freed once
> > > > > we drop the local ref, and zswap_invalidate_entry() tries to invalidate
> > > > > an already freed entry.
> > > > >
> > > > > Fix this by:
> > > > > (a) Making sure zswap_invalidate_entry() is always called with a local
> > > > >     ref held, to avoid being called on a freed entry.
> > > > > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> > > > >     was actually on the rbtree. Otherwise, another invalidation could
> > > > >     have already happened, and the initial ref is already dropped.
> > > > >
> > > > > With these changes, there is no need to check that the entry still
> > > > > exists in the tree in zswap_reclaim_entry() before invalidating it,
> > > > > as zswap_invalidate_entry() will make this check internally.
> > > > >
> > > > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > > > > Reported-by: Hyeonggon Yoo <42.hyeyoo@...il.com>
> > > > > Signed-off-by: Yosry Ahmed <yosryahmed@...gle.com>
> > > > > ---
> > > > >  mm/zswap.c | 21 ++++++++++++---------
> > > > >  1 file changed, 12 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > > index 87b204233115..62195f72bf56 100644
> > > > > --- a/mm/zswap.c
> > > > > +++ b/mm/zswap.c
> > > > > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > > > > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > > > >  {
> > > > >         if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > > > >                 rb_erase(&entry->rbnode, root);
> > > > >                 RB_CLEAR_NODE(&entry->rbnode);
> > > > > +               return true;
> > > > >         }
> > > > > +       return false;
> > > > >  }
> > > > >
> > > > >  /*
> > > > > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> > > > >         return NULL;
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > > > > + * from the tree. This function must be called with an additional ref held,
> > > > > + * otherwise it may race with another invalidation freeing the entry.
> > > > > + */
> > > >
> > > > On re-reading this comment, there's one thing I'm not sure I get: do we
> > > > really need to hold an additional local ref to call this? As far as I
> > > > understood, once we check that the entry was in the tree before putting
> > > > the initial ref, there's no need for an additional local one.
> > >
> > > I believe it is, but please correct me if I am wrong. Consider the
> > > following scenario:
> > >
> > > // Initially refcount is at 1
> > >
> > > CPU#1:                                  CPU#2:
> > > spin_lock(tree_lock)
> > > zswap_entry_get() // 2 refs
> > > spin_unlock(tree_lock)
> > >                                             spin_lock(tree_lock)
> > >                                             zswap_invalidate_entry() // 1 ref
> > >                                             spin_unlock(tree_lock)
> > > zswap_entry_put() // 0 refs
> > > zswap_invalidate_entry() // problem
> > >
> > > That last zswap_invalidate_entry() call in CPU#1 is problematic. The
> > > entry would have already been freed. If we check that the entry is on
> > > the tree by checking RB_EMPTY_NODE(&entry->rbnode), then we are
> > > reading already freed and potentially re-used memory.
> > >
> > > We would need to search the tree to make sure the same entry still
> > > exists in the tree (aka what zswap_reclaim_entry() currently does).
> > > Having to do the lookup twice is not ideal in the fault path.
> >
> > Thanks for the clarification; it is indeed needed in that case. I was just
> > wondering whether the wording of the comment is exact: before calling
> > zswap_invalidate_entry() one has to ensure that the entry has not been freed,
> > but not necessarily by holding an additional reference, if a lookup can serve
> > the same purpose.
>
>
> I am wondering if the scenario below is possible, in which case a
> lookup would not be enough. Let me try to clarify. Let's assume in
> zswap_reclaim_entry() we drop the local ref early (before we
> invalidate the entry), and rely on the lookup to ensure that the entry
> was not freed:
>
> - On CPU#1, in zswap_reclaim_entry() we release the lock during IO.
> Let's assume we drop the local ref here and rely on the lookup to make
> sure the zswap entry wasn't freed.
> - On CPU#2, we invalidate the swap entry. The zswap entry is freed
> (returned to the slab allocator).
> - On CPU#2, we try to reclaim another page and allocate the same
> swap slot (same type and offset).
> - On CPU#2, a zswap entry is allocated, and the slab allocator happens
> to hand us the same zswap_entry we just freed.
> - On CPU#1, after IO is done, we look up the tree to make sure that the
> zswap entry was not freed. We find the same zswap entry (same address)
> at the same offset, so we assume it was not freed.
> - On CPU#1, we invalidate the zswap entry that was actually used by CPU#2.
>
> I am not entirely sure if this is possible, perhaps locking in the
> swap layer will prevent the swap entry reuse, but it seems like
> relying on the lookup can be fragile, and we should rely on the local
> ref instead to reliably prevent freeing/reuse of the zswap entry.
>
> Please correct me if I missed something.

I think it is; we definitely need an additional reference to pin down the entry.
Sorry if I was being pedantic: my original doubt was only about the wording of
the comment, where it says that an additional reference must be held. I was
wondering whether it was strictly needed, and now I see that it is :)

>
> >
> >
> > >
> > > Also, in zswap_reclaim_entry(), if we call zswap_invalidate_entry()
> > > after dropping the local ref, is it possible that the swap entry has
> > > been reused for a different page? I didn't look closely, but
> > > if yes, then the slab allocator may have repurposed the zswap_entry
> > > and we may find the entry in the tree for the same offset, even though
> > > it is referring to a different page now. This sounds practically
> > > unlikely but perhaps theoretically possible.
> >
> > I'm not sure I understood the scenario; in zswap_reclaim_entry() we keep a
> > local reference until the end in order to avoid a free.
>
>
> Right, I was just trying to reason about what might happen if we call
> zswap_invalidate_entry() after dropping the local ref, as I mentioned
> above.
>
>
> >
> >
> > >
> > > I think it's more reliable to call zswap_invalidate_entry() on an
> > > entry that we know is valid before dropping the local ref, especially
> > > since it's easy to do today by just moving a few lines around.
> > >
> > >
> > >
> > >
> > > >
> > > > >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> > > > >                                    struct zswap_entry *entry)
> > > > >  {
> > > > > -       /* remove from rbtree */
> > > > > -       zswap_rb_erase(&tree->rbroot, entry);
> > > > > -
> > > > > -       /* drop the initial reference from entry creation */
> > > > > -       zswap_entry_put(tree, entry);
> > > > > +       if (zswap_rb_erase(&tree->rbroot, entry))
> > > > > +               zswap_entry_put(tree, entry);
> > > > >  }
> > > > >
> > > > >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > > > > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> > > > >          * swapcache. Drop the entry from zswap - unless invalidate already
> > > > >          * took it out while we had the tree->lock released for IO.
> > > > >          */
> > > > > -       if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > > > > -               zswap_invalidate_entry(tree, entry);
> > > > > +       zswap_invalidate_entry(tree, entry);
> > > > >
> > > > >  put_unlock:
> > > > >         /* Drop local reference */
> > > > > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > > > >                 count_objcg_event(entry->objcg, ZSWPIN);
> > > > >  freeentry:
> > > > >         spin_lock(&tree->lock);
> > > > > -       zswap_entry_put(tree, entry);
> > > > >         if (!ret && zswap_exclusive_loads_enabled) {
> > > > >                 zswap_invalidate_entry(tree, entry);
> > > > >                 *exclusive = true;
> > > > > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > > > >                 list_move(&entry->lru, &entry->pool->lru);
> > > > >                 spin_unlock(&entry->pool->lru_lock);
> > > > >         }
> > > > > +       zswap_entry_put(tree, entry);
> > > > >         spin_unlock(&tree->lock);
> > > > >
> > > > >         return ret;
> > > > > --
> > > > > 2.41.0.162.gfafddb0af9-goog
> > > > >

Reviewed-by: Domenico Cerasuolo <cerasuolodomenico@...il.com>
