linux-kernel - Re: [PATCH RFC 3/4] lockref: rework CMPXCHG

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <7ff040d4a0fb1634d3dc9282da014165a347dbb2.camel@kernel.org>
Date: Sat, 03 Aug 2024 06:59:24 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner
	 <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Andrew Morton
	 <akpm@...ux-foundation.org>, Josef Bacik <josef@...icpanda.com>, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle
 contention better

On Sat, 2024-08-03 at 11:09 +0200, Mateusz Guzik wrote:
> On Sat, Aug 3, 2024 at 6:44 AM Mateusz Guzik <mjguzik@...il.com> wrote:
> > 
> > On Fri, Aug 02, 2024 at 05:45:04PM -0400, Jeff Layton wrote:
> > > In a later patch, we want to change the open(..., O_CREAT) codepath to
> > > avoid taking the inode->i_rwsem for write when the dentry already exists.
> > > When we tested that initially, performance degraded significantly
> > > due to contention for the parent's d_lockref spinlock.
> > > 
> > > There are two problems with lockrefs today: First, once any concurrent
> > > task takes the spinlock, they all end up taking the spinlock, which is
> > > much more costly than a single cmpxchg operation. The second problem is
> > > that once any task fails to cmpxchg 100 times, it falls back to the
> > > spinlock. The upshot there is that even moderate contention can cause a
> > > fallback to serialized spinlocking, which worsens performance.
> > > 
> > > This patch changes CMPXCHG_LOOP in 2 ways:
> > > 
> > > First, change the loop to spin instead of falling back to a locked
> > > codepath when the spinlock is held. Once the lock is released, allow the
> > > task to continue trying its cmpxchg loop as before instead of taking the
> > > lock. Second, don't allow the cmpxchg loop to give up after 100 retries.
> > > Just continue infinitely.
> > > 
> > > This greatly reduces contention on the lockref when there are large
> > > numbers of concurrent increments and decrements occurring.
> > > 
> > 
> > I already tried this, and unfortunately it can reduce performance.
> > 
> 
> Oh wait I misread the patch based on what I tried there. Spinning
> indefinitely waiting for the lock to be free is a no-go as it loses
> the forward progress guarantee (and the lock can end up being held
> continuously). Spinning only up to an arbitrary limit wins in some
> tests and loses in others.
> 

I'm a little confused about the forward progress guarantee here. Does
that exist today at all? ISTM that falling back to spin_lock() after a
certain number of retries doesn't guarantee any forward progress. You
can still just end up spinning on the lock forever once that happens,
no?
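To make the loop shapes concrete, here is a rough user-space model using C11 atomics. Like struct lockref, it packs a lock word and a count into one 64-bit value. The model_* names are hypothetical, and the test-and-set lock is a much simpler stand-in than the kernel's spinlock_t. Only the shape of lockref_get() is modeled: model_get_bounded() follows the pre-patch loop (up to 100 attempts, then take the lock), and model_get_unbounded() follows the patched one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of struct lockref: low 32 bits are a test-and-set
 * lock (0 = unlocked), high 32 bits are the count, so both can be
 * updated together with a single 64-bit cmpxchg. */
struct model_lockref {
	_Atomic uint64_t lock_count;
};

#define MODEL_LOCK_MASK 0xffffffffull
#define MODEL_COUNT_ONE (1ull << 32)

static inline bool model_unlocked(uint64_t v) { return (v & MODEL_LOCK_MASK) == 0; }
static inline int32_t model_count(uint64_t v) { return (int32_t)(uint32_t)(v >> 32); }

/* Stand-in for spin_lock(): set the lock bits only if currently clear. */
static inline void model_lock(struct model_lockref *r)
{
	for (;;) {
		uint64_t old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);

		if (model_unlocked(old) &&
		    atomic_compare_exchange_weak_explicit(&r->lock_count, &old, old | 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
	}
}

/* Stand-in for spin_unlock(): clear the lock bits, keep the count. */
static inline void model_unlock(struct model_lockref *r)
{
	uint64_t prev = atomic_fetch_and_explicit(&r->lock_count, ~MODEL_LOCK_MASK,
						  memory_order_release);

	assert(!model_unlocked(prev));	/* must have been locked */
	(void)prev;
}

/* Pre-patch shape: at most 100 lockless attempts, and only while the lock
 * looks free; otherwise update the count under the lock. */
static inline void model_get_bounded(struct model_lockref *r)
{
	uint64_t old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);

	for (int retry = 100; retry > 0 && model_unlocked(old); retry--) {
		/* On failure, the cmpxchg reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak_explicit(&r->lock_count, &old,
							  old + MODEL_COUNT_ONE,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return;
	}
	model_lock(r);
	atomic_fetch_add_explicit(&r->lock_count, MODEL_COUNT_ONE, memory_order_relaxed);
	model_unlock(r);
}

/* Patched shape: never fall back. While the lock is held, re-read the word
 * (the patch also calls cpu_relax() here) and keep retrying. Nothing bounds
 * this loop if the lock is held continuously. */
static inline void model_get_unbounded(struct model_lockref *r)
{
	uint64_t old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);

	for (;;) {
		if (model_unlocked(old)) {
			if (atomic_compare_exchange_weak_explicit(&r->lock_count, &old,
								  old + MODEL_COUNT_ONE,
								  memory_order_relaxed,
								  memory_order_relaxed))
				return;
		} else {
			old = atomic_load_explicit(&r->lock_count, memory_order_relaxed);
		}
	}
}
```

In a single-threaded run, both functions succeed on the lockless branch at the first attempt. The fallback path and the lock-held spin only come into play under contention.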

> Either way, as described below, chances are decent that:
> 1. there is an easy way to not lockref_get/put on the parent if the
> file is already there, dodging the problem
> ... and even if that's not true:
> 2. lockref can be ditched in favor of atomics. Apart from some minor
> refactoring this all looks perfectly doable, and I have a WIP. I will
> try to find the time next week to sort it out.
> 

Like I said in the earlier mail, I don't think we can stay in RCU mode
because of the audit_inode call. I'm definitely interested in your WIP
though!

> > The key problem is that in some corner cases the lock can be continuously
> > held and be queued on, making the fast path always fail and making all
> > the spins actively waste time (and notably pull on the cacheline).
> > 
> > See this for more details:
> > https://lore.kernel.org/oe-lkp/lv7ykdnn2nrci3orajf7ev64afxqdw2d65bcpu2mfaqbkvv4ke@hzxat7utjnvx/
> > 
> > However, I *suspect* in the case you are optimizing here (open + O_CREAT
> > of an existing file) lockref on the parent can be avoided altogether
> > with some hackery and that's what should be done here.
> > 
> > When it comes to lockref in vfs in general, most uses can be elided with
> > some hackery (see the above thread) which is in early WIP (the LSMs are
> > a massive headache).
> > 
> > For open calls which *do* need to take a real ref the hackery does not
> > help of course.
> > 
> > This is where I think decoupling ref from the lock is the best way
> > forward. For that to work the dentry must hang around after the last
> > unref (already done thanks to RCU, and dput even handles that
> > explicitly!) and there needs to be a way to block new refs atomically --
> > can be done with cmpxchg from a 0-ref state to a flag blocking new refs
> > coming in. I have that as a WIP as well.
> > 
> > 
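Here is a minimal sketch of the "cmpxchg from a 0-ref state to a flag blocking new refs" idea described above, written with C11 atomics. It is not Mateusz's WIP, which isn't shown here. Everything in it is hypothetical: the model_ref_* names, and using bit 31 as the blocking flag. As the text above says, it assumes the object stays around after the last put (for example, freed only after an RCU grace period). That is why the put path tears nothing down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-free refcount: bits 0-30 hold the count, bit 31 is a
 * flag that blocks new references once set. */
#define MODEL_REF_BLOCKED 0x80000000u

struct model_ref {
	_Atomic uint32_t val;
};

/* Take a reference unless new references have been blocked. */
static inline bool model_ref_get(struct model_ref *r)
{
	uint32_t old = atomic_load_explicit(&r->val, memory_order_relaxed);

	do {
		if (old & MODEL_REF_BLOCKED)
			return false;
		/* On failure, 'old' is reloaded and the flag is re-checked. */
	} while (!atomic_compare_exchange_weak_explicit(&r->val, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/* Drop a reference and return how many remain. No teardown happens here;
 * the object is assumed to outlive its last reference. */
static inline uint32_t model_ref_put(struct model_ref *r)
{
	uint32_t prev = atomic_fetch_sub_explicit(&r->val, 1, memory_order_release);

	assert((prev & ~MODEL_REF_BLOCKED) != 0);	/* catch underflow */
	return prev - 1;
}

/* Atomically move from "no references" to "no new references allowed".
 * Fails if anyone holds, or has just taken, a reference. */
static inline bool model_ref_block(struct model_ref *r)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&r->val, &expected,
						       MODEL_REF_BLOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}
```

Blocking only succeeds from exactly zero, so a get that races with it has two possible outcomes. Either the get lands first and the block fails, or the get sees the flag and backs off. Neither side needs a spinlock.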
> > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > ---
> > >  lib/lockref.c | 85 ++++++++++++++++++++++-------------------------------------
> > >  1 file changed, 32 insertions(+), 53 deletions(-)
> > > 
> > > diff --git a/lib/lockref.c b/lib/lockref.c
> > > index 2afe4c5d8919..b76941043fe9 100644
> > > --- a/lib/lockref.c
> > > +++ b/lib/lockref.c
> > > @@ -8,22 +8,25 @@
> > >   * Note that the "cmpxchg()" reloads the "old" value for the
> > >   * failure case.
> > >   */
> > > -#define CMPXCHG_LOOP(CODE, SUCCESS) do {                                     \
> > > -     int retry = 100;                                                        \
> > > -     struct lockref old;                                                     \
> > > -     BUILD_BUG_ON(sizeof(old) != 8);                                         \
> > > -     old.lock_count = READ_ONCE(lockref->lock_count);                        \
> > > -     while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {     \
> > > -             struct lockref new = old;                                       \
> > > -             CODE                                                            \
> > > -             if (likely(try_cmpxchg64_relaxed(&lockref->lock_count,          \
> > > -                                              &old.lock_count,               \
> > > -                                              new.lock_count))) {            \
> > > -                     SUCCESS;                                                \
> > > -             }                                                               \
> > > -             if (!--retry)                                                   \
> > > -                     break;                                                  \
> > > -     }                                                                       \
> > > +#define CMPXCHG_LOOP(CODE, SUCCESS) do {                                             \
> > > +     struct lockref old;                                                             \
> > > +     BUILD_BUG_ON(sizeof(old) != 8);                                                 \
> > > +     old.lock_count = READ_ONCE(lockref->lock_count);                                \
> > > +     for (;;) {                                                                      \
> > > +             struct lockref new = old;                                               \
> > > +                                                                                     \
> > > +             if (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {        \
> > > +                     CODE                                                            \
> > > +                     if (likely(try_cmpxchg64_relaxed(&lockref->lock_count,          \
> > > +                                                      &old.lock_count,               \
> > > +                                                      new.lock_count))) {            \
> > > +                             SUCCESS;                                                \
> > > +                     }                                                               \
> > > +             } else {                                                                \
> > > +                     cpu_relax();                                                    \
> > > +                     old.lock_count = READ_ONCE(lockref->lock_count);                \
> > > +             }                                                                       \
> > > +     }                                                                               \
> > >  } while (0)
> > > 
> > >  #else
> > > @@ -46,10 +49,8 @@ void lockref_get(struct lockref *lockref)
> > >       ,
> > >               return;
> > >       );
> > > -
> > > -     spin_lock(&lockref->lock);
> > > -     lockref->count++;
> > > -     spin_unlock(&lockref->lock);
> > > +     /* should never get here */
> > > +     WARN_ON_ONCE(1);
> > >  }
> > >  EXPORT_SYMBOL(lockref_get);
> > > 
> > > @@ -60,8 +61,6 @@ EXPORT_SYMBOL(lockref_get);
> > >   */
> > >  int lockref_get_not_zero(struct lockref *lockref)
> > >  {
> > > -     int retval;
> > > -
> > >       CMPXCHG_LOOP(
> > >               new.count++;
> > >               if (old.count <= 0)
> > > @@ -69,15 +68,9 @@ int lockref_get_not_zero(struct lockref *lockref)
> > >       ,
> > >               return 1;
> > >       );
> > > -
> > > -     spin_lock(&lockref->lock);
> > > -     retval = 0;
> > > -     if (lockref->count > 0) {
> > > -             lockref->count++;
> > > -             retval = 1;
> > > -     }
> > > -     spin_unlock(&lockref->lock);
> > > -     return retval;
> > > +     /* should never get here */
> > > +     WARN_ON_ONCE(1);
> > > +     return -1;
> > >  }
> > >  EXPORT_SYMBOL(lockref_get_not_zero);
> > > 
> > > @@ -88,8 +81,6 @@ EXPORT_SYMBOL(lockref_get_not_zero);
> > >   */
> > >  int lockref_put_not_zero(struct lockref *lockref)
> > >  {
> > > -     int retval;
> > > -
> > >       CMPXCHG_LOOP(
> > >               new.count--;
> > >               if (old.count <= 1)
> > > @@ -97,15 +88,9 @@ int lockref_put_not_zero(struct lockref *lockref)
> > >       ,
> > >               return 1;
> > >       );
> > > -
> > > -     spin_lock(&lockref->lock);
> > > -     retval = 0;
> > > -     if (lockref->count > 1) {
> > > -             lockref->count--;
> > > -             retval = 1;
> > > -     }
> > > -     spin_unlock(&lockref->lock);
> > > -     return retval;
> > > +     /* should never get here */
> > > +     WARN_ON_ONCE(1);
> > > +     return -1;
> > >  }
> > >  EXPORT_SYMBOL(lockref_put_not_zero);
> > > 
> > > @@ -125,6 +110,8 @@ int lockref_put_return(struct lockref *lockref)
> > >       ,
> > >               return new.count;
> > >       );
> > > +     /* should never get here */
> > > +     WARN_ON_ONCE(1);
> > >       return -1;
> > >  }
> > >  EXPORT_SYMBOL(lockref_put_return);
> > > @@ -171,8 +158,6 @@ EXPORT_SYMBOL(lockref_mark_dead);
> > >   */
> > >  int lockref_get_not_dead(struct lockref *lockref)
> > >  {
> > > -     int retval;
> > > -
> > >       CMPXCHG_LOOP(
> > >               new.count++;
> > >               if (old.count < 0)
> > > @@ -180,14 +165,8 @@ int lockref_get_not_dead(struct lockref *lockref)
> > >       ,
> > >               return 1;
> > >       );
> > > -
> > > -     spin_lock(&lockref->lock);
> > > -     retval = 0;
> > > -     if (lockref->count >= 0) {
> > > -             lockref->count++;
> > > -             retval = 1;
> > > -     }
> > > -     spin_unlock(&lockref->lock);
> > > -     return retval;
> > > +     /* should never get here */
> > > +     WARN_ON_ONCE(1);
> > > +     return -1;
> > >  }
> > >  EXPORT_SYMBOL(lockref_get_not_dead);
> > > 
> > > --
> > > 2.45.2
> > > 
> 
> 
> 

-- 
Jeff Layton <jlayton@...nel.org>