[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7ff040d4a0fb1634d3dc9282da014165a347dbb2.camel@kernel.org>
Date: Sat, 03 Aug 2024 06:59:24 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner
<brauner@...nel.org>, Jan Kara <jack@...e.cz>, Andrew Morton
<akpm@...ux-foundation.org>, Josef Bacik <josef@...icpanda.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle
contention better
On Sat, 2024-08-03 at 11:09 +0200, Mateusz Guzik wrote:
> On Sat, Aug 3, 2024 at 6:44 AM Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > On Fri, Aug 02, 2024 at 05:45:04PM -0400, Jeff Layton wrote:
> > > In a later patch, we want to change the open(..., O_CREAT) codepath to
> > > avoid taking the inode->i_rwsem for write when the dentry already exists.
> > > When we tested that initially, the performance devolved significantly
> > > due to contention for the parent's d_lockref spinlock.
> > >
> > > There are two problems with lockrefs today: First, once any concurrent
> > > task takes the spinlock, they all end up taking the spinlock, which is
> > > much more costly than a single cmpxchg operation. The second problem is
> > > that once any task fails to cmpxchg 100 times, it falls back to the
> > > spinlock. The upshot there is that even moderate contention can cause a
> > > fallback to serialized spinlocking, which worsens performance.
> > >
> > > This patch changes CMPXCHG_LOOP in 2 ways:
> > >
> > > First, change the loop to spin instead of falling back to a locked
> > > codepath when the spinlock is held. Once the lock is released, allow the
> > > task to continue trying its cmpxchg loop as before instead of taking the
> > > lock. Second, don't allow the cmpxchg loop to give up after 100 retries.
> > > Just continue infinitely.
> > >
> > > This greatly reduces contention on the lockref when there are large
> > > numbers of concurrent increments and decrements occurring.
> > >
> >
> > This was already tried by me and it unfortunately can reduce performance.
> >
>
> Oh wait I misread the patch based on what I tried there. Spinning
> indefinitely waiting for the lock to be free is a no-go as it loses
> the forward progress guarantee (and it is possible to get the lock
> being continuously held). Only spinning up to an arbitrary point wins
> some in some tests and loses in others.
>
I'm a little confused about the forward progress guarantee here. Does
that exist today at all? ISTM that falling back to spin_lock() after a
certain number of retries doesn't guarantee any forward progress. You
can still just end up spinning on the lock forever once that happens,
no?
> Either way, as described below, chances are decent that:
> 1. there is an easy way to not lockref_get/put on the parent if the
> file is already there, dodging the problem
> .. and even if that's not true
> 2. lockref can be ditched in favor of atomics. apart from some minor
> refactoring this all looks perfectly doable and I have a wip. I will
> try to find the time next week to sort it out
>
Like I said in the earlier mail, I don't think we can stay in RCU mode
because of the audit_inode call. I'm definitely interested in your WIP
though!
> > Key problem is that in some corner cases the lock can be continuously
> > held and be queued on, making the fast path always fail and making all
> > the spins actively waste time (and notably pull on the cacheline).
> >
> > See this for more details:
> > https://lore.kernel.org/oe-lkp/lv7ykdnn2nrci3orajf7ev64afxqdw2d65bcpu2mfaqbkvv4ke@hzxat7utjnvx/
> >
> > However, I *suspect* in the case you are optimizing here (open + O_CREAT
> > of an existing file) lockref on the parent can be avoided altogether
> > with some hackery and that's what should be done here.
> >
> > When it comes to lockref in vfs in general, most uses can be elided with
> > some hackery (see the above thread) which is in early WIP (the LSMs are
> > a massive headache).
> >
> > For open calls which *do* need to take a real ref the hackery does not
> > help of course.
> >
> > This is where I think decoupling ref from the lock is the best way
> > forward. For that to work the dentry must hang around after the last
> > unref (already done thanks to RCU and dput even explicitly handles that
> > already!) and there needs to be a way to block new refs atomically --
> > can be done with cmpxchg from a 0-ref state to a flag blocking new refs
> > coming in. I have that as a WIP as well.
> >
> >
> > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > ---
> > > lib/lockref.c | 85 ++++++++++++++++++++++-------------------------------------
> > > 1 file changed, 32 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/lib/lockref.c b/lib/lockref.c
> > > index 2afe4c5d8919..b76941043fe9 100644
> > > --- a/lib/lockref.c
> > > +++ b/lib/lockref.c
> > > @@ -8,22 +8,25 @@
> > > * Note that the "cmpxchg()" reloads the "old" value for the
> > > * failure case.
> > > */
> > > -#define CMPXCHG_LOOP(CODE, SUCCESS) do { \
> > > - int retry = 100; \
> > > - struct lockref old; \
> > > - BUILD_BUG_ON(sizeof(old) != 8); \
> > > - old.lock_count = READ_ONCE(lockref->lock_count); \
> > > - while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
> > > - struct lockref new = old; \
> > > - CODE \
> > > - if (likely(try_cmpxchg64_relaxed(&lockref->lock_count, \
> > > - &old.lock_count, \
> > > - new.lock_count))) { \
> > > - SUCCESS; \
> > > - } \
> > > - if (!--retry) \
> > > - break; \
> > > - } \
> > > +#define CMPXCHG_LOOP(CODE, SUCCESS) do { \
> > > + struct lockref old; \
> > > + BUILD_BUG_ON(sizeof(old) != 8); \
> > > + old.lock_count = READ_ONCE(lockref->lock_count); \
> > > + for (;;) { \
> > > + struct lockref new = old; \
> > > + \
> > > + if (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
> > > + CODE \
> > > + if (likely(try_cmpxchg64_relaxed(&lockref->lock_count, \
> > > + &old.lock_count, \
> > > + new.lock_count))) { \
> > > + SUCCESS; \
> > > + } \
> > > + } else { \
> > > + cpu_relax(); \
> > > + old.lock_count = READ_ONCE(lockref->lock_count); \
> > > + } \
> > > + } \
> > > } while (0)
> > >
> > > #else
> > > @@ -46,10 +49,8 @@ void lockref_get(struct lockref *lockref)
> > > ,
> > > return;
> > > );
> > > -
> > > - spin_lock(&lockref->lock);
> > > - lockref->count++;
> > > - spin_unlock(&lockref->lock);
> > > + /* should never get here */
> > > + WARN_ON_ONCE(1);
> > > }
> > > EXPORT_SYMBOL(lockref_get);
> > >
> > > @@ -60,8 +61,6 @@ EXPORT_SYMBOL(lockref_get);
> > > */
> > > int lockref_get_not_zero(struct lockref *lockref)
> > > {
> > > - int retval;
> > > -
> > > CMPXCHG_LOOP(
> > > new.count++;
> > > if (old.count <= 0)
> > > @@ -69,15 +68,9 @@ int lockref_get_not_zero(struct lockref *lockref)
> > > ,
> > > return 1;
> > > );
> > > -
> > > - spin_lock(&lockref->lock);
> > > - retval = 0;
> > > - if (lockref->count > 0) {
> > > - lockref->count++;
> > > - retval = 1;
> > > - }
> > > - spin_unlock(&lockref->lock);
> > > - return retval;
> > > + /* should never get here */
> > > + WARN_ON_ONCE(1);
> > > + return -1;
> > > }
> > > EXPORT_SYMBOL(lockref_get_not_zero);
> > >
> > > @@ -88,8 +81,6 @@ EXPORT_SYMBOL(lockref_get_not_zero);
> > > */
> > > int lockref_put_not_zero(struct lockref *lockref)
> > > {
> > > - int retval;
> > > -
> > > CMPXCHG_LOOP(
> > > new.count--;
> > > if (old.count <= 1)
> > > @@ -97,15 +88,9 @@ int lockref_put_not_zero(struct lockref *lockref)
> > > ,
> > > return 1;
> > > );
> > > -
> > > - spin_lock(&lockref->lock);
> > > - retval = 0;
> > > - if (lockref->count > 1) {
> > > - lockref->count--;
> > > - retval = 1;
> > > - }
> > > - spin_unlock(&lockref->lock);
> > > - return retval;
> > > + /* should never get here */
> > > + WARN_ON_ONCE(1);
> > > + return -1;
> > > }
> > > EXPORT_SYMBOL(lockref_put_not_zero);
> > >
> > > @@ -125,6 +110,8 @@ int lockref_put_return(struct lockref *lockref)
> > > ,
> > > return new.count;
> > > );
> > > + /* should never get here */
> > > + WARN_ON_ONCE(1);
> > > return -1;
> > > }
> > > EXPORT_SYMBOL(lockref_put_return);
> > > @@ -171,8 +158,6 @@ EXPORT_SYMBOL(lockref_mark_dead);
> > > */
> > > int lockref_get_not_dead(struct lockref *lockref)
> > > {
> > > - int retval;
> > > -
> > > CMPXCHG_LOOP(
> > > new.count++;
> > > if (old.count < 0)
> > > @@ -180,14 +165,8 @@ int lockref_get_not_dead(struct lockref *lockref)
> > > ,
> > > return 1;
> > > );
> > > -
> > > - spin_lock(&lockref->lock);
> > > - retval = 0;
> > > - if (lockref->count >= 0) {
> > > - lockref->count++;
> > > - retval = 1;
> > > - }
> > > - spin_unlock(&lockref->lock);
> > > - return retval;
> > > + /* should never get here */
> > > + WARN_ON_ONCE(1);
> > > + return -1;
> > > }
> > > EXPORT_SYMBOL(lockref_get_not_dead);
> > >
> > > --
> > > 2.45.2
> > >
>
>
>
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists