[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6vizzdoktqzzkyyvxqupr6jgzqcd4cclc24pujgx53irxtsy4h@lzevj646ccmg>
Date: Mon, 23 Sep 2024 15:56:50 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-bcachefs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Dave Chinner <dchinner@...hat.com>
Subject: Re: [GIT PULL] bcachefs changes for 6.12-rc1
On Mon, Sep 23, 2024 at 10:18:57AM GMT, Linus Torvalds wrote:
> On Sat, 21 Sept 2024 at 12:28, Kent Overstreet
> <kent.overstreet@...ux.dev> wrote:
> >
> > We're now using an rhashtable instead of the system inode hash table;
> > this is another significant performance improvement on multithreaded
> > metadata workloads, eliminating more lock contention.
>
> So I've pulled this, but I reacted to this issue - what is the load
> where the inode hash table is actually causing issues?
>
> Because honestly, the reason we're using that "one single lock" for
> the inode hash table is that nobody has ever bothered.
>
> In particular, it *should* be reasonably straightforward (lots of
> small details, but not fundamentally hard) to just make the inode hash
> table be a "bl" hash - with the locking done per-bucket, not using one
> global lock.
Dave was working on that - he posted patches and it seemed like they
were mostly done, not sure what happened with those.
> But nobody has ever seemed to care before. Because the inode hash just
> isn't all that heavily used, since normal loads don't look up inodes
> directly (they look up dentries, and you find the inodes off those -
> and the *dentry* hash scales well thanks to both RCU and doing the
> bucket lock thing).
>
> So the fact that bcachefs cares makes me go "Why?"
I've seen benchmarks (not just on bcachefs, ext4 as well) where it was
actually _really_ heavy. They were synthetics, but I'd expect to see the
same effect from workloads where we're streaming a lot of inodes through
the cache - think rsync.
Another data point is that the btree key cache work I just did was also
for exactly this type of situation - it addressed lock contention when
streaming items into and out of the key cache, which are mostly inodes
(it's also used for the alloc btree) - and I had reports from multiple
users it was a drastic improvement, much more than expected, and
eliminated the main source of our "srcu lock held too long" warnings -
and if lock contention is bad enough to trip a 10 second warning, you
know it's bad, heh.
I try to really jump on these sorts of lock contention issues because
I've seen many instances where they caused performance cliffs - in
practice they're often O(n^2) issues because as your workload scales up
it's usually not just number of instances where you take the lock that
increases, usually something else is scaling too - even just increased
cacheline bouncing from the lock contention itself will be problematic.
Powered by blists - more mailing lists