Message-ID: <ZGQOlrcvLplTfZmf@dread.disaster.area>
Date: Wed, 17 May 2023 09:15:34 +1000
From: Dave Chinner <david@...morbit.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Christian Brauner <brauner@...nel.org>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-bcachefs@...r.kernel.org, Dave Chinner <dchinner@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH 22/32] vfs: inode cache conversion to hash-bl
On Tue, May 16, 2023 at 12:17:04PM -0400, Kent Overstreet wrote:
> On Tue, May 16, 2023 at 05:45:19PM +0200, Christian Brauner wrote:
> > On Wed, May 10, 2023 at 02:45:57PM +1000, Dave Chinner wrote:
> > There's a bit of a backlog before I get around to looking at this, but
> > it'd be great if we had a few reviewers for this change.
>
> It is well tested - it's been in the bcachefs tree for ages with zero
> issues. I'm pulling it out of the bcachefs-prerequisites series, though,
> since Dave's still got it in his tree; he's got a newer version with
> better commit messages.
>
> It's a significant performance boost on metadata-heavy workloads for any
> non-XFS filesystem; we should definitely get it in.
I've got an up-to-date vfs-scale tree here (6.4-rc1) but I have not
been able to test it effectively right now because my local
performance test server is broken. I'll do what I can on the old
small machine that I have to validate it when I get time, but that
might be a few weeks away....
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git vfs-scale
As it is, the inode hash-bl changes have zero impact on XFS because
it has its own highly scalable, lockless, sharded inode cache. So
unless I'm explicitly testing ext4 or btrfs scalability (rare), it's
not getting a lot of scalability exercise. It is being used by the
root filesystems on all those test VMs, but that's about it...
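
(For anyone reviewing: the core of the hash-bl conversion is the
kernel's existing bit-locked list type from <linux/list_bl.h> - bit 0
of each bucket's head pointer doubles as a spinlock, so every hash
chain carries its own lock instead of everyone contending on a single
global inode_hash_lock. A minimal sketch of the pattern - the table
size and function names here are illustrative, not the actual
fs/inode.c code:

  #include <linux/list_bl.h>

  #define I_HASH_BITS	10
  static struct hlist_bl_head inode_hashtable[1 << I_HASH_BITS];

  static void hash_insert(struct hlist_bl_node *node, unsigned long hashval)
  {
	struct hlist_bl_head *b =
		&inode_hashtable[hashval & ((1 << I_HASH_BITS) - 1)];

	hlist_bl_lock(b);		/* bit_spin_lock() on bit 0 of b->first */
	hlist_bl_add_head(node, b);	/* chain is stable while bit is held */
	hlist_bl_unlock(b);
  }

Lookups take the same per-chain bit lock, so two threads only contend
if their inodes hash to the same bucket.)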
That said, my vfs-scale tree also has Waiman Long's old dlist code
(per-cpu linked lists), which converts the sb inode list and removes
the global lock there. That makes a huge difference for XFS - the
current code limits inode cache cycling to about 600,000 inodes/sec
on >=16p machines. With dlists, however (there's a sketch of the
per-cpu list idea after the profiles below):
| 5.17.0 on an XFS filesystem with 50 million inodes in it on a 32p
| machine with a 1.6MIOPS/6.5GB/s block device.
|
| Fully concurrent full filesystem bulkstat:
|
|                wall time     sys time     IOPS    BW       rate
| unpatched:     1m56.035s     56m12.234s   8k      200MB/s  0.4M/s
| patched:       0m15.710s     3m45.164s    70k     1.9GB/s  3.4M/s
|
| Unpatched flat kernel profile:
|
| 81.97% [kernel] [k] __pv_queued_spin_lock_slowpath
| 1.84% [kernel] [k] do_raw_spin_lock
| 1.33% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
| 0.50% [kernel] [k] memset_erms
| 0.42% [kernel] [k] do_raw_spin_unlock
| 0.42% [kernel] [k] xfs_perag_get
| 0.40% [kernel] [k] xfs_buf_find
| 0.39% [kernel] [k] __raw_spin_lock_init
|
| Patched flat kernel profile:
|
| 10.90% [kernel] [k] do_raw_spin_lock
| 7.21% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
| 3.16% [kernel] [k] xfs_buf_find
| 3.06% [kernel] [k] rcu_segcblist_enqueue
| 2.73% [kernel] [k] memset_erms
| 2.31% [kernel] [k] __pv_queued_spin_lock_slowpath
| 2.15% [kernel] [k] __raw_spin_lock_init
| 2.15% [kernel] [k] do_raw_spin_unlock
| 2.12% [kernel] [k] xfs_perag_get
| 1.93% [kernel] [k] xfs_btree_lookup
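
(Waiman's dlist code isn't upstream, so the exact API may differ from
this, but the idea behind the numbers above is one sublist plus
spinlock per CPU: adds go onto the local CPU's sublist under that
CPU's lock, and s_inodes walkers iterate the per-cpu sublists in
turn, so the global s_inode_list_lock and all that slowpath spinning
disappear. A hypothetical sketch, not Waiman's actual code:

  #include <linux/list.h>
  #include <linux/percpu.h>
  #include <linux/spinlock.h>

  struct dlist_head {
	struct list_head	list;
	spinlock_t		lock;
  };

  struct dlist_heads {
	struct dlist_head __percpu *heads;
  };

  static void dlist_add(struct list_head *node, struct dlist_heads *dl)
  {
	/* Only this CPU's sublist lock is taken - no global contention. */
	struct dlist_head *h = get_cpu_ptr(dl->heads);

	spin_lock(&h->lock);
	list_add(node, &h->list);
	spin_unlock(&h->lock);
	put_cpu_ptr(dl->heads);
  }

Removal has to take the lock of whichever sublist the node landed on,
so in practice each node also records which sublist head it is on.)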
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com