[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250810170311.GA16624@sol>
Date: Sun, 10 Aug 2025 10:03:11 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Christoph Hellwig <hch@...radead.org>
Cc: linux-fscrypt@...r.kernel.org, fsverity@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net,
linux-mtd@...ts.infradead.org, linux-btrfs@...r.kernel.org,
ceph-devel@...r.kernel.org, Christian Brauner <brauner@...nel.org>
Subject: Re: [PATCH v5 00/13] Move fscrypt and fsverity info out of struct
inode
On Sun, Aug 10, 2025 at 07:49:53AM -0700, Christoph Hellwig wrote:
> On Sun, Aug 10, 2025 at 12:56:53AM -0700, Eric Biggers wrote:
> > This is a cleaned-up implementation of moving the i_crypt_info and
> > i_verity_info pointers out of 'struct inode' and into the fs-specific
> > part of the inode, as proposed previously by Christian at
> > https://lore.kernel.org/r/20250723-work-inode-fscrypt-v4-0-c8e11488a0e6@kernel.org/
>
> I would really much prefer to move fscrypt to use a hash lookup instead
> of bloating all inodes for a each file system supporting it, even if
> very few files on very few file systems are using it. With the fsverity
> xfs series posted again this is becoming personal :)
>
> You mentioned you were looking into it but didn't like the rhashtable
> API. My offer to help with that still stands.
I assume you actually still mean fsverity, not fscrypt. First, it would
be helpful not to use one solution for fscrypt and a totally different
solution for fsverity, as that would increase the maintenance cost well
beyond that of either solution individually.
Second, the fsverity info can be loaded very frequently. For example,
curently it's loaded for each 4K data block processed. Also, there
*are* use cases in which most files on the filesystem have fsverity
enabled. Not super common, but they exist.
Yes, the rhashtable is technically O(1), but its code is a mess, both
source code and generated code. Let's look at the x86_64 code generated
by the fs-specific field solution in this patchset:
a74: 48 8b 47 28 mov 0x28(%rdi),%rax
a78: 48 8b 80 a8 00 00 00 mov 0xa8(%rax),%rax
a7f: 48 8b 00 mov (%rax),%rax
a82: 48 8b 04 07 mov (%rdi,%rax,1),%rax
A few dependent loads, but nice and simple.
Now let's look at the rhashtable version of the load. (I used the
'struct inode *' as the rhashtable key):
ffffffff81487f00 <fsverity_get_info>:
ffffffff81487f00: f3 0f 1e fa endbr64
ffffffff81487f04: 55 push %rbp
ffffffff81487f05: 48 89 e5 mov %rsp,%rbp
ffffffff81487f08: 41 57 push %r15
ffffffff81487f0a: 41 56 push %r14
ffffffff81487f0c: 41 55 push %r13
ffffffff81487f0e: 41 54 push %r12
ffffffff81487f10: 53 push %rbx
ffffffff81487f11: 48 83 ec 10 sub $0x10,%rsp
ffffffff81487f15: 48 89 7d d0 mov %rdi,-0x30(%rbp)
ffffffff81487f19: e8 d2 c6 e9 ff call ffffffff813245f0 <__rcu_read_lock>
ffffffff81487f1e: 48 8b 1d 5b 15 c9 00 mov 0xc9155b(%rip),%rbx # ffffffff82119480 <fsverity_info_hashtable>
ffffffff81487f25: 8b 43 08 mov 0x8(%rbx),%eax
ffffffff81487f28: 8b 55 d4 mov -0x2c(%rbp),%edx
ffffffff81487f2b: 8b 4d d0 mov -0x30(%rbp),%ecx
ffffffff81487f2e: 8b 3b mov (%rbx),%edi
ffffffff81487f30: 2d 09 41 52 21 sub $0x21524109,%eax
ffffffff81487f35: 01 c2 add %eax,%edx
ffffffff81487f37: 01 c1 add %eax,%ecx
ffffffff81487f39: 89 d6 mov %edx,%esi
ffffffff81487f3b: 31 d0 xor %edx,%eax
ffffffff81487f3d: c1 c6 0e rol $0xe,%esi
ffffffff81487f40: 29 f0 sub %esi,%eax
ffffffff81487f42: 89 c6 mov %eax,%esi
ffffffff81487f44: 31 c1 xor %eax,%ecx
ffffffff81487f46: c1 c6 0b rol $0xb,%esi
ffffffff81487f49: 29 f1 sub %esi,%ecx
ffffffff81487f4b: 89 ce mov %ecx,%esi
ffffffff81487f4d: 31 ca xor %ecx,%edx
ffffffff81487f4f: c1 ce 07 ror $0x7,%esi
ffffffff81487f52: 29 f2 sub %esi,%edx
ffffffff81487f54: 89 d6 mov %edx,%esi
ffffffff81487f56: 31 d0 xor %edx,%eax
ffffffff81487f58: c1 c6 10 rol $0x10,%esi
ffffffff81487f5b: 29 f0 sub %esi,%eax
ffffffff81487f5d: 89 c6 mov %eax,%esi
ffffffff81487f5f: 31 c1 xor %eax,%ecx
ffffffff81487f61: c1 c6 04 rol $0x4,%esi
ffffffff81487f64: 29 f1 sub %esi,%ecx
ffffffff81487f66: 8d 77 ff lea -0x1(%rdi),%esi
ffffffff81487f69: 31 ca xor %ecx,%edx
ffffffff81487f6b: c1 c1 0e rol $0xe,%ecx
ffffffff81487f6e: 29 ca sub %ecx,%edx
ffffffff81487f70: 31 d0 xor %edx,%eax
ffffffff81487f72: c1 ca 08 ror $0x8,%edx
ffffffff81487f75: 29 d0 sub %edx,%eax
ffffffff81487f77: 21 c6 and %eax,%esi
ffffffff81487f79: 8b 43 04 mov 0x4(%rbx),%eax
ffffffff81487f7c: 4c 8d 6c f3 40 lea 0x40(%rbx,%rsi,8),%r13
ffffffff81487f81: 85 c0 test %eax,%eax
ffffffff81487f83: 0f 85 8c 00 00 00 jne ffffffff81488015 <fsverity_get_info+0x115>
ffffffff81487f89: 49 8b 4d 00 mov 0x0(%r13),%rcx
ffffffff81487f8d: 48 83 e1 fe and $0xfffffffffffffffe,%rcx
ffffffff81487f91: 74 71 je ffffffff81488004 <fsverity_get_info+0x104>
ffffffff81487f93: 0f b7 05 f8 14 c9 00 movzwl 0xc914f8(%rip),%eax # ffffffff82119492 <fsverity_info_hashtable+0x12>
ffffffff81487f9a: 44 0f b7 25 f2 14 c9 movzwl 0xc914f2(%rip),%r12d # ffffffff82119494 <fsverity_info_hashtable+0x14>
ffffffff81487fa1: 00
ffffffff81487fa2: 49 89 ce mov %rcx,%r14
ffffffff81487fa5: 44 0f b7 3d e9 14 c9 movzwl 0xc914e9(%rip),%r15d # ffffffff82119496 <fsverity_info_hashtable+0x16>
ffffffff81487fac: 00
ffffffff81487fad: 48 89 45 c8 mov %rax,-0x38(%rbp)
ffffffff81487fb1: 4d 29 fc sub %r15,%r12
ffffffff81487fb4: 48 8b 55 c8 mov -0x38(%rbp),%rdx
ffffffff81487fb8: 4b 8d 3c 26 lea (%r14,%r12,1),%rdi
ffffffff81487fbc: 48 8d 75 d0 lea -0x30(%rbp),%rsi
ffffffff81487fc0: e8 2b 53 4b 00 call ffffffff8193d2f0 <memcmp>
ffffffff81487fc5: 85 c0 test %eax,%eax
ffffffff81487fc7: 75 26 jne ffffffff81487fef <fsverity_get_info+0xef>
ffffffff81487fc9: 4d 85 f6 test %r14,%r14
ffffffff81487fcc: 74 43 je ffffffff81488011 <fsverity_get_info+0x111>
ffffffff81487fce: 4c 89 f3 mov %r14,%rbx
ffffffff81487fd1: 4c 29 fb sub %r15,%rbx
ffffffff81487fd4: e8 67 18 ea ff call ffffffff81329840 <__rcu_read_unlock>
ffffffff81487fd9: 48 89 d8 mov %rbx,%rax
ffffffff81487fdc: 48 83 c4 10 add $0x10,%rsp
ffffffff81487fe0: 5b pop %rbx
ffffffff81487fe1: 41 5c pop %r12
ffffffff81487fe3: 41 5d pop %r13
ffffffff81487fe5: 41 5e pop %r14
ffffffff81487fe7: 41 5f pop %r15
ffffffff81487fe9: 5d pop %rbp
ffffffff81487fea: e9 01 53 4d 00 jmp ffffffff8195d2f0 <__pi___x86_return_thunk>
ffffffff81487fef: 4d 8b 36 mov (%r14),%r14
ffffffff81487ff2: 41 f6 c6 01 test $0x1,%r14b
ffffffff81487ff6: 74 bc je ffffffff81487fb4 <fsverity_get_info+0xb4>
ffffffff81487ff8: 4c 89 e8 mov %r13,%rax
ffffffff81487ffb: 48 83 c8 01 or $0x1,%rax
ffffffff81487fff: 49 39 c6 cmp %rax,%r14
ffffffff81488002: 75 85 jne ffffffff81487f89 <fsverity_get_info+0x89>
ffffffff81488004: 48 8b 5b 30 mov 0x30(%rbx),%rbx
ffffffff81488008: 48 85 db test %rbx,%rbx
ffffffff8148800b: 0f 85 14 ff ff ff jne ffffffff81487f25 <fsverity_get_info+0x25>
ffffffff81488011: 31 db xor %ebx,%ebx
ffffffff81488013: eb bf jmp ffffffff81487fd4 <fsverity_get_info+0xd4>
ffffffff81488015: 48 89 df mov %rbx,%rdi
ffffffff81488018: e8 33 94 11 00 call ffffffff815a1450 <rht_bucket_nested>
ffffffff8148801d: 49 89 c5 mov %rax,%r13
ffffffff81488020: e9 64 ff ff ff jmp ffffffff81487f89 <fsverity_get_info+0x89>
ffffffff81488025: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
ffffffff8148802c: 00 00 00 00
It doesn't really seem like the kind of solution that's a good choice
for a frequently-loaded field. And that's only the load; it's not
getting into the insertion (and resizing) part.
Reliability due to the code complexity is also a concern, of course.
'git log --grep="rhashtable:" --grep="Fixes:" --all-match' turns up
quite a few issues in rhashtable over time...
If we're going so far as to use a rhashtable, I have to wonder why we
aren't first prioritizing other fields. For example ext4_inode_info
unconditionally has 40 bytes for fast_commit information, even though
fast_commit is an experimental ext4 feature that isn't enabled on most
filesystems. That's 5 times as much as i_verity_info. And quota has 24
bytes under CONFIG_QUOTA. And there are even holes in the
ext4_inode_info struct; we could also just improve the field packing!
The fs-specific field solution from this patchset is much more efficient
than the rhashtable: efficient enough that we don't really have to worry
about it, regardless of fscrypt or fsverity. So I think it's a good
middle ground, and I'd like to just do it this way.
- Eric
Powered by blists - more mailing lists