Message-ID: <v5lvhjauvcx27fcsyhyugzexdk7sik7an2soyxtx5dxj3oxjqa@gbvyu2kc7vpy>
Date: Tue, 24 Sep 2024 22:13:01 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Dave Chinner <david@...morbit.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, 
	linux-bcachefs@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Dave Chinner <dchinner@...hat.com>
Subject: Re: [GIT PULL] bcachefs changes for 6.12-rc1

On Wed, Sep 25, 2024 at 11:00:10AM GMT, Dave Chinner wrote:
> > Eh? Of course it'd have to be coherent, but just checking if an inode is
> > present in the VFS cache is what, 1-2 cache misses? Depending on hash
> > table fill factor...
> 
> Sure, when there is no contention and you have CPU to spare. But the
> moment the lookup hits contention problems (i.e. we are exceeding
> the cache lookup scalability capability), we are straight back to
> running at VFS cache speed instead of uncached speed.

The cache lookups are just reads; they don't introduce scalability
issues unless they're contending with other cores writing to those
cachelines - checking if an item is present in a hash table is trivial
to do locklessly.
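
To make that concrete, here's a purely illustrative userspace sketch of
such a lockless presence check - names and table layout invented, and a
real kernel-side version would use RCU (e.g. rhashtable) rather than
raw atomics:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BITS	10
#define TABLE_SIZE	(1u << TABLE_BITS)

/* 0 means "empty slot"; writers are assumed to publish entries with
 * release semantics so readers never observe a torn key. */
static _Atomic uint64_t table[TABLE_SIZE];

static bool cache_contains(uint64_t key)
{
	/* multiplicative hash; any decent mix function will do */
	uint32_t slot = (key * 0x9e3779b97f4a7c15ULL) >> (64 - TABLE_BITS);

	for (uint32_t n = 0; n < TABLE_SIZE;
	     n++, slot = (slot + 1) & (TABLE_SIZE - 1)) {
		uint64_t v = atomic_load_explicit(&table[slot],
						  memory_order_acquire);
		if (v == key)
			return true;	/* hit: typically 1-2 cache misses */
		if (!v)
			return false;	/* empty slot ends the probe chain */
	}
	return false;
}

It's reads only - no locks, no writes to shared cachelines - so
concurrent lookups scale with core count.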

But pulling an inode into and then evicting it from the inode cache
entails a lot more work - just initializing a struct inode is
nontrivial, and then there are the (multiple) shared data structures you
have to manipulate.
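
In stock VFS terms the asymmetry looks roughly like this - a sketch
only, with error handling omitted, though ilookup() and iget_locked()
are the real helpers:

	struct inode *inode = ilookup(sb, ino); /* hash probe only */
	if (!inode) {
		inode = iget_locked(sb, ino);	/* allocate, init, insert */
		if (inode && (inode->i_state & I_NEW)) {
			/* ... fill from disk ... */
			unlock_new_inode(inode);
		}
	}
	/* ... */
	iput(inode);	/* and eviction later does the full teardown */

Everything on the miss path - allocation, initialization, hash and
sb->s_inodes list insertion, eventual eviction - is work the
lookup-only path never does.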

> Keep in mind that not having a referenced inode opens up the code to
> things like pre-emption races. i.e. a cache miss doesn't prevent
> the current task from being preempted before it reads the inode
> information into the user buffer. The VFS inode could be
> instantiated and modified before the uncached access runs again and
> pulls stale information from the underlying buffer and returns that
> to userspace.

Yeah, if you're reading from a buffer cache that doesn't have a lock,
that does get dicey - but for bcachefs, where we're reading from a
btree node that does have a lock, it's quite manageable.
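
Concretely, the shape is something like this (invented function names,
just to show the ordering - not real bcachefs code):

	btree_node_read_lock(node);		/* updaters need the write lock */
	k = btree_node_lookup(node, inum);	/* find the on-disk inode key */
	stat_fill(statbuf, k);			/* copy out while still locked */
	btree_node_read_unlock(node);

Any update that could make the answer stale has to take the write lock
on the same node, so it's serialized fully before or after the copy;
getting preempted in the middle just delays us, it can't interleave a
modification.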

And incidentally, this sort of "we have a cache on top of the btree, but
sometimes we have to do direct access" situation already comes up a lot
in bcachefs, primarily for the alloc btree. _Tons_ of fun - but it
doesn't actually bite us here, since we don't use the vfs inode cache
as a writeback cache.

Now, for some completely different silliness: there are actually _three_
levels of caching for inodes in bcachefs - btree node cache, btree key
cache, then the vfs cache. In the first two, inodes are packed down to
~100 bytes, so it's not that bad, but it does make you go "...what?". It
would be nice in theory to collapse these - but the upside is that we
don't have the interactions between the vfs inode cache and journalling
that xfs has.

But if vfs inodes no longer have their own lifetime like you've been
talking about, that might open up interesting possibilities.
