lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250822133047.GA927384@perftesting>
Date: Fri, 22 Aug 2025 09:30:47 -0400
From: Josef Bacik <josef@...icpanda.com>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org,
	kernel-team@...com, linux-ext4@...r.kernel.org,
	linux-xfs@...r.kernel.org, viro@...iv.linux.org.uk
Subject: Re: [PATCH 00/50] fs: rework inode reference counting

On Fri, Aug 22, 2025 at 12:51:29PM +0200, Christian Brauner wrote:
> On Thu, Aug 21, 2025 at 04:18:11PM -0400, Josef Bacik wrote:
> > Hello,
> > 
> > This series is the first part of a larger body of work geared towards solving a
> > variety of scalability issues in the VFS.
> > 
> > We have historically had a variety of foot-guns related to inode freeing.  We
> > have I_WILL_FREE and I_FREEING flags that indicated when the inode was in the
> > different stages of being reclaimed.  This lead to confusion, and bugs in cases
> > where one was checked but the other wasn't.  Additionally, it's frankly
> > confusing to have both of these flags and to deal with them in practice.
> 
> Agreed.
> 
> > However, this exists because we have an odd behavior with inodes, we allow them
> > to have a 0 reference count and still be usable. This again is a pretty unfun
> > footgun, because generally speaking we want reference counts to be meaningful.
> 
> Agreed.
> 
> > The problem with the way we reference inodes is the final iput(). The majority
> > of file systems do their final truncate of a unlinked inode in their
> > ->evict_inode() callback, which happens when the inode is actually being
> > evicted. This can be a long process for large inodes, and thus isn't safe to
> > happen in a variety of contexts. Btrfs, for example, has an entire delayed iput
> > infrastructure to make sure that we do not do the final iput() in a dangerous
> > context. We cannot expand the use of this reference count to all the places the
> > inode is used, because there are cases where we would need to iput() in an IRQ
> > context  (end folio writeback) or other unsafe context, which is not allowed.
> > 
> > To that end, resolve this by introducing a new i_obj_count reference count. This
> > will be used to control when we can actually free the inode. We then can use
> > this reference count in all the places where we may reference the inode. This
> > removes another huge footgun, having ways to access the inode itself without
> > having an actual reference to it. The writeback code is one of the main places
> > where we see this. Inodes end up on all sorts of lists here without a proper
> > reference count. This allows us to protect the inode from being freed by giving
> > this an other code mechanisms to protect their access to the inode.
> > 
> > With this we can separate the concept of the inode being usable, and the inode
> > being freed.  The next part of the patch series is to stop allowing for inodes
> > to have an i_count of 0 and still be viable.  This comes with some warts. The
> > biggest wart is now if we choose to cache inodes in the LRU list we have to
> > remove the inode from the LRU list if we access it once it's on the LRU list.
> > This will result in more contention on the lru list lock, but in practice we
> > rarely have inodes that do not have a dentry, and if we do that inode is not
> > long for this world.
> > 
> > With not allowing inodes to hit a refcount of 0, we can take advantage of that
> > common pattern of using refcount_inc_not_zero() in all of the lockless places
> > where we do inode lookup in cache.  From there we can change all the users who
> > check I_WILL_FREE or I_FREEING to simply check the i_count. If it is 0 then they
> > aren't allowed to do their work, othrwise they can proceed as normal.
> > 
> > With all of that in place we can finally remove these two flags.
> > 
> > This is a large series, but it is mostly mechanical. I've kept the patches very
> > small, to make it easy to review and logic about each change. I have run this
> > through fstests for btrfs and ext4, xfs is currently going. I wanted to get this
> > out for review to make sure this big design changes are reasonable to everybody.
> > 
> > The series is based on vfs/vfs.all branch, which is based on 6.9-rc1. Thanks,
> 
> I so hope you meant 6.17-rc1 because otherwise I did something very very
> wrong. :)

Stupid AI hallucination...

Josef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ