[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGudoHE5UmqcbZD1apLsc7G=YmUsDQ=-i=ZQHSD=4qAtsYa3yA@mail.gmail.com>
Date: Sat, 30 Aug 2025 17:54:35 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: brauner@...nel.org
Cc: viro@...iv.linux.org.uk, jack@...e.cz, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, josef@...icpanda.com, kernel-team@...com,
amir73il@...il.com, linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org
Subject: Re: [PATCH] fs: revamp iput()
I'm writing a long response to this series, in the meantime I noticed
this bit landed in
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
but with some whitespace issues in comments -- they are indented with
spaces instead of tabs after the opening line.
I verified the mail I sent does not have it, so I'm guessing this was
copy-pasted?
Tabing them by hand does the trick, below is my copy-paste as proof,
please indent by hand in your editor ;)
diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..fe4868e2a954 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1915,10 +1915,10 @@ void iput(struct inode *inode)
lockdep_assert_not_held(&inode->i_lock);
VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
/*
- * Note this assert is technically racy as if the count is bogusly
- * equal to one, then two CPUs racing to further drop it can both
- * conclude it's fine.
- */
+ * Note this assert is technically racy as if the count is bogusly
+ * equal to one, then two CPUs racing to further drop it can both
+ * conclude it's fine.
+ */
VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
if (atomic_add_unless(&inode->i_count, -1, 1))
@@ -1942,9 +1942,9 @@ void iput(struct inode *inode)
}
/*
- * iput_final() drops ->i_lock, we can't assert on it as the inode may
- * be deallocated by the time the call returns.
- */
+ * iput_final() drops ->i_lock, we can't assert on it as the inode may
+ * be deallocated by the time the call returns.
+ */
iput_final(inode);
}
EXPORT_SYMBOL(iput);
While here, vim told me about spaces instead of tabs in 2 more spots
in the file. Again to show the lines:
diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..833de5457a06 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -550,11 +550,11 @@ static void __inode_add_lru(struct inode *inode,
bool rotate)
struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
struct inode *inode, u32 bit)
{
- void *bit_address;
+ void *bit_address;
- bit_address = inode_state_wait_address(inode, bit);
- init_wait_var_entry(wqe, bit_address, 0);
- return __var_waitqueue(bit_address);
+ bit_address = inode_state_wait_address(inode, bit);
+ init_wait_var_entry(wqe, bit_address, 0);
+ return __var_waitqueue(bit_address);
}
EXPORT_SYMBOL(inode_bit_waitqueue);
@@ -2938,7 +2938,7 @@ EXPORT_SYMBOL(mode_strip_sgid);
*/
void dump_inode(struct inode *inode, const char *reason)
{
- pr_warn("%s encountered for inode %px", reason, inode);
+ pr_warn("%s encountered for inode %px", reason, inode);
}
EXPORT_SYMBOL(dump_inode);
Christian, I think it would be the most expedient if you just made
changes on your own with whatever commit message you see fit. No need
to mention I brought this up. If you insist I can send a patch.
On Wed, Aug 27, 2025 at 6:24 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> The material change is I_DIRTY_TIME handling without a spurious ref
> acquire/release cycle.
>
> While here a bunch of smaller changes:
> 1. predict there is an inode -- bpftrace suggests one is passed vast
> majority of the time
> 2. convert BUG_ON into VFS_BUG_ON_INODE
> 3. assert on ->i_count
> 4. assert ->i_lock is not held
> 5. flip the order of I_DIRTY_TIME and nlink count checks as the former
> is less likely to be true
>
> I verified atomic_read(&inode->i_count) does not show up in asm if
> debug is disabled.
>
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
>
> The routine kept annoying me, so here is a further revised variant.
>
> I verified this compiles, but I still cannot runtime test. I'm sorry for
> that. My signed-off is conditional on a good samaritan making sure it
> works :)
>
> diff compared to the thing I sent "informally":
> - if (unlikely(!inode))
> - asserts
> - slightly reworded iput_final commentary
> - unlikely() on the second I_DIRTY_TIME check
>
> Given the revamp I think it makes sense to attribute the change to me,
> hence a "proper" mail.
>
> The thing surviving from the submission by Josef is:
> + if (atomic_add_unless(&inode->i_count, -1, 1))
> + return;
>
> And of course he is the one who brought up the spurious refcount trip in
> the first place.
>
> I'm happy with Reported-by, Co-developed-by or whatever other credit
> as you guys see fit.
>
> That aside I think it would be nice if NULL inodes passed to iput
> became illegal, but that's a different story for another day.
>
> fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 01ebdc40021e..01a554e11279 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
> */
> void iput(struct inode *inode)
> {
> - if (!inode)
> + if (unlikely(!inode))
> return;
> - BUG_ON(inode->i_state & I_CLEAR);
> +
> retry:
> - if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> - if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> - atomic_inc(&inode->i_count);
> - spin_unlock(&inode->i_lock);
> - trace_writeback_lazytime_iput(inode);
> - mark_inode_dirty_sync(inode);
> - goto retry;
> - }
> - iput_final(inode);
> + lockdep_assert_not_held(&inode->i_lock);
> + VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
> + /*
> + * Note this assert is technically racy as if the count is bogusly
> + * equal to one, then two CPUs racing to further drop it can both
> + * conclude it's fine.
> + */
> + VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
> +
> + if (atomic_add_unless(&inode->i_count, -1, 1))
> + return;
> +
> + if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
> + trace_writeback_lazytime_iput(inode);
> + mark_inode_dirty_sync(inode);
> + goto retry;
> }
> +
> + spin_lock(&inode->i_lock);
> + if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
> + spin_unlock(&inode->i_lock);
> + goto retry;
> + }
> +
> + if (!atomic_dec_and_test(&inode->i_count)) {
> + spin_unlock(&inode->i_lock);
> + return;
> + }
> +
> + /*
> + * iput_final() drops ->i_lock, we can't assert on it as the inode may
> + * be deallocated by the time the call returns.
> + */
> + iput_final(inode);
> }
> EXPORT_SYMBOL(iput);
>
> --
> 2.43.0
>
--
Mateusz Guzik <mjguzik gmail.com>
Powered by blists - more mailing lists