lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGudoHE5UmqcbZD1apLsc7G=YmUsDQ=-i=ZQHSD=4qAtsYa3yA@mail.gmail.com>
Date: Sat, 30 Aug 2025 17:54:35 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: brauner@...nel.org
Cc: viro@...iv.linux.org.uk, jack@...e.cz, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, josef@...icpanda.com, kernel-team@...com, 
	amir73il@...il.com, linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org, 
	linux-xfs@...r.kernel.org
Subject: Re: [PATCH] fs: revamp iput()

I'm writing a long response to this series, in the meantime I noticed
this bit landed in
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74
but with some whitespace issues in comments -- they are indented with
spaces instead of tabs after the opening line.

I verified the mail I sent does not have it, so I'm guessing this was
copy-pasted?

Tabing them by hand does the trick, below is my copy-paste as proof,
please indent by hand in your editor ;)

diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..fe4868e2a954 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1915,10 +1915,10 @@ void iput(struct inode *inode)
        lockdep_assert_not_held(&inode->i_lock);
        VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
        /*
-        * Note this assert is technically racy as if the count is bogusly
-        * equal to one, then two CPUs racing to further drop it can both
-        * conclude it's fine.
-        */
+        * Note this assert is technically racy as if the count is bogusly
+        * equal to one, then two CPUs racing to further drop it can both
+        * conclude it's fine.
+        */
        VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);

        if (atomic_add_unless(&inode->i_count, -1, 1))
@@ -1942,9 +1942,9 @@ void iput(struct inode *inode)
        }

        /*
-        * iput_final() drops ->i_lock, we can't assert on it as the inode may
-        * be deallocated by the time the call returns.
-        */
+        * iput_final() drops ->i_lock, we can't assert on it as the inode may
+        * be deallocated by the time the call returns.
+        */
        iput_final(inode);
 }
 EXPORT_SYMBOL(iput);

While here, vim told me about spaces instead of tabs in 2 more spots
in the file. Again to show the lines:

diff --git a/fs/inode.c b/fs/inode.c
index 2db680a37235..833de5457a06 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -550,11 +550,11 @@ static void __inode_add_lru(struct inode *inode,
bool rotate)
 struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
                                            struct inode *inode, u32 bit)
 {
-        void *bit_address;
+       void *bit_address;

-        bit_address = inode_state_wait_address(inode, bit);
-        init_wait_var_entry(wqe, bit_address, 0);
-        return __var_waitqueue(bit_address);
+       bit_address = inode_state_wait_address(inode, bit);
+       init_wait_var_entry(wqe, bit_address, 0);
+       return __var_waitqueue(bit_address);
 }
 EXPORT_SYMBOL(inode_bit_waitqueue);
@@ -2938,7 +2938,7 @@ EXPORT_SYMBOL(mode_strip_sgid);
  */
 void dump_inode(struct inode *inode, const char *reason)
 {
-       pr_warn("%s encountered for inode %px", reason, inode);
+       pr_warn("%s encountered for inode %px", reason, inode);
 }

 EXPORT_SYMBOL(dump_inode);

Christian, I think it would be the most expedient if you just made
changes on your own with whatever commit message you see fit. No need
to mention I brought this up. If you insist I can send a patch.

On Wed, Aug 27, 2025 at 6:24 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> The material change is I_DIRTY_TIME handling without a spurious ref
> acquire/release cycle.
>
> While here a bunch of smaller changes:
> 1. predict there is an inode -- bpftrace suggests one is passed vast
>    majority of the time
> 2. convert BUG_ON into VFS_BUG_ON_INODE
> 3. assert on ->i_count
> 4. assert ->i_lock is not held
> 5. flip the order of I_DIRTY_TIME and nlink count checks as the former
>    is less likely to be true
>
> I verified atomic_read(&inode->i_count) does not show up in asm if
> debug is disabled.
>
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
>
> The routine kept annoying me, so here is a further revised variant.
>
> I verified this compiles, but I still cannot runtime test. I'm sorry for
> that.  My signed-off is conditional on a good samaritan making sure it
> works :)
>
> diff compared to the thing I sent "informally":
> - if (unlikely(!inode))
> - asserts
> - slightly reworded iput_final commentary
> - unlikely() on the second I_DIRTY_TIME check
>
> Given the revamp I think it makes sense to attribute the change to me,
> hence a "proper" mail.
>
> The thing surviving from the submission by Josef is:
> +       if (atomic_add_unless(&inode->i_count, -1, 1))
> +               return;
>
> And of course he is the one who brought up the spurious refcount trip in
> the first place.
>
> I'm happy with Reported-by, Co-developed-by or whatever other credit
> as you guys see fit.
>
> That aside I think it would be nice if NULL inodes passed to iput
> became illegal, but that's a different story for another day.
>
>  fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 01ebdc40021e..01a554e11279 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
>   */
>  void iput(struct inode *inode)
>  {
> -       if (!inode)
> +       if (unlikely(!inode))
>                 return;
> -       BUG_ON(inode->i_state & I_CLEAR);
> +
>  retry:
> -       if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> -               if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> -                       atomic_inc(&inode->i_count);
> -                       spin_unlock(&inode->i_lock);
> -                       trace_writeback_lazytime_iput(inode);
> -                       mark_inode_dirty_sync(inode);
> -                       goto retry;
> -               }
> -               iput_final(inode);
> +       lockdep_assert_not_held(&inode->i_lock);
> +       VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
> +       /*
> +        * Note this assert is technically racy as if the count is bogusly
> +        * equal to one, then two CPUs racing to further drop it can both
> +        * conclude it's fine.
> +        */
> +       VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
> +
> +       if (atomic_add_unless(&inode->i_count, -1, 1))
> +               return;
> +
> +       if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
> +               trace_writeback_lazytime_iput(inode);
> +               mark_inode_dirty_sync(inode);
> +               goto retry;
>         }
> +
> +       spin_lock(&inode->i_lock);
> +       if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
> +               spin_unlock(&inode->i_lock);
> +               goto retry;
> +       }
> +
> +       if (!atomic_dec_and_test(&inode->i_count)) {
> +               spin_unlock(&inode->i_lock);
> +               return;
> +       }
> +
> +       /*
> +        * iput_final() drops ->i_lock, we can't assert on it as the inode may
> +        * be deallocated by the time the call returns.
> +        */
> +       iput_final(inode);
>  }
>  EXPORT_SYMBOL(iput);
>
> --
> 2.43.0
>


-- 
Mateusz Guzik <mjguzik gmail.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ