lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250825193502.GB1310133@perftesting>
Date: Mon, 25 Aug 2025 15:35:02 -0400
From: Josef Bacik <josef@...icpanda.com>
To: Christian Brauner <brauner@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org,
	kernel-team@...com, linux-ext4@...r.kernel.org,
	linux-xfs@...r.kernel.org, viro@...iv.linux.org.uk
Subject: Re: [PATCH 16/50] fs: change evict_inodes to use iput instead of
 evict directly

On Mon, Aug 25, 2025 at 11:07:55AM +0200, Christian Brauner wrote:
> On Thu, Aug 21, 2025 at 04:18:27PM -0400, Josef Bacik wrote:
> > At evict_inodes() time, we no longer have SB_ACTIVE set, so we can
> > easily go through the normal iput path to clear any inodes. Update
> 
> I'm a bit lost why SB_ACTIVE is used here as a justification to call
> iput(). I think it's because iput_final() would somehow add it back to
> the LRU if SB_ACTIVE was still set and the filesystem somehow would
> indicate it wouldn't want to drop the inode.
> 
> I'm confused where that would even happen. IOW, which filesystem would
> indicate "don't drop the inode" even though it's about to vanish. But
> anyway, that's probably not important because...
> 
> > dispose_list() to check how we need to free the inode, and then grab a
> > full reference to the inode while we're looping through the remaining
> > inodes, and simply iput them at the end.
> > 
> > Since we're just calling iput we don't really care about the i_count on
> > the inode at the current time.  Remove the i_count checks and just call
> > iput on every inode we find.
> > 
> > Signed-off-by: Josef Bacik <josef@...icpanda.com>
> > ---
> >  fs/inode.c | 26 +++++++++++---------------
> >  1 file changed, 11 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 72981b890ec6..80ad327746a7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -933,7 +933,7 @@ static void evict(struct inode *inode)
> >   * Dispose-list gets a local list with local inodes in it, so it doesn't
> >   * need to worry about list corruption and SMP locks.
> >   */
> > -static void dispose_list(struct list_head *head)
> > +static void dispose_list(struct list_head *head, bool for_lru)
> >  {
> >  	while (!list_empty(head)) {
> >  		struct inode *inode;
> > @@ -941,8 +941,12 @@ static void dispose_list(struct list_head *head)
> >  		inode = list_first_entry(head, struct inode, i_lru);
> >  		list_del_init(&inode->i_lru);
> >  
> > -		evict(inode);
> > -		iobj_put(inode);
> > +		if (for_lru) {
> > +			evict(inode);
> > +			iobj_put(inode);
> > +		} else {
> > +			iput(inode);
> > +		}
> 
> ... Afaict, if we end up in dispose_list() we came from one of two
> locations:
> 
> (1) prune_icache_sb()
>     In which case inode_lru_isolate() will have only returned inodes
>     that prior to your changes would have inode->i_count zero.
> 
> (2) evict_inodes()
>     Similar story, this only hits inodes with inode->i_count zero.
> 
> With your change you're adding an increment from zero for (2) via
> __iget() so that you always end up with a full refcount, and that is
> backing your changes to dispose_list() later.
> 
> I don't see the same done for (1) though and so your later call to
> iput() drops the reference below zero? It's accidently benign because
> iiuc atomic_dec_and_test() will simply tell you that reference count
> didn't go to zero and so iput() will back off. But still this should be
> fixed if I'm right.

Because (1) at this point doesn't have a full reference, it only has an
i_obj_count reference. The next patch converts this, and removes this bit. I did
it this way to clearly mark the change in behavior.

prune_icache_sb() will call dispose_list(&list, true), which will do the
evict(inode) and iobj_put(inode). This is correct because the inodes on the list
from prune_icache_sb() will have an i_count == and have I_WILL_FREE set, so it
will never have it's i_count increased to 1.

The change here is to change evict_inodes() to simply call iput(), as it calls
dispose_list(&list, false). We will increase the i_count to 1 from zero via
__iget(), which at this point in the series is completely correct behavior. Then
we will call iput() which will drop the i_count back to zero, and then call
iput_final, and since SB_ACTIVE is not set, it will call evict(inode) and clean
everything up properly.

> 
> The conversion to iput() is introducing a lot of subtlety in the middle
> of the series. If I'm right then the iput() is a always a nop because in
> all cases it was an increment from zero. But it isn't really a nop
> because we still do stuff like call ->drop_inode() again. Maybe it's
> fine because no filesystem would have issues with this but I wouldn't
> count on it and also it feels rather unclean to do it this way.

So I'm definitely introducing another call to ->drop_inode() here, but
->drop_inode() has always been a "do we want to keep this inode on the LRU"
call, calling it again doesn't really change anything.

That being said it is a subtle functional change. I put it here specifically
because it is a functional change. If it bites us in the ass in some unforseen
way we'll be able to bisect it down to here and then we can all laugh at Josef
because he missed something.

> 
> So, under the assumption, that after the increment from zero you did, we
> really only have a blatant zombie inode on our hands and we only need to
> get rid of the i_count we took make that explicit and do:
> 
> 	if (for_lru) {
> 		evict(inode);
> 		iobj_put(inode);
> 	} else {
> 		/* This inode was always incremented from zero.
> 		 * Get rid of that reference without doing anything else.
> 		 */
> 		WARN_ON_ONCE(!atomic_dec_and_test(&inode->i_count));
> 	}

We still need the evict() to actually free the inode.  We're just getting there
via iput_final() now instead of directly calling evict().

> 
> Btw, for the iobj_put() above, I assume that we're not guaranteed that
> i_obj_count == 1?

Right, it's purely dropping the LRU list i_obj_count reference.  Thanks,

Josef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ