lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250826-begnadet-vorarbeit-901ad1e2bfa0@brauner>
Date: Tue, 26 Aug 2025 11:56:12 +0200
From: Christian Brauner <brauner@...nel.org>
To: Josef Bacik <josef@...icpanda.com>
Cc: linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org, 
	kernel-team@...com, linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, 
	viro@...iv.linux.org.uk
Subject: Re: [PATCH 16/50] fs: change evict_inodes to use iput instead of
 evict directly

On Mon, Aug 25, 2025 at 03:35:02PM -0400, Josef Bacik wrote:
> On Mon, Aug 25, 2025 at 11:07:55AM +0200, Christian Brauner wrote:
> > On Thu, Aug 21, 2025 at 04:18:27PM -0400, Josef Bacik wrote:
> > > At evict_inodes() time, we no longer have SB_ACTIVE set, so we can
> > > easily go through the normal iput path to clear any inodes. Update
> > 
> > I'm a bit lost why SB_ACTIVE is used here as a justification to call
> > iput(). I think it's because iput_final() would somehow add it back to
> > the LRU if SB_ACTIVE was still set and the filesystem somehow would
> > indicate it wouldn't want to drop the inode.
> > 
> > I'm confused where that would even happen. IOW, which filesystem would
> > indicate "don't drop the inode" even though it's about to vanish. But
> > anyway, that's probably not important because...
> > 
> > > dispose_list() to check how we need to free the inode, and then grab a
> > > full reference to the inode while we're looping through the remaining
> > > inodes, and simply iput them at the end.
> > > 
> > > Since we're just calling iput we don't really care about the i_count on
> > > the inode at the current time.  Remove the i_count checks and just call
> > > iput on every inode we find.
> > > 
> > > Signed-off-by: Josef Bacik <josef@...icpanda.com>
> > > ---
> > >  fs/inode.c | 26 +++++++++++---------------
> > >  1 file changed, 11 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 72981b890ec6..80ad327746a7 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -933,7 +933,7 @@ static void evict(struct inode *inode)
> > >   * Dispose-list gets a local list with local inodes in it, so it doesn't
> > >   * need to worry about list corruption and SMP locks.
> > >   */
> > > -static void dispose_list(struct list_head *head)
> > > +static void dispose_list(struct list_head *head, bool for_lru)
> > >  {
> > >  	while (!list_empty(head)) {
> > >  		struct inode *inode;
> > > @@ -941,8 +941,12 @@ static void dispose_list(struct list_head *head)
> > >  		inode = list_first_entry(head, struct inode, i_lru);
> > >  		list_del_init(&inode->i_lru);
> > >  
> > > -		evict(inode);
> > > -		iobj_put(inode);
> > > +		if (for_lru) {
> > > +			evict(inode);
> > > +			iobj_put(inode);
> > > +		} else {
> > > +			iput(inode);
> > > +		}
> > 
> > ... Afaict, if we end up in dispose_list() we came from one of two
> > locations:
> > 
> > (1) prune_icache_sb()
> >     In which case inode_lru_isolate() will have only returned inodes
> >     that prior to your changes would have inode->i_count zero.
> > 
> > (2) evict_inodes()
> >     Similar story, this only hits inodes with inode->i_count zero.
> > 
> > With your change you're adding an increment from zero for (2) via
> > __iget() so that you always end up with a full refcount, and that is
> > backing your changes to dispose_list() later.
> > 
> > I don't see the same done for (1) though and so your later call to
> > iput() drops the reference below zero? It's accidently benign because
> > iiuc atomic_dec_and_test() will simply tell you that reference count
> > didn't go to zero and so iput() will back off. But still this should be
> > fixed if I'm right.
> 
> Because (1) at this point doesn't have a full reference, it only has an
> i_obj_count reference. The next patch converts this, and removes this bit. I did
> it this way to clearly mark the change in behavior.

Ah, right, I forgot to take the the boolean argument into account.
You're passing that as true in (1). Sure and then sorry about the noise.

> prune_icache_sb() will call dispose_list(&list, true), which will do the
> evict(inode) and iobj_put(inode). This is correct because the inodes on the list
> from prune_icache_sb() will have an i_count == and have I_WILL_FREE set, so it
> will never have it's i_count increased to 1.
> 
> The change here is to change evict_inodes() to simply call iput(), as it calls
> dispose_list(&list, false). We will increase the i_count to 1 from zero via
> __iget(), which at this point in the series is completely correct behavior. Then
> we will call iput() which will drop the i_count back to zero, and then call
> iput_final, and since SB_ACTIVE is not set, it will call evict(inode) and clean
> everything up properly.
> 
> > 
> > The conversion to iput() is introducing a lot of subtlety in the middle
> > of the series. If I'm right then the iput() is a always a nop because in
> > all cases it was an increment from zero. But it isn't really a nop
> > because we still do stuff like call ->drop_inode() again. Maybe it's
> > fine because no filesystem would have issues with this but I wouldn't
> > count on it and also it feels rather unclean to do it this way.
> 
> So I'm definitely introducing another call to ->drop_inode() here, but
> ->drop_inode() has always been a "do we want to keep this inode on the LRU"
> call, calling it again doesn't really change anything.

Right, the first time the inode reference count drops to zero we call
iput_final() and then ->drop_inode() which tells us to put it on the
LRU. And we leave it otherwise in tact. This is the part I forgot.

Once you add it to the LRU you'll have incremented i_obj_count either
when you added it to the cached LRU and moving it from there to the
"actual" LRU or you'll take a new one.

And then dispose_list() for (1) will just need to put the i_obj_count.
Thanks!

> That being said it is a subtle functional change. I put it here specifically
> because it is a functional change. If it bites us in the ass in some unforseen
> way we'll be able to bisect it down to here and then we can all laugh at Josef
> because he missed something.

Yeah, it's good. It's just very confusing because for a brief time we
have to understand two reference counts that are first coupled almost
1:1 and then slowly decoupled which makes this particularly nasty...

> > So, under the assumption, that after the increment from zero you did, we
> > really only have a blatant zombie inode on our hands and we only need to
> > get rid of the i_count we took make that explicit and do:
> > 
> > 	if (for_lru) {
> > 		evict(inode);
> > 		iobj_put(inode);
> > 	} else {
> > 		/* This inode was always incremented from zero.
> > 		 * Get rid of that reference without doing anything else.
> > 		 */
> > 		WARN_ON_ONCE(!atomic_dec_and_test(&inode->i_count));
> > 	}
> 
> We still need the evict() to actually free the inode.  We're just getting there
> via iput_final() now instead of directly calling evict().

Right, thanks!

> 
> > 
> > Btw, for the iobj_put() above, I assume that we're not guaranteed that
> > i_obj_count == 1?
> 
> Right, it's purely dropping the LRU list i_obj_count reference.  Thanks,

Thanks for the comments!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ