linux-kernel - Re: [PATCH v3 2/4] fs: hide ->i_state handling behind accessors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <h4xj2os657va3ylszf6hgqp2aab5bc7mywdacj3sl6py4tadhy@3eqqcdhxrdtc>
Date: Thu, 11 Sep 2025 11:00:49 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, josef@...icpanda.com, 
	kernel-team@...com, amir73il@...il.com, linux-btrfs@...r.kernel.org, 
	linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, ocfs2-devel@...ts.linux.dev
Subject: Re: [PATCH v3 2/4] fs: hide ->i_state handling behind accessors

On Thu, Sep 11, 2025 at 04:52:57PM +1000, Dave Chinner wrote:
> On Thu, Sep 11, 2025 at 06:55:55AM +0200, Mateusz Guzik wrote:
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> 
> So why did you choose these specific wrapper functions?
> 
> Some commentary on why you choose this specific API would be very
> useful here.
> 

Hi Dave,

thanks for the reply.

I believe the end state we are both aiming for is similar. I did not
spend any time in this cover letter outlining the state I consider
desirable for the long run, so I see why you would assume the new
i_state helpers are the endgame in what I'm looking for. I wrote about
it at length in my responses to the refcount thread, but maybe I failed
to convey it. Bottom line is I do support dedicated helpers, I don't
believe the kernel is in a good position to add them as is.

> > diff --git a/block/bdev.c b/block/bdev.c
> > index b77ddd12dc06..77f04042ac67 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -67,7 +67,7 @@ static void bdev_write_inode(struct block_device *bdev)
> >  	int ret;
> >  
> >  	spin_lock(&inode->i_lock);
> > -	while (inode->i_state & I_DIRTY) {
> > +	while (inode_state_read(inode) & I_DIRTY) {
> >  		spin_unlock(&inode->i_lock);
> >  		ret = write_inode_now(inode, true);
> >  		if (ret)
> 
> This isn't an improvement.
> 
> It makes the code harder to read, and now I have to go look at the
> implementation of a set of helper functions to determine if that's
> the right helper to use for the context the code is operating in.
> 
> What would be an improvement is making all the state flags disappear
> behind the same flag APIs as other high level objects that
> filesystems interface with. e.g. folio flags use
> folio_test.../folio_set.../folio_clear...
> 
> Looking wider, at least XFS, ext4 and btrfs use these same
> set/test/clear flag APIs for feature and mount option flags. XFS
> also uses them for oeprational state in mount, journal and per-ag
> structures, etc. It's a pretty common pattern.
> 
> Using it for the inode state flags would lead to code like this:
> 
> 	spin_lock(&inode->i_lock);
> 	while (inode_test_dirty(inode)) {
> 	.....
> 
> That's far cleaner and easier to understand and use than an API that
> explicitly encodes the locking context of the specific access being
> made in the helper names.
> 
> IOWs, the above I_DIRTY flag ends up with a set of wrappers that
> look like:
> 
> bool inode_test_dirty_unlocked(struct inode *inode)
> {
> 	return inode->i_state & I_DIRTY;
> }
> 
> bool inode_test_dirty(struct inode *inode)
> {
> 	lockdep_assert_held(&inode->i_lock);
> 	return inode_test_dirty_unlocked(inode);
> }
> 
> void inode_set_dirty(struct inode *inode)
> {
> 	lockdep_assert_held(&inode->i_lock);
> 	inode->i_state |= I_DIRTY;
> }
> 
> void inode_clear_dirty(struct inode *inode)
> {
> 	lockdep_assert_held(&inode->i_lock);
> 	inode->i_state &= ~I_DIRTY;
> }
> 
> With this, almost no callers need to know about the I_DIRTY flag -
> direct use of it is a red flag and/or an exceptional case.  It's
> self documenting that it is an exceptional case, and it better have
> a comment explaining why it is safe....
> 
> This also gives us the necessary lockdep checks to ensure the right
> locks are held when modifications are being made.
> 
> And best of all, the wrappers can be generated by macros; they don't
> need to be directly coded and maintained.
> 
> Yes, we have compound state checks, but like page-flags.h we can
> manually implement those few special cases such as this one:
> 

I need to make a statement that the current flag situation is just
horrid. Even ignoring the open-coded access, the real work will boil
down to sanitizing semantics (and hopefully removing numerous flags in
the process).

AFAICS you do agree i_state accesses need to be hidden, you just
disagree with how I did it.

The material difference between your proposal and mine is that you also
hide flags.

I very much would like to see consumers stop messing with them and
instead have consumer use well-defined helpers, but my idea how to get
there boils down to small incremental steps (assert-checked accesses
instead of open-codeding them being one of the first things to do).

Doing it with the current situation looks like a temporary API explosion
to me. I would like sanitized semantics instead, likely avoiding any
need to generate helpers and whatnot. It should also be easier to get
there with the smaller steps. I'm elaborating on this later.

I don't get the rest of the criticism though, most notably this part
from earlier:
> It makes the code harder to read, and now I have to go look at the
> implementation of a set of helper functions to determine if that's
> the right helper to use for the context the code is operating in.

If code like in your proposal does not require checking if that's the
right helper:
 	spin_lock(&inode->i_lock);
 	while (inode_test_dirty(inode)) {
	 	.....

I don't understand how this does:
 	spin_lock(&inode->i_lock);
 	while (inode_state_read(inode) & I_DIRTY) {
	 	.....

The funcs as proposed by me are very much self-documenting.  I would
expect people will need to look at them about once and be done with the
transition.

As I see it, it's the same as direct i_state access, except when you are
operating without the spinlock held you need to spell out you are
knowingly doing it.

The API is:
inode_state_read() -- no qualifiers, so you need the lock
inode_state_add() -- no qualifiers, so you need the lock
inode_state_del() -- no qualifiers, so you need the lock

Note misuse is caught by lockdep, like in your proposal.

inode_state_read_unstable() -- the developer explicitly spells out they
acknowledge i_state can change from under them. routine trivial to find
if you need it. I chose "_unstable" instead of "_unlocked" as the suffix
because the latter leads to fishy-looking code in practice, for example: 
@@ -1638,7 +1638,7 @@ cifs_iget(struct super_block *sb, struct cifs_fattr *fattr)
                cifs_fattr_to_inode(inode, fattr, false);
                if (sb->s_flags & SB_NOATIME)
                        inode->i_flags |= S_NOATIME | S_NOCMTIME;
-               if (inode->i_state & I_NEW) {
+               if (inode_state_read_unstable(inode) & I_NEW) {
                        inode->i_ino = hash;
                        cifs_fscache_get_inode_cookie(inode);
                        unlock_new_inode(inode);

inode_state_read_unlocked() followed by unlock_new_inode() would raise
my eyebrow.

Finally:
inode_state_add_unchecked() -- the developer explicitly asks to forego
sanity checks. this one has few users and is a kludge until vfs gets
better lifecycle tracking (as in it will go away). also note even in
your proposal you would need variants of this sort to account for XFS
not taking the lock

So at the end of it I don't believe my proposal adds any real
work/mental load/whatever on the count of the developer having to use
it. At the same time I claim it is an improvement as is because:
- it prevents surprise reloads thanks to READ_ONCE
- it adds the asserts to most consumers

The last bit helping pave the way to saner internals.

> > @@ -1265,7 +1265,7 @@ void sync_bdevs(bool wait)
> >  		struct block_device *bdev;
> >  
> >  		spin_lock(&inode->i_lock);
> > -		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
> > +		if (inode_state_read(inode) & (I_FREEING|I_WILL_FREE|I_NEW) ||
> 
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW) ||
> +		if (inode_test_new_or_freeing(inode)) ||
> 
> bool inode_test_new_or_freeing(struct inode *inode)
> {
> 	lockdep_assert_held(&inode->i_lock);
> 	return inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW);
> }
> 
> Or if we want to avoid directly using flags in these wrappers,
> we write them like this:
> 
> bool inode_test_new_or_freeing(struct inode *inode)
> {
> 	return inode_test_freeing(inode) ||
> 		inode_test_will_free(inode) ||
> 		inode_test_new(inode);
> }
> 

This I_FREEING, I_WILL_FREE and I_NEW stuff serves as a great example
why the kernel would use quite a bit of sanitizing before one rolls with
helpers hiding flag usage.

There are tests for:
- I_FREEING
- I_FREEING | I_WILL_FREE
- I_FREEING | I_NEW
- I_FREEING | I_WILL_FREE | I_NEW

I_WILL_FREE needs to die and I have WIP to do it (different than the
thing I posted). Reasoning about the flag is already convoluted and
would be much easier to review if it was not preceeded by introduction
of soon-to-go-away helpers.  Note I don't consider "inode->i_state &
I_FLAG" replaced with "inode_state_read(inode) & I_FLAG" to constitute a
readability problem (if anything it helps because you know the lock is
held at that spot).

Say I_WILL_FREE is gone. Even then the current tests are pretty wierd,
because they are *sometimes* in the vicinity of tests for I_CREATING.
The flag is inconsitently used and at best the inode hash APIs need some
sanitizing to make it clear what's going on, at worst some of it is
plain bugs. I may end up writing about it separately.

I_FREEING | I_NEW checks are also stemming from the kernel not having a
flag to denote "the inode is ready to use" (i.e., it should probably
grow a flag to explicitly say it. or maybe use a completely separate
mechanism (I mentioned enums as one idea in my responses to the refcount
patchset)).

And so on.

So assuming someone(tm) will clean these problems up (I intend to sort
out I_WILL_FREE, I don't know about the rest), vast majority of helpers
which would need to be added now will be stale immediately after.
I don't see any value spending time/churn on them.

> Writing the compound wrappers this way then allows future
> improvements such as changing the state flags to atomic bitops so
> we can remove all the dependencies on holding inode->i_lock to
> manipulate state flags safely.
> 
> Hence I think moving the state flags behind an API similar to folio
> state flags makes the code easier to read and use correctly, whilst
> also providing the checking that the correct locks are held at the
> correct times. It also makes it  easier to further improve the
> implementation in future because all the users of the API are
> completely isolated from the implmentation....
> 

So I think with the assumption someone would go with your proposal, but
also start sanitizing all the behavior (whacking I_WILL_FREE and so on),
I think the kernel would end up in a similar spot to the one I'm aiming
for.

However, I claim my steps are more feasible to go with.