lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250822-orcas-bemannten-728c9946b160@brauner>
Date: Fri, 22 Aug 2025 13:08:07 +0200
From: Christian Brauner <brauner@...nel.org>
To: Josef Bacik <josef@...icpanda.com>
Cc: linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.org, 
	kernel-team@...com, linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, 
	viro@...iv.linux.org.uk
Subject: Re: [PATCH 02/50] fs: make the i_state flags an enum

On Thu, Aug 21, 2025 at 04:18:13PM -0400, Josef Bacik wrote:
> Adjusting i_state flags always means updating the values manually. Bring
> these forward into the 2020's and make a nice clean macro for defining
> the i_state values as an enum, providing __ variants for the cases where
> we need the bit position instead of the actual value, and leaving the
> actual NAME as the 1U << bit value.
> 
> Signed-off-by: Josef Bacik <josef@...icpanda.com>
> ---
>  include/linux/fs.h | 234 +++++++++++++++++++++++----------------------
>  1 file changed, 122 insertions(+), 112 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9a1ce67eed33..e741dc453c2c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -665,6 +665,127 @@ is_uncached_acl(struct posix_acl *acl)
>  #define IOP_MGTIME	0x0020
>  #define IOP_CACHED_LINK	0x0040
>  
> +/*
> + * Inode state bits.  Protected by inode->i_lock
> + *
> + * Four bits determine the dirty state of the inode: I_DIRTY_SYNC,
> + * I_DIRTY_DATASYNC, I_DIRTY_PAGES, and I_DIRTY_TIME.
> + *
> + * Four bits define the lifetime of an inode.  Initially, inodes are I_NEW,
> + * until that flag is cleared.  I_WILL_FREE, I_FREEING and I_CLEAR are set at
> + * various stages of removing an inode.
> + *
> + * Two bits are used for locking and completion notification, I_NEW and I_SYNC.
> + *
> + * I_DIRTY_SYNC		Inode is dirty, but doesn't have to be written on
> + *			fdatasync() (unless I_DIRTY_DATASYNC is also set).
> + *			Timestamp updates are the usual cause.
> + * I_DIRTY_DATASYNC	Data-related inode changes pending.  We keep track of
> + *			these changes separately from I_DIRTY_SYNC so that we
> + *			don't have to write inode on fdatasync() when only
> + *			e.g. the timestamps have changed.
> + * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
> + * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
> + *			lazytime mount option is enabled.  We keep track of this
> + *			separately from I_DIRTY_SYNC in order to implement
> + *			lazytime.  This gets cleared if I_DIRTY_INODE
> + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + *			in place because writeback might already be in progress
> + *			and we don't want to lose the time update
> + * I_NEW		Serves as both a mutex and completion notification.
> + *			New inodes set I_NEW.  If two processes both create
> + *			the same inode, one of them will release its inode and
> + *			wait for I_NEW to be released before returning.
> + *			Inodes in I_WILL_FREE, I_FREEING or I_CLEAR state can
> + *			also cause waiting on I_NEW, without I_NEW actually
> + *			being set.  find_inode() uses this to prevent returning
> + *			nearly-dead inodes.
> + * I_WILL_FREE		Must be set when calling write_inode_now() if i_count
> + *			is zero.  I_FREEING must be set when I_WILL_FREE is
> + *			cleared.
> + * I_FREEING		Set when inode is about to be freed but still has dirty
> + *			pages or buffers attached or the inode itself is still
> + *			dirty.
> + * I_CLEAR		Added by clear_inode().  In this state the inode is
> + *			clean and can be destroyed.  Inode keeps I_FREEING.
> + *
> + *			Inodes that are I_WILL_FREE, I_FREEING or I_CLEAR are
> + *			prohibited for many purposes.  iget() must wait for
> + *			the inode to be completely released, then create it
> + *			anew.  Other functions will just ignore such inodes,
> + *			if appropriate.  I_NEW is used for waiting.
> + *
> + * I_SYNC		Writeback of inode is running. The bit is set during
> + *			data writeback, and cleared with a wakeup on the bit
> + *			address once it is done. The bit is also used to pin
> + *			the inode in memory for flusher thread.
> + *
> + * I_REFERENCED		Marks the inode as recently references on the LRU list.
> + *
> + * I_WB_SWITCH		Cgroup bdi_writeback switching in progress.  Used to
> + *			synchronize competing switching instances and to tell
> + *			wb stat updates to grab the i_pages lock.  See
> + *			inode_switch_wbs_work_fn() for details.
> + *
> + * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
> + *			and work dirs among overlayfs mounts.
> + *
> + * I_CREATING		New object's inode in the middle of setting up.
> + *
> + * I_DONTCACHE		Evict inode as soon as it is not used anymore.
> + *
> + * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
> + *			Used to detect that mark_inode_dirty() should not move
> + *			inode between dirty lists.
> + *
> + * I_PINNING_FSCACHE_WB	Inode is pinning an fscache object for writeback.
> + *
> + * I_LRU_ISOLATING	Inode is pinned being isolated from LRU without holding
> + *			i_count.
> + *
> + * Q: What is the difference between I_WILL_FREE and I_FREEING?
> + *
> + * __I_{SYNC,NEW,LRU_ISOLATING} are used to derive unique addresses to wait
> + * upon. There's one free address left.
> + */
> +
> +/*
> + * As simple macro to define the inode state bits, __NAME will be the bit value
> + * (0, 1, 2, ...), and NAME will be the bit mask (1U << __NAME). The __NAME_SEQ
> + * is used to reset the sequence number so the next name gets the next bit value
> + * in the sequence.
> + */
> +#define INODE_BIT(name)			\
> +	__ ## name,			\
> +	name = (1U << __ ## name),	\
> +	__ ## name ## _SEQ = __ ## name

I'm not sure if this is the future we want :D
I think it's harder to parse than what we have now.

> +
> +enum inode_state_bits {
> +	INODE_BIT(I_NEW),
> +	INODE_BIT(I_SYNC),
> +	INODE_BIT(I_LRU_ISOLATING),
> +	INODE_BIT(I_DIRTY_SYNC),
> +	INODE_BIT(I_DIRTY_DATASYNC),
> +	INODE_BIT(I_DIRTY_PAGES),
> +	INODE_BIT(I_WILL_FREE),
> +	INODE_BIT(I_FREEING),
> +	INODE_BIT(I_CLEAR),
> +	INODE_BIT(I_REFERENCED),
> +	INODE_BIT(I_LINKABLE),
> +	INODE_BIT(I_DIRTY_TIME),
> +	INODE_BIT(I_WB_SWITCH),
> +	INODE_BIT(I_OVL_INUSE),
> +	INODE_BIT(I_CREATING),
> +	INODE_BIT(I_DONTCACHE),
> +	INODE_BIT(I_SYNC_QUEUED),
> +	INODE_BIT(I_PINNING_NETFS_WB),
> +};

Good idea but I really dislike this macro indirection.
Can't we just do the really boring?

enum inode_state_bits {
	__I_BIT_NEW		= 0U
	__I_BIT_SYNC		= 1U
	__I_BIT_LRU_ISOLATING	= 2U
}

enum inode_state_flags_t {
	I_NEW			= (1U << __I_BIT_NEW)
	I_SYNC			= (1U << __I_BIT_SYNC)
	I_LRU_ISOLATING		= (1U << __I_BIT_LRU_ISOLATING)
	I_DIRTY_SYNC		= (1U << 3)
	I_DIRTY_DATASYNC	= (1U << 4)
	I_DIRTY_PAGES		= (1U << 5)
	I_WILL_FREE		= (1U << 6)
	I_FREEING		= (1U << 7)
	I_CLEAR			= (1U << 8)
	I_REFERENCED		= (1U << 9)
	I_LINKABLE		= (1U << 10)
	I_DIRTY_TIME		= (1U << 11)
	I_WB_SWITCH		= (1U << 12)
	I_OVL_INUSE		= (1U << 13)
	I_CREATING		= (1U << 14)
	I_DONTCACHE		= (1U << 15)
	I_SYNC_QUEUED		= (1U << 16)
	I_PINNING_NETFS_WB	= (1U << 17)
};

Note that inode_state_wait_address() and that only works on four bits so
we can't really use higher bits anyway without switching back to a
scheme where we have to use unsigned long and waste for bytes for
nothing on 64 bit.

With that out of the way,

Reviewed-by: Christian Brauner <brauner@...nel.org>

> +#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> +#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> +#define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
> +
>  /*
>   * Keep mostly read-only and often accessed (especially for
>   * the RCU path lookup and 'stat' data) fields at the beginning
> @@ -723,7 +844,7 @@ struct inode {
>  #endif
>  
>  	/* Misc */
> -	u32			i_state;
> +	enum inode_state_bits	i_state;
>  	/* 32-bit hole */
>  	struct rw_semaphore	i_rwsem;
>  
> @@ -2484,117 +2605,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  	};
>  }
>  
> -/*
> - * Inode state bits.  Protected by inode->i_lock
> - *
> - * Four bits determine the dirty state of the inode: I_DIRTY_SYNC,
> - * I_DIRTY_DATASYNC, I_DIRTY_PAGES, and I_DIRTY_TIME.
> - *
> - * Four bits define the lifetime of an inode.  Initially, inodes are I_NEW,
> - * until that flag is cleared.  I_WILL_FREE, I_FREEING and I_CLEAR are set at
> - * various stages of removing an inode.
> - *
> - * Two bits are used for locking and completion notification, I_NEW and I_SYNC.
> - *
> - * I_DIRTY_SYNC		Inode is dirty, but doesn't have to be written on
> - *			fdatasync() (unless I_DIRTY_DATASYNC is also set).
> - *			Timestamp updates are the usual cause.
> - * I_DIRTY_DATASYNC	Data-related inode changes pending.  We keep track of
> - *			these changes separately from I_DIRTY_SYNC so that we
> - *			don't have to write inode on fdatasync() when only
> - *			e.g. the timestamps have changed.
> - * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
> - * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
> - *			lazytime mount option is enabled.  We keep track of this
> - *			separately from I_DIRTY_SYNC in order to implement
> - *			lazytime.  This gets cleared if I_DIRTY_INODE
> - *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> - *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> - *			in place because writeback might already be in progress
> - *			and we don't want to lose the time update
> - * I_NEW		Serves as both a mutex and completion notification.
> - *			New inodes set I_NEW.  If two processes both create
> - *			the same inode, one of them will release its inode and
> - *			wait for I_NEW to be released before returning.
> - *			Inodes in I_WILL_FREE, I_FREEING or I_CLEAR state can
> - *			also cause waiting on I_NEW, without I_NEW actually
> - *			being set.  find_inode() uses this to prevent returning
> - *			nearly-dead inodes.
> - * I_WILL_FREE		Must be set when calling write_inode_now() if i_count
> - *			is zero.  I_FREEING must be set when I_WILL_FREE is
> - *			cleared.
> - * I_FREEING		Set when inode is about to be freed but still has dirty
> - *			pages or buffers attached or the inode itself is still
> - *			dirty.
> - * I_CLEAR		Added by clear_inode().  In this state the inode is
> - *			clean and can be destroyed.  Inode keeps I_FREEING.
> - *
> - *			Inodes that are I_WILL_FREE, I_FREEING or I_CLEAR are
> - *			prohibited for many purposes.  iget() must wait for
> - *			the inode to be completely released, then create it
> - *			anew.  Other functions will just ignore such inodes,
> - *			if appropriate.  I_NEW is used for waiting.
> - *
> - * I_SYNC		Writeback of inode is running. The bit is set during
> - *			data writeback, and cleared with a wakeup on the bit
> - *			address once it is done. The bit is also used to pin
> - *			the inode in memory for flusher thread.
> - *
> - * I_REFERENCED		Marks the inode as recently references on the LRU list.
> - *
> - * I_WB_SWITCH		Cgroup bdi_writeback switching in progress.  Used to
> - *			synchronize competing switching instances and to tell
> - *			wb stat updates to grab the i_pages lock.  See
> - *			inode_switch_wbs_work_fn() for details.
> - *
> - * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
> - *			and work dirs among overlayfs mounts.
> - *
> - * I_CREATING		New object's inode in the middle of setting up.
> - *
> - * I_DONTCACHE		Evict inode as soon as it is not used anymore.
> - *
> - * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
> - *			Used to detect that mark_inode_dirty() should not move
> - * 			inode between dirty lists.
> - *
> - * I_PINNING_FSCACHE_WB	Inode is pinning an fscache object for writeback.
> - *
> - * I_LRU_ISOLATING	Inode is pinned being isolated from LRU without holding
> - *			i_count.
> - *
> - * Q: What is the difference between I_WILL_FREE and I_FREEING?
> - *
> - * __I_{SYNC,NEW,LRU_ISOLATING} are used to derive unique addresses to wait
> - * upon. There's one free address left.
> - */
> -#define __I_NEW			0
> -#define I_NEW			(1 << __I_NEW)
> -#define __I_SYNC		1
> -#define I_SYNC			(1 << __I_SYNC)
> -#define __I_LRU_ISOLATING	2
> -#define I_LRU_ISOLATING		(1 << __I_LRU_ISOLATING)
> -
> -#define I_DIRTY_SYNC		(1 << 3)
> -#define I_DIRTY_DATASYNC	(1 << 4)
> -#define I_DIRTY_PAGES		(1 << 5)
> -#define I_WILL_FREE		(1 << 6)
> -#define I_FREEING		(1 << 7)
> -#define I_CLEAR			(1 << 8)
> -#define I_REFERENCED		(1 << 9)
> -#define I_LINKABLE		(1 << 10)
> -#define I_DIRTY_TIME		(1 << 11)
> -#define I_WB_SWITCH		(1 << 12)
> -#define I_OVL_INUSE		(1 << 13)
> -#define I_CREATING		(1 << 14)
> -#define I_DONTCACHE		(1 << 15)
> -#define I_SYNC_QUEUED		(1 << 16)
> -#define I_PINNING_NETFS_WB	(1 << 17)
> -
> -#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> -#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> -#define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
> -
>  extern void __mark_inode_dirty(struct inode *, int);
>  static inline void mark_inode_dirty(struct inode *inode)
>  {
> -- 
> 2.49.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ