lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4jxwogttddiaoqbstlgou5ox6zs27ngjjz5ukrxafm2z5ijxod@so4eqnykiegj>
Date: Tue, 6 Jan 2026 19:06:53 +0100
From: Jan Kara <jack@...e.cz>
To: Li Chen <me@...ux.beauty>
Cc: Theodore Ts'o <tytso@....edu>, 
	Andreas Dilger <adilger.kernel@...ger.ca>, Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] ext4: publish jinode after initialization

On Fri 26-12-25 13:02:20, Li Chen wrote:
> ext4_inode_attach_jinode() publishes ei->jinode to concurrent users.
> It assigned ei->jinode before calling jbd2_journal_init_jbd_inode().
> 
> This allows another thread to observe a non-NULL jinode with i_vfs_inode
> still unset. The fast commit flush path can then pass this jinode to
> jbd2_wait_inode_data(), which dereferences i_vfs_inode->i_mapping and may
> crash.
> 
> Below is the crash I observe:
> ```
> BUG: unable to handle page fault for address: 000000010beb47f4
> PGD 110e51067 P4D 110e51067 PUD 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> CPU: 1 UID: 0 PID: 4850 Comm: fc_fsync_bench_ Not tainted 6.18.0-00764-g795a690c06a5 #1 PREEMPT(voluntary)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
> RIP: 0010:xas_find_marked+0x3d/0x2e0
> Code: e0 03 48 83 f8 02 0f 84 f0 01 00 00 48 8b 47 08 48 89 c3 48 39 c6 0f 82 fd 01 00 00 48 85 c9 74 3d 48 83 f9 03 77 63 4c 8b 0f <49> 8b 71 08 48 c7 47 18 00 00 00 00 48 89 f1 83 e1 03 48 83 f9 02
> RSP: 0018:ffffbbee806e7bf0 EFLAGS: 00010246
> RAX: 000000000010beb4 RBX: 000000000010beb4 RCX: 0000000000000003
> RDX: 0000000000000001 RSI: 0000002000300000 RDI: ffffbbee806e7c10
> RBP: 0000000000000001 R08: 0000002000300000 R09: 000000010beb47ec
> R10: ffff9ea494590090 R11: 0000000000000000 R12: 0000002000300000
> R13: ffffbbee806e7c90 R14: ffff9ea494513788 R15: ffffbbee806e7c88
> FS:  00007fc2f9e3e6c0(0000) GS:ffff9ea6b1444000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000010beb47f4 CR3: 0000000119ac5000 CR4: 0000000000750ef0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  filemap_get_folios_tag+0x87/0x2a0
>  __filemap_fdatawait_range+0x5f/0xd0
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? __schedule+0x3e7/0x10c0
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? preempt_count_sub+0x5f/0x80
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? cap_safe_nice+0x37/0x70
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? preempt_count_sub+0x5f/0x80
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  filemap_fdatawait_range_keep_errors+0x12/0x40
>  ext4_fc_commit+0x697/0x8b0
>  ? ext4_file_write_iter+0x64b/0x950
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? preempt_count_sub+0x5f/0x80
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? vfs_write+0x356/0x480
>  ? srso_alias_return_thunk+0x5/0xfbef5
>  ? preempt_count_sub+0x5f/0x80
>  ext4_sync_file+0xf7/0x370
>  do_fsync+0x3b/0x80
>  ? syscall_trace_enter+0x108/0x1d0
>  __x64_sys_fdatasync+0x16/0x20
>  do_syscall_64+0x62/0x2c0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> ...
> ```
> 
> To fix this issue, initialize the jbd2_inode first and only then publish
> the pointer with smp_store_release(). Use smp_load_acquire() at the read
> sites to pair with the release and ensure the initialized fields are visible.
> 
> On x86 (TSO), the crash should primarily be due to the logical early publish
> window (another CPU can run between the store and initialization). x86
> also relies on compiler ordering; the acquire/release helpers include
> the necessary compiler barriers while keeping the fast-path cheap.
> 
> On weakly-ordered architectures (e.g. arm64/ppc), plain "init; store ptr"
> is not sufficient: without release/acquire, a reader may observe the
> pointer while still missing prior initialization stores. The explicit
> pairing makes this publish/consume relationship correct under LKMM.
> 
> Fixes: a361293f5fede ("jbd2: Fix oops in jbd2_journal_file_inode()")
> Cc: stable@...r.kernel.org
> Signed-off-by: Li Chen <me@...ux.beauty>

Thanks for the analysis and the patch. I think using smp_load_acquire() for
reading EXT4_I(inode)->jinode is a bit of an overkill since all we care
about is that EXT4_I(inode)->jinode->foo loads see the initialized content
of jinode. So it should be enough to just do smp_wmb() between inode
initialization and setting of EXT4_I(inode)->jinode and then use
READ_ONCE() to read EXT4_I(inode)->jinode, shouldn't it?

Also I think the problem, in particular with fastcommit code, goes further
than just the initialization. Generally jbd2_inode is modified under
journal->j_list_lock however readers such as jbd2_submit_inode_data() or
jbd2_wait_inode_data() and respective filesystem implementations
ocfs2_journal_submit_inode_data_buffers() and
ext4_journal_submit_inode_data_buffers() don't hold j_list_lock when
reading fields from jbd2_inode. So we should be using READ_ONCE /
WRITE_ONCE for reading / updating jbd2_inode. But thinking about it that's
a separate problem so let's deal with the init issues first.

								Honza

> ---
>  fs/ext4/ext4_jbd2.h   | 18 ++++++++++++++----
>  fs/ext4/fast_commit.c |  9 +++++++--
>  fs/ext4/inode.c       | 15 +++++++++++----
>  fs/ext4/super.c       | 10 +++++++---
>  4 files changed, 39 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index 63d17c5201b5..3bc79b894130 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -336,18 +336,28 @@ static inline int ext4_journal_force_commit(journal_t *journal)
>  static inline int ext4_jbd2_inode_add_write(handle_t *handle,
>  		struct inode *inode, loff_t start_byte, loff_t length)
>  {
> -	if (ext4_handle_valid(handle))
> +	if (ext4_handle_valid(handle)) {
> +		struct jbd2_inode *jinode;
> +
> +		/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +		jinode = smp_load_acquire(&EXT4_I(inode)->jinode);
>  		return jbd2_journal_inode_ranged_write(handle,
> -				EXT4_I(inode)->jinode, start_byte, length);
> +				jinode, start_byte, length);
> +	}
>  	return 0;
>  }
>  
>  static inline int ext4_jbd2_inode_add_wait(handle_t *handle,
>  		struct inode *inode, loff_t start_byte, loff_t length)
>  {
> -	if (ext4_handle_valid(handle))
> +	if (ext4_handle_valid(handle)) {
> +		struct jbd2_inode *jinode;
> +
> +		/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +		jinode = smp_load_acquire(&EXT4_I(inode)->jinode);
>  		return jbd2_journal_inode_ranged_wait(handle,
> -				EXT4_I(inode)->jinode, start_byte, length);
> +				jinode, start_byte, length);
> +	}
>  	return 0;
>  }
>  
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index a6e79b3f1b48..3f148c048a6f 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1087,16 +1087,21 @@ static int ext4_fc_flush_data(journal_t *journal)
>  	struct super_block *sb = journal->j_private;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	struct ext4_inode_info *ei;
> +	struct jbd2_inode *jinode;
>  	int ret = 0;
>  
>  	list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> -		ret = jbd2_submit_inode_data(journal, ei->jinode);
> +		/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +		jinode = smp_load_acquire(&ei->jinode);
> +		ret = jbd2_submit_inode_data(journal, jinode);
>  		if (ret)
>  			return ret;
>  	}
>  
>  	list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> -		ret = jbd2_wait_inode_data(journal, ei->jinode);
> +		/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +		jinode = smp_load_acquire(&ei->jinode);
> +		ret = jbd2_wait_inode_data(journal, jinode);
>  		if (ret)
>  			return ret;
>  	}
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 78ea864fa8cd..74b189c10f2b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -126,6 +126,9 @@ void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
>  static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  					      loff_t new_size)
>  {
> +	/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +	struct jbd2_inode *jinode = smp_load_acquire(&EXT4_I(inode)->jinode);
> +
>  	trace_ext4_begin_ordered_truncate(inode, new_size);
>  	/*
>  	 * If jinode is zero, then we never opened the file for
> @@ -133,10 +136,10 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
>  	 * jbd2_journal_begin_ordered_truncate() since there's no
>  	 * outstanding writes we need to flush.
>  	 */
> -	if (!EXT4_I(inode)->jinode)
> +	if (!jinode)
>  		return 0;
>  	return jbd2_journal_begin_ordered_truncate(EXT4_JOURNAL(inode),
> -						   EXT4_I(inode)->jinode,
> +						   jinode,
>  						   new_size);
>  }
>  
> @@ -4497,8 +4500,12 @@ int ext4_inode_attach_jinode(struct inode *inode)
>  			spin_unlock(&inode->i_lock);
>  			return -ENOMEM;
>  		}
> -		ei->jinode = jinode;
> -		jbd2_journal_init_jbd_inode(ei->jinode, inode);
> +		jbd2_journal_init_jbd_inode(jinode, inode);
> +		/*
> +		 * Publish ->jinode only after it is fully initialized so that
> +		 * readers never observe a partially initialized jbd2_inode.
> +		 */
> +		smp_store_release(&ei->jinode, jinode);
>  		jinode = NULL;
>  	}
>  	spin_unlock(&inode->i_lock);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 43f1ac6e8559..a3f015129c00 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1513,16 +1513,20 @@ static void destroy_inodecache(void)
>  
>  void ext4_clear_inode(struct inode *inode)
>  {
> +	struct jbd2_inode *jinode;
> +
>  	ext4_fc_del(inode);
>  	invalidate_inode_buffers(inode);
>  	clear_inode(inode);
>  	ext4_discard_preallocations(inode);
>  	ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
>  	dquot_drop(inode);
> -	if (EXT4_I(inode)->jinode) {
> +	/* Pairs with smp_store_release() in ext4_inode_attach_jinode(). */
> +	jinode = smp_load_acquire(&EXT4_I(inode)->jinode);
> +	if (jinode) {
>  		jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
> -					       EXT4_I(inode)->jinode);
> -		jbd2_free_inode(EXT4_I(inode)->jinode);
> +					       jinode);
> +		jbd2_free_inode(jinode);
>  		EXT4_I(inode)->jinode = NULL;
>  	}
>  	fscrypt_put_encryption_info(inode);
> -- 
> 2.52.0
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ