linux-kernel - Re: [PATCH] ext4: publish jinode after initialization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <19b98ef7441.2e449c605279413.2845425093226852052@linux.beauty>
Date: Wed, 07 Jan 2026 22:49:48 +0800
From: Li Chen <me@...ux.beauty>
To: "Jan Kara" <jack@...e.cz>
Cc: "Theodore Ts'o" <tytso@....edu>,
	"Andreas Dilger" <adilger.kernel@...ger.ca>,
	"linux-ext4" <linux-ext4@...r.kernel.org>,
	"linux-kernel" <linux-kernel@...r.kernel.org>,
	"stable" <stable@...r.kernel.org>
Subject: Re: [PATCH] ext4: publish jinode after initialization

Hi Jan

Thanks a lot for your detailed review!

 ---- On Wed, 07 Jan 2026 02:06:53 +0800  Jan Kara <jack@...e.cz> wrote --- 
 > On Fri 26-12-25 13:02:20, Li Chen wrote:
 > > ext4_inode_attach_jinode() publishes ei->jinode to concurrent users.
 > > It assigned ei->jinode before calling jbd2_journal_init_jbd_inode().
 > > 
 > > This allows another thread to observe a non-NULL jinode with i_vfs_inode
 > > still unset. The fast commit flush path can then pass this jinode to
 > > jbd2_wait_inode_data(), which dereferences i_vfs_inode->i_mapping and may
 > > crash.
 > > 
 > > Below is the crash I observe:
 > > ```
 > > BUG: unable to handle page fault for address: 000000010beb47f4
 > > PGD 110e51067 P4D 110e51067 PUD 0
 > > Oops: Oops: 0000 [#1] SMP NOPTI
 > > CPU: 1 UID: 0 PID: 4850 Comm: fc_fsync_bench_ Not tainted 6.18.0-00764-g795a690c06a5 #1 PREEMPT(voluntary)
 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
 > > RIP: 0010:xas_find_marked+0x3d/0x2e0
 > > Code: e0 03 48 83 f8 02 0f 84 f0 01 00 00 48 8b 47 08 48 89 c3 48 39 c6 0f 82 fd 01 00 00 48 85 c9 74 3d 48 83 f9 03 77 63 4c 8b 0f <49> 8b 71 08 48 c7 47 18 00 00 00 00 48 89 f1 83 e1 03 48 83 f9 02
 > > RSP: 0018:ffffbbee806e7bf0 EFLAGS: 00010246
 > > RAX: 000000000010beb4 RBX: 000000000010beb4 RCX: 0000000000000003
 > > RDX: 0000000000000001 RSI: 0000002000300000 RDI: ffffbbee806e7c10
 > > RBP: 0000000000000001 R08: 0000002000300000 R09: 000000010beb47ec
 > > R10: ffff9ea494590090 R11: 0000000000000000 R12: 0000002000300000
 > > R13: ffffbbee806e7c90 R14: ffff9ea494513788 R15: ffffbbee806e7c88
 > > FS:  00007fc2f9e3e6c0(0000) GS:ffff9ea6b1444000(0000) knlGS:0000000000000000
 > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 > > CR2: 000000010beb47f4 CR3: 0000000119ac5000 CR4: 0000000000750ef0
 > > PKRU: 55555554
 > > Call Trace:
 > >  <TASK>
 > >  filemap_get_folios_tag+0x87/0x2a0
 > >  __filemap_fdatawait_range+0x5f/0xd0
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? __schedule+0x3e7/0x10c0
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? preempt_count_sub+0x5f/0x80
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? cap_safe_nice+0x37/0x70
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? preempt_count_sub+0x5f/0x80
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  filemap_fdatawait_range_keep_errors+0x12/0x40
 > >  ext4_fc_commit+0x697/0x8b0
 > >  ? ext4_file_write_iter+0x64b/0x950
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? preempt_count_sub+0x5f/0x80
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? vfs_write+0x356/0x480
 > >  ? srso_alias_return_thunk+0x5/0xfbef5
 > >  ? preempt_count_sub+0x5f/0x80
 > >  ext4_sync_file+0xf7/0x370
 > >  do_fsync+0x3b/0x80
 > >  ? syscall_trace_enter+0x108/0x1d0
 > >  __x64_sys_fdatasync+0x16/0x20
 > >  do_syscall_64+0x62/0x2c0
 > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 > > ...
 > > ```
 > > 
 > > To fix this issue, initialize the jbd2_inode first and only then publish
 > > the pointer with smp_store_release(). Use smp_load_acquire() at the read
 > > sites to pair with the release and ensure the initialized fields are visible.
 > > 
 > > On x86 (TSO), the crash should primarily be due to the logical early publish
 > > window (another CPU can run between the store and initialization). x86
 > > also relies on compiler ordering; the acquire/release helpers include
 > > the necessary compiler barriers while keeping the fast-path cheap.
 > > 
 > > On weakly-ordered architectures (e.g. arm64/ppc), plain "init; store ptr"
 > > is not sufficient: without release/acquire, a reader may observe the
 > > pointer while still missing prior initialization stores. The explicit
 > > pairing makes this publish/consume relationship correct under LKMM.
 > > 
 > > Fixes: a361293f5fede ("jbd2: Fix oops in jbd2_journal_file_inode()")
 > > Cc: stable@...r.kernel.org
 > > Signed-off-by: Li Chen <me@...ux.beauty>
 > 
 > Thanks for the analysis and the patch. I think using smp_load_acquire() for
 > reading EXT4_I(inode)->jinode is a bit of an overkill since all we care
 > about is that EXT4_I(inode)->jinode->foo loads see the initialized content
 > of jinode. So it should be enough to just do smp_wmb() between inode
 > initialization and setting of EXT4_I(inode)->jinode and then use
 > READ_ONCE() to read EXT4_I(inode)->jinode, shouldn't it?

Agreed. I'll do it in this way in next version.
 
 > Also I think the problem, in particular with fastcommit code, goes further
 > than just the initialization. Generally jbd2_inode is modified under
 > journal->j_list_lock however readers such as jbd2_submit_inode_data() or
 > jbd2_wait_inode_data() and respective filesystem implementations
 > ocfs2_journal_submit_inode_data_buffers() and
 > ext4_journal_submit_inode_data_buffers() don't hold j_list_lock when
 > reading fields from jbd2_inode. So we should be using READ_ONCE /
 > WRITE_ONCE for reading / updating jbd2_inode. But thinking about it that's
 > a separate problem so let's deal with the init issues first.
 
Makes sense. For this patch I'd still focus only on the
publish-before-init window. And I would try to convert the lockless jbd2_inode field reads
to READ_ONCE/WRITE_ONCE in another patchset.

Regards,
Li