Message-ID: <aXzeDE5P_lxiXfPB@casper.infradead.org>
Date: Fri, 30 Jan 2026 16:36:28 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Li Chen <me@...ux.beauty>
Cc: Mark Fasheh <mark@...heh.com>, Joel Becker <jlbec@...lplan.org>,
Joseph Qi <joseph.qi@...ux.alibaba.com>,
ocfs2-devel <ocfs2-devel@...ts.linux.dev>,
linux-kernel <linux-kernel@...r.kernel.org>,
Jan Kara <jack@...e.com>
Subject: Re: [PATCH 3/3] ocfs2: use READ_ONCE for lockless jinode reads
On Fri, Jan 30, 2026 at 08:26:40PM +0800, Li Chen wrote:
> Hi Matthew,
>
> > On Fri, Jan 30, 2026 at 11:12:32AM +0800, Li Chen wrote:
> > > The ocfs2 journal commit callback reads the jbd2_inode dirty-range
> > > fields without holding journal->j_list_lock.
> > >
> > > Use READ_ONCE() for these reads to make the intended concurrency
> > > semantics explicit.
> >
> > I don't think this is the right solution to the problem. If it is,
> > there needs to be much better argumentation in the commit message.
> >
> > As I understand it, jbd2_journal_file_inode() initialises jinode,
> > then adds it to the t_inode_list, then drops the j_list_lock. So the
> > actual problem we need to address is that there's no memory barrier
> > between the store to i_dirty_start and the list_add(). Once that's
> > added, there's no need for a READ_ONCE here.
> >
> > Or have I misunderstood the problem?
>
> Thanks for the review.
>
> My understanding of your point is that you're worried about a missing
> "publish" ordering in jbd2_journal_file_inode(): we store
> jinode->i_dirty_start/end and then list_add() the jinode to
> t_inode_list, and a core which observes the list entry might miss the prior
> i_dirty_* stores. Is that the issue you had in mind?
I think that's the only issue that exists ...
> If so, for the normal commit path where the list is walked under
> journal->j_list_lock (e.g. journal_submit_data_buffers() in
> fs/jbd2/commit.c), spin_lock()/spin_unlock() should already provide the
> necessary ordering, since both the i_dirty_* updates and the list_add()
> happen inside the same critical section.
I don't think that's true.  What you're asserting is that, given:
	int *pi;	/* pi points at the payload */
	int **ppi;	/* ppi is where a lockless reader looks */

	spin_lock(&lock);
	*pi = 1;	/* initialise the payload */
	*ppi = pi;	/* publish the pointer to it */
	spin_unlock(&lock);

the store to *pi must be observed before the store to *ppi.  That's
not true for a reader which doesn't itself take the lock:
spin_unlock() only orders those stores against the next acquisition
of the same lock, not against an observer outside the critical
section.  The store to *ppi needs a store barrier (or a release
store) before it.
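
To make that concrete, the writer side wants something like this (a
sketch only; I'm guessing at the exact jbd2 field names here, so
treat i_list and the local variables as placeholders):

	spin_lock(&journal->j_list_lock);
	jinode->i_dirty_start = start;
	jinode->i_dirty_end = end;
	/* order the i_dirty_* stores before the list_add() below */
	smp_wmb();
	list_add(&jinode->i_list, &transaction->t_inode_list);
	spin_unlock(&journal->j_list_lock);

and a lockless reader then needs a matching smp_rmb() (or an acquire
load of the list pointer) after it observes the new entry.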
> The ocfs2 case I was aiming at is different: the filesystem callback is
> invoked after journal->j_list_lock has been dropped, and it may sleep, so
> it cannot hold j_list_lock, yet it still reads jinode->i_dirty_start/end
> while other threads update those fields under the lock.  Adding a barrier
> between the stores and the list_add() would not close that concurrent
> update window.
I don't think that race exists.  If it does exist, READ_ONCE() will
not help (on 32-bit platforms) because these are 64-bit quantities,
and 32-bit platforms do not, in general, have a way to do an atomic
64-bit load (look at the implementation of i_size_read() for the
gyrations we go through to ensure a non-torn read of that value).
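
For reference, the 32-bit SMP flavour of i_size_read() does roughly
this (paraphrased from include/linux/fs.h):

	loff_t i_size;
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&inode->i_size_seqcount);
		i_size = inode->i_size;
	} while (read_seqcount_retry(&inode->i_size_seqcount, seq));

i.e. a seqcount retry loop that keeps re-reading until no writer
raced with the load.  i_dirty_start/end would need the same kind of
treatment if they really are read locklessly.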
> "ocfs2 reads jinode->i_dirty_start/end without journal->j_list_lock
> (callback may sleep); these fields are updated under j_list_lock in jbd2.
> Use READ_ONCE() so the callback takes a single snapshot via actual
> loads from memory (i.e. don't let the compiler reuse a value kept in
> a register or fuse multiple reads into one)."
I think the prevention of this race occurs at a higher level than
"it's updated under a lock". That is, jbd2_journal_file_inode()
is never called for a jinode which is currently being operated on by
j_submit_inode_data_buffers(). Now, I'm not an expert on the jbd code,
so I may be wrong here.