linux-kernel - Re: [RFC v3 1/2] ext4: fast_commit: assert i_data

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a507a2ce-a2ae-4592-b171-63974034fc1b@huaweicloud.com>
Date: Wed, 7 Jan 2026 10:00:23 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Li Chen <me@...ux.beauty>
Cc: linux-ext4 <linux-ext4@...r.kernel.org>, Theodore Ts'o <tytso@....edu>,
 Andreas Dilger <adilger.kernel@...ger.ca>,
 linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [RFC v3 1/2] ext4: fast_commit: assert i_data_sem only before
 sleep

On 1/6/2026 8:18 PM, Li Chen wrote:
> Hi Zhang Yi,
> 
>  ---- On Mon, 05 Jan 2026 20:18:42 +0800  Zhang Yi <yi.zhang@...weicloud.com> wrote --- 
>  > Hi Li,
>  > 
>  > On 12/24/2025 11:29 AM, Li Chen wrote:
>  > > ext4_fc_track_inode() can return without sleeping when
>  > > EXT4_STATE_FC_COMMITTING is already clear. The lockdep assertion for
>  > > ei->i_data_sem was done unconditionally before the wait loop, which can
>  > > WARN in call paths that hold i_data_sem even though we never block. Move
>  > > lockdep_assert_not_held(&ei->i_data_sem) into the actual sleep path,
>  > > right before schedule().
>  > > 
>  > > Signed-off-by: Li Chen <me@...ux.beauty>
>  > 
>  > Thank you for the fix patch! However, the solution does not seem to fix
>  > the issue. IIUC, the root cause of this issue is the following race
>  > condition (show only one case), and it may cause a real ABBA dead lock
>  > issue.
>  > 
>  > ext4_map_blocks()
>  >  hold i_data_sem // <- A
>  >  ext4_mb_new_blocks()
>  >   ext4_dirty_inode()
>  >                                  ext4_fc_commit()
>  >                                   ext4_fc_perform_commit()
>  >                                    set EXT4_STATE_FC_COMMITTING  <-B
>  >                                    ext4_fc_write_inode_data()
>  >                                    ext4_map_blocks()
>  >                                     hold i_data_sem  // <- A
>  >    ext4_fc_track_inode()
>  >     wait EXT4_STATE_FC_COMMITTING  <- B
>  >                                   jbd2_fc_end_commit()
>  >                                    ext4_fc_cleanup()
>  >                                     clear EXT4_STATE_FC_COMMITTING()
>  > 
>  > Postponing the lockdep assertion to the point where sleeping is actually
>  > necessary does not resolve this deadlock issue, it merely masks the
>  > problem, right?
>  > 
>  > I currently don't quite understand why only ext4_fc_track_inode() needs
>  > to wait for the inode being fast committed to be completed, instead of
>  > adding it to the FC_Q_STAGING list like other tracking operations.

It seems that the inode metadata of the tracked inode was not recorded
during the __track_inode(), so the inode metadata committed at commit
time reflects real-time data. However, the current
ext4_fc_perform_commit() lacks concurrency control, allowing other
processes to simultaneously initiate new handles that modify the inode
metadata while the previous metadata is being fast committed. Therefore,
to prevent recording newly changed inode metadata during the old commit
phase, the ext4_fc_track_inode() function must wait for the ongoing
commit process to complete before modifying.

>  > So
>  > now I don't have a good idea to fix this problem either.  Perhaps we
>  > need to rethink the necessity of this waiting, or find a way to avoid
>  > acquiring i_data_sem during fast commit.

Ha, the solution seems to have already been listed in the TODOs in
fast_commit.c.

  Change ext4_fc_commit() to lookup logical to physical mapping using extent
  status tree. This would get rid of the need to call ext4_fc_track_inode()
  before acquiring i_data_sem. To do that we would need to ensure that
  modified extents from the extent status tree are not evicted from memory.

Alternatively, recording the mapped range of tracking might also be
feasible.

Thanks,
Yi.

> 
> Thanks a lot for your kind review! I'll provide feedback tomorrow.
> 
> Regards,
> Li
>