linux-kernel - Re: [PATCH 0/2] ext4: fix an data corruption issue in nojournal mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4a152e1b-c468-4fbf-ac0b-dbb76fa1e2ac@linux.alibaba.com>
Date: Thu, 2 Oct 2025 19:42:34 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Zhang Yi <yi.zhang@...weicloud.com>, linux-ext4@...r.kernel.org
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com,
 libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH 0/2] ext4: fix an data corruption issue in nojournal mode

Hi Ted,

On 2025/9/16 17:33, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> Hello!
> 
> This series fixes an data corruption issue reported by Gao Xiang in
> nojournal mode. The problem is happened after a metadata block is freed,
> it can be immediately reallocated as a data block. However, the metadata
> on this block may still be in the process of being written back, which
> means the new data in this block could potentially be overwritten by the
> stale metadata and trigger a data corruption issue. Please see below
> discussion with Jan for more details:
> 
>    https://lore.kernel.org/linux-ext4/a9417096-9549-4441-9878-b1955b899b4e@huaweicloud.com/
> 
> Patch 1 strengthens the same case in ordered journal mode, theoretically
> preventing the occurrence of stale data issues.
> Patch 2 fix this issue in nojournal mode.

It seems this series is not applied, is it ignored?

When ext4 nojournal mode is used, it is actually a very
serious bug since data corruption can happen very easily
in specific conditions (we actually have a specific
environment which can reproduce the issue very quickly)

Also it seems AWS folks reported this issue years ago
(2021), the phenomenon was almost the same, but the issue
still exists until now:
https://lore.kernel.org/linux-ext4/20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com/

Some of our internal businesses actually rely on EXT4
no_journal mode and when they upgrade the kernel from
4.19 to 5.10, they actually read corrupted data after
page cache memory is reclaimed (actually the on-disk
data was corrupted even earlier).

So personally I wonder what's the current status of
EXT4 no_journal mode since this issue has been existing
for more than 5 years but some people may need
an extent-enabled ext2 so they selected this mode.

We already released an announcement to advise customers
not using no_journal mode because it seems lack of
enough maintainence (yet many end users are interested
in this mode):
https://www.alibabacloud.com/help/en/alinux/support/data-corruption-risk-and-solution-in-ext4-nojounral-mode

Thanks,
Gao Xiang

> 
> Regards,
> Yi.
> 
> Zhang Yi (2):
>    jbd2: ensure that all ongoing I/O complete before freeing blocks
>    ext4: wait for ongoing I/O to complete before freeing blocks
> 
>   fs/ext4/ext4_jbd2.c   | 11 +++++++++--
>   fs/jbd2/transaction.c | 13 +++++++++----
>   2 files changed, 18 insertions(+), 6 deletions(-)
>