
Message-ID: <CAHuHWtnPtMscpO+fxYncQ_K+T+j2AGxG75kimoL6gp5Q__GfTw@mail.gmail.com>
Date:   Wed, 29 Mar 2023 11:36:46 +0800
From:   Chung-Chiang Cheng <shepjeng@...il.com>
To:     Zhang Yi <yi.zhang@...wei.com>, Jan Kara <jack@...e.cz>
Cc:     Chung-Chiang Cheng <cccheng@...ology.com>,
        linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, kernel@...heng.net,
        Robbie Ko <robbieko@...ology.com>
Subject: Re: [PATCH] ext4: defer updating i_disksize until endio

On Mon, Mar 27, 2023 at 7:17 PM Zhang Yi <yi.zhang@...wei.com> wrote:
>
> On 2023/3/27 18:28, Chung-Chiang Cheng wrote:
> > It's a pity that this issue also occurs with data=ordered due to delayed
> > allocation being enabled by default. If delayed allocation were disabled,
> > it would not be as easy to reproduce.
> >
> > This is because if data is written to the end of a file and the block is
> > allocated, the new i_disksize will be immediately committed to the journal
> > at ext4_da_write_end(), but the writeback procedure is not yet triggered.
> > By default, ext4 commits the journal every 5 seconds, but a dirty page may
> > not be written back until 30 seconds later. This is not a short time window,
> > and any improper shutdown during this time may lead to the issue :(
> >

Thank you and Jan for the explanations. I agree that it is not the
responsibility of ext4 to provide application consistency, but this case is
not even crash consistent, although no sensitive data is revealed after a
crash.

> It seems that the case you've mentioned is an intra-block append write (no?).
> The current data=ordered mount option doesn't work in this case because
> ext4_map_blocks() doesn't attach the inode to the t_inode_list of the running
> transaction. If delayed allocation were disabled, the data-loss window is still
> there, because ext4_write_end()->ext4_update_inode_size() also updates
> i_disksize before writing data back. This at least guarantees no stale data.
> We had discussed this in [1].

Yes, you're right. I've reconfirmed my experiment and determined that this
issue can be reproduced with or without delayed allocation.
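
For anyone who wants to look for the symptom from userspace, here is a
minimal Python sketch. The helper names are hypothetical and not part of any
ext4 tooling. The idea is to append to a file without calling fsync, cut
power inside the window described above, and after remount count how many
bytes at the end of the file read back as zeros. It assumes the appended
payload itself never ends in zero bytes, and that the lost tail shows up as
zeros rather than stale data.

```python
import os


def append_without_fsync(path: str, payload: bytes) -> int:
    """Append payload and return the new file size.

    Deliberately no os.fsync(): the data only reaches the page cache,
    so it stays dirty until writeback runs.
    """
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()  # flushes Python's buffer to the kernel, not to disk
    return os.path.getsize(path)


def trailing_zero_bytes(path: str, chunk: int = 4096) -> int:
    """Count the run of zero bytes at the end of the file.

    A non-zero result after a crash suggests the on-disk size was
    updated but the appended data never made it to disk.
    """
    zeros = 0
    pos = os.path.getsize(path)
    with open(path, "rb") as f:
        while pos > 0:
            start = max(0, pos - chunk)
            f.seek(start)
            buf = f.read(pos - start)
            stripped = buf.rstrip(b"\x00")
            zeros += len(buf) - len(stripped)
            if stripped:  # hit real data, the zero run ends here
                break
            pos = start
    return zeros
```

Run append_without_fsync() before the power cut and trailing_zero_bytes()
on the same file after remount; comparing the result with the payload
length shows how much of the append was lost.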

I've tried your previous solution of adding the required inode to the current
transaction's ordered data list. It seems to work perfectly for me and simply
solves the issue, but the journal handling needs to be added back to the
delayed allocation write. Does it really have an obvious performance impact?

>
> [1]. https://lore.kernel.org/linux-ext4/1554370192-113254-1-git-send-email-yi.zhang@huawei.com/
>
> Thanks,
> Yi.