linux-ext4 - Re: [PATCH] ext4: defer updating i

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230327113403.25iafutxorqezugs@quack3>
Date:   Mon, 27 Mar 2023 13:34:03 +0200
From:   Jan Kara <jack@...e.cz>
To:     Chung-Chiang Cheng <shepjeng@...il.com>
Cc:     Jan Kara <jack@...e.cz>, Chung-Chiang Cheng <cccheng@...ology.com>,
        linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, yi.zhang@...wei.com, kernel@...heng.net,
        Robbie Ko <robbieko@...ology.com>
Subject: Re: [PATCH] ext4: defer updating i_disksize until endio

On Mon 27-03-23 18:28:55, Chung-Chiang Cheng wrote:
> On Mon, Mar 27, 2023 at 5:29 PM Jan Kara <jack@...e.cz> wrote:
> >
> > As Zhang Yi already noted in his review, this is expected at least with
> > data=writeback mount option. With data=ordered this should not happen
> > though as the commit of the transaction with i_disksize update will wait
> > for page writeback to complete (this is exactly the reason why data=ordered
> > exists after all). Are you able to observe this problem with data=ordered
> > mount option?
> >
> >                                                                 Honza
> 
> It's a pity that this issue also occurs with data=ordered due to delayed
> allocation being enabled by default. If delayed allocation were disabled,
> it would not be as easy to reproduce.

Ah, ok. With data=ordered and expanding within the last block, you are
right you can see zeros at the end of the file after a crash. We were
discussing this in the past already but decided not to improve this because
the fix would have performance cost we didn't want to impose on users.

> This is because if data is written to the end of a file and the block is
> allocated, the new i_disksize will be immediately committed to the journal
> at ext4_da_write_end(), but the writeback procedure is not yet triggered.
> By default, ext4 commits the journal every 5 seconds, but a dirty page may
> not be written back until 30 seconds later. This is not a short time window,
> and any improper shutdown during this time may lead to the issue :(

Yeah, I agree. The time window is not small. What we could do and what
could even bring some performance benefit is if we moved the i_disksize
update from ext4_da_write_end() to ext4_do_writepages(). Currently we do
the i_disksize update only in mpage_map_and_submit_extent() but we could
add a similar logic when exiting from ext4_do_writepages() to update
i_disksize for written back pages beyond i_disksize which didn't need block
allocation. *Except* there is a problem that we couldn't do this i_disksize
update when the pages are written from jbd2 during ordered data writeback
(we cannot start transaction in that context). And this is nasty because
we will completely loose the i_disksize update. We could handle it by
redirtying the tail page in this case but it gets a bit ugly...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR