[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <924cfe2c-318c-493f-89a5-10849bfc7a00@huaweicloud.com>
Date: Tue, 20 May 2025 20:46:51 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, willy@...radead.org, tytso@....edu,
adilger.kernel@...ger.ca, yi.zhang@...wei.com, libaokun1@...wei.com,
yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 4/8] ext4/jbd2: convert jbd2_journal_blocks_per_page()
to support large folio
On 2025/5/20 4:16, Jan Kara wrote:
> On Mon 12-05-25 14:33:15, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> jbd2_journal_blocks_per_page() returns the number of blocks in a single
>> page. Rename it to jbd2_journal_blocks_per_folio() and make it returns
>> the number of blocks in the largest folio, preparing for the calculation
>> of journal credits blocks when allocating blocks within a large folio in
>> the writeback path.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ...
>> @@ -2657,9 +2657,10 @@ void jbd2_journal_ack_err(journal_t *journal)
>> write_unlock(&journal->j_state_lock);
>> }
>>
>> -int jbd2_journal_blocks_per_page(struct inode *inode)
>> +int jbd2_journal_blocks_per_folio(struct inode *inode)
>> {
>> - return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
>> + return 1 << (PAGE_SHIFT + mapping_max_folio_order(inode->i_mapping) -
>> + inode->i_sb->s_blocksize_bits);
>> }
>
> FWIW this will result in us reserving some 10k transaction credits for 1k
> blocksize with maximum 2M folio size. That is going to create serious
> pressure on the journalling machinery. For now I guess we are fine
Oooh, indeed, you are right, thanks a lot for pointing this out. As you
mentioned in patch 5, the credits calculation I proposed was incorrect,
I thought it wouldn't require too many credits.
I believe it is risky to mount a filesystem with a small journal space
and a large number of block groups. For example, if we build an image
with a 1K block size and a 1MB journal on a 20GB disk (which contains
2,540 groups), it will require 2,263 credits, exceeding the available
journal space.
For now, I'm going to disable large folio support on the filesystem with
limited journal space. i.e., when the return value of
ext4_writepage_trans_blocks() is greater than
jbd2_max_user_trans_buffers(journal) / 2, ext4_should_enable_large_folio()
return false, thoughts?
> but
> eventually we should rewrite how credits for writing out folio are computed
> to reduce this massive overestimation. It will be a bit tricky but we could
> always reserve credits for one / couple of extents and try to extend the
> transaction if we need more. The tricky part is to do the partial folio
> writeout in case we cannot extend the transaction...
>
Yes, this is a feasible solution; however, I prefer to promote the iomap
conversion in the long run.
Thanks,
Yi.
Powered by blists - more mailing lists