lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <558c7f74-3d0a-4394-b9ab-3eafab136a23@huaweicloud.com>
Date: Sat, 21 Jun 2025 15:46:40 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
 ojaswin@...ux.ibm.com, yi.zhang@...wei.com, libaokun1@...wei.com,
 yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 5/6] ext4/jbd2: reintroduce
 jbd2_journal_blocks_per_page()

On 2025/6/20 0:44, Jan Kara wrote:
> On Wed 11-06-25 19:16:24, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> This partially reverts commit d6bf294773a4 ("ext4/jbd2: convert
>> jbd2_journal_blocks_per_page() to support large folio"). This
>> jbd2_journal_blocks_per_folio() will lead to a significant
>> overestimation of journal credits. Since we still reserve credits for
>> one page and attempt to extend and restart handles during large folio
>> writebacks, so we should convert this helper back to
>> ext4_journal_blocks_per_page().
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> 
> Here I'm not decided. Does it make any particular sense to reserve credits
> for one *page* worth of blocks when pages don't have any particular meaning
> in our writeback code anymore? We could reserve credits just for one
> physical extent and that should be enough.

Indeed, reserving credits for a single page is no longer suitable in the
currently folio based context. It do seems more appropriate to allocate
credits for a single extent.

> For blocksize == pagesize (most
> common configs) this would be actually equivalent. If blocksize < pagesize,
> this could force us to do some more writeback retries and thus get somewhat
> higher writeback CPU overhead but do we really care for these configs?  It
> is well possible I've overlooked something and someone will spot a
> performance regression in practical setup with this in which case we'd have
> to come up with something more clever but I think it's worth it to start
> simple and complicate later.

This can indeed be a problem if the file system is already fragmented
enough. However, thanks to the credits extension logic in
__ext4_journal_ensure_credits(), I suppose that on most file system images,
it will not trigger excessive retry operations. Besides, although there
might be some lock cost in jbd2_journal_extend(), I suppose it won't be a
big deal.

Perhaps we could reserve more credits through a complex formula at the
outset, which would lower the cost of expanding the credits. But I don't
think this will help much in reducing the number of retries, it may only
be helpful in extreme cases (the running transaction stats to commit, we
cannot extend it).

So, I think we can implement it by reserving for an extent for the time
being. Do you agree?

Thanks,
Yi.

> 
> 
>> ---
>>  fs/ext4/ext4_jbd2.h  | 7 +++++++
>>  fs/ext4/inode.c      | 6 +++---
>>  fs/jbd2/journal.c    | 6 ++++++
>>  include/linux/jbd2.h | 1 +
>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>> index 63d17c5201b5..c0ee756cb34c 100644
>> --- a/fs/ext4/ext4_jbd2.h
>> +++ b/fs/ext4/ext4_jbd2.h
>> @@ -326,6 +326,13 @@ static inline int ext4_journal_blocks_per_folio(struct inode *inode)
>>  	return 0;
>>  }
>>  
>> +static inline int ext4_journal_blocks_per_page(struct inode *inode)
>> +{
>> +	if (EXT4_JOURNAL(inode) != NULL)
>> +		return jbd2_journal_blocks_per_page(inode);
>> +	return 0;
>> +}
>> +
>>  static inline int ext4_journal_force_commit(journal_t *journal)
>>  {
>>  	if (journal)
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 67e37dd546eb..9835145b1b27 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -2556,7 +2556,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>>   */
>>  static int ext4_da_writepages_trans_blocks(struct inode *inode)
>>  {
>> -	int bpp = ext4_journal_blocks_per_folio(inode);
>> +	int bpp = ext4_journal_blocks_per_page(inode);
>>  
>>  	return ext4_meta_trans_blocks(inode,
>>  				MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
>> @@ -2634,7 +2634,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>>  	ext4_lblk_t lblk;
>>  	struct buffer_head *head;
>>  	handle_t *handle = NULL;
>> -	int bpp = ext4_journal_blocks_per_folio(mpd->inode);
>> +	int bpp = ext4_journal_blocks_per_page(mpd->inode);
>>  
>>  	if (mpd->wbc->sync_mode == WB_SYNC_ALL || mpd->wbc->tagged_writepages)
>>  		tag = PAGECACHE_TAG_TOWRITE;
>> @@ -6255,7 +6255,7 @@ int ext4_meta_trans_blocks(struct inode *inode, int lblocks, int pextents)
>>   */
>>  int ext4_writepage_trans_blocks(struct inode *inode)
>>  {
>> -	int bpp = ext4_journal_blocks_per_folio(inode);
>> +	int bpp = ext4_journal_blocks_per_page(inode);
>>  	int ret;
>>  
>>  	ret = ext4_meta_trans_blocks(inode, bpp, bpp);
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index d480b94117cd..7fccb425907f 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -84,6 +84,7 @@ EXPORT_SYMBOL(jbd2_journal_start_commit);
>>  EXPORT_SYMBOL(jbd2_journal_force_commit_nested);
>>  EXPORT_SYMBOL(jbd2_journal_wipe);
>>  EXPORT_SYMBOL(jbd2_journal_blocks_per_folio);
>> +EXPORT_SYMBOL(jbd2_journal_blocks_per_page);
>>  EXPORT_SYMBOL(jbd2_journal_invalidate_folio);
>>  EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
>>  EXPORT_SYMBOL(jbd2_journal_force_commit);
>> @@ -2661,6 +2662,11 @@ int jbd2_journal_blocks_per_folio(struct inode *inode)
>>  		     inode->i_sb->s_blocksize_bits);
>>  }
>>  
>> +int jbd2_journal_blocks_per_page(struct inode *inode)
>> +{
>> +	return 1 << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
>> +}
>> +
>>  /*
>>   * helper functions to deal with 32 or 64bit block numbers.
>>   */
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 43b9297fe8a7..f35369c104ba 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -1724,6 +1724,7 @@ static inline int tid_geq(tid_t x, tid_t y)
>>  }
>>  
>>  extern int jbd2_journal_blocks_per_folio(struct inode *inode);
>> +extern int jbd2_journal_blocks_per_page(struct inode *inode);
>>  extern size_t journal_tag_bytes(journal_t *journal);
>>  
>>  static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
>> -- 
>> 2.46.1
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ