linux-kernel - Re: ext4 writeback performance issue in 6.12

Message-ID: <ytvfwystemt45b32upwcwdtpl4l32ym6qtclll55kyyllayqsh@g4kakuary2qw>
Date: Thu, 9 Oct 2025 14:29:07 +0200
From: Jan Kara <jack@...e.cz>
To: Matt Fleming <matt@...dmodwrite.com>
Cc: Jan Kara <jack@...e.cz>, adilger.kernel@...ger.ca, 
	kernel-team@...udflare.com, libaokun1@...wei.com, linux-ext4@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu, willy@...radead.org
Subject: Re: ext4 writeback performance issue in 6.12

On Thu 09-10-25 11:17:48, Matt Fleming wrote:
> On Wed, Oct 08, 2025 at 06:35:29PM +0200, Jan Kara wrote:
> > On Wed 08-10-25 16:07:05, Matt Fleming wrote:
> > So this particular hang check warning will be silenced by [1]. That being
> > said if the writeback is indeed taking longer than expected (depends on
> > cgroup configuration etc.) these patches will obviously not fix it. Based
> > on what you write below, are you saying that most of the time from these
> > 225s is spent in the filesystem allocating blocks? I'd expect we'd spend
> > most of the time waiting for IO to complete...
>  
> Yeah, you're right. Most of the time is spent waiting for writeback
> to complete.

OK, so even if we reduce the somewhat pointless CPU load in the allocator
you aren't going to see a substantial increase in your writeback throughput.
Reducing the CPU load is obviously a worthy goal but I'm not sure if that's
your motivation or something else that I'm missing :).

> > So I'm somewhat confused here. How big is the allocation request? Above you
> > write that average size of order 9 bucket is < 1280 which is true and
> > makes me assume the allocation is for 1 stripe which is 1280 blocks. But
> > here you write about order 9 allocation.
>  
> Sorry, I muddled my words. The allocation request is for 1280 blocks.

OK, thanks for confirmation.
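
The mismatch sorted out above is plain arithmetic: in buddy terms an order 9 request is 2^9 = 512 blocks, whereas 1280 blocks is not a power of two. A minimal sketch of that arithmetic follows; the helper `buddy_orders` is hypothetical and only decomposes a block count into powers of two, it is not how mballoc computes its buckets.

```python
def buddy_orders(nblocks):
    """Split nblocks into power-of-two chunks, largest first.

    Returns the list of orders, where order n stands for 2**n blocks.
    Hypothetical helper for illustration only, not mballoc code.
    """
    orders = []
    order = nblocks.bit_length() - 1  # largest order that can fit
    while nblocks:
        if nblocks >= 1 << order:
            orders.append(order)
            nblocks -= 1 << order
        order -= 1
    return orders


print(1 << 9)              # size in blocks of a single order 9 chunk
print(buddy_orders(1280))  # orders needed to cover a 1280-block stripe
```

A 1280-block request therefore cannot be described as a single order; it spans more than one power-of-two chunk.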

> > Anyway, stripe aligned allocations don't always play well with
> > mb_optimize_scan logic, so you can try mounting the filesystem with
> > mb_optimize_scan=0 mount option.
> 
> Thanks, but unfortunately running with mb_optimize_scan=0 gives us much
> worse performance. It looks like it's taking a long time to write out
> even 1 page to disk. The flusher thread has been running for 20+ hours
> now non-stop and it's blocking tasks waiting on writeback.

OK, so clearly (based on the perf results you've posted) mb_optimize_scan
does significantly reduce the pointless scanning for free space (in the
past we had some pathological cases where it made things worse). There's
still some pointless scanning left, though. Then, as Ted writes, removing
the stripe mount option might be another way to reduce the scanning.
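
Before changing either option it helps to see what a mount currently uses. A minimal sketch, assuming the options appear in the options field of /proc/mounts-style text; the function name `ext4_alloc_options` and the sample line (device, mount point, stripe value) are hypothetical and not taken from this thread.

```python
def ext4_alloc_options(mounts_text, mount_point):
    """Return the stripe and mb_optimize_scan options of an ext4 mount.

    mounts_text uses the /proc/mounts layout:
        device mountpoint fstype options dump pass
    Returns None if no ext4 mount is found at mount_point; an option
    that is not listed is reported as None.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mnt, fstype, options = fields[:4]
        if mnt != mount_point or fstype != "ext4":
            continue
        found = {"stripe": None, "mb_optimize_scan": None}
        for opt in options.split(","):
            key, _, value = opt.partition("=")
            if key in found:
                found[key] = value
        return found
    return None


# Hypothetical sample line; on a real system pass the contents of
# /proc/mounts instead.
SAMPLE = "/dev/md0 /data ext4 rw,relatime,stripe=1280 0 0\n"
print(ext4_alloc_options(SAMPLE, "/data"))
```

An option reported as None only means it is not listed in that text; whether the kernel is using a default for it is not something this sketch can tell.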

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR