Message-ID: <f4451542-1804-4417-84a8-6e3630da9da7@rocketmail.com>
Date: Sun, 8 Feb 2026 05:11:14 +0100
From: Mario Lohajner <mario_lohajner@...ketmail.com>
To: Theodore Tso <tytso@....edu>
Cc: Baokun Li <libaokun1@...wei.com>, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
Yang Erkun <yangerkun@...wei.com>, libaokun9@...il.com
Subject: Re: [PATCH] ext4: add optional rotating block allocation policy
On 07. 02. 2026. 18:55, Theodore Tso wrote:
> On Sat, Feb 07, 2026 at 01:45:06PM +0100, Mario Lohajner wrote:
>> The pattern I keep referring to as “observable in practice” is about
>> repeated free -> reallocate cycles, allocator restart points, and reuse
>> bias - i.e., which regions of the address space are revisited most
>> frequently over time.
>
> But you haven't proved that this *matters*. You need to justify
> **why** we should care that portions of the address space are
> revisited more frequently. Why is it worth the code complexity and
> maintenance overhead?
>
> "Because" is not a sufficient answer.
>
>> The question I’m raising is much narrower: whether allocator
>> policy choices can unintentionally reinforce reuse patterns under
>> certain workloads - and whether offering an *alternative policy* is
>> reasonable (I dare to say; in some cases more optimal).
>
> Optimal WHY? You have yet to show anything other than wear leveling
> why reusing portions of the LBA space is problematic, and why avoiding
> said reuse might be worthwhile.
>
> In fact, there is an argument to be made that an SSD-specific
> allocation algorithm which aggressively tries to reuse recently
> deleted blocks would result in better performance. Why? Because it
> is an implicit discard --- overwriting the LBA tells the Flash
> Translation Layer that the previous contents of the flash associated
> with the LBA is no longer needed, without the overhead of sending an
> explicit discard request. Discards are expensive for the FTL, and so
> when they have a lot of I/O pressure, some FTL implementations will
> just ignore the discard request in favor of serving immediate I/O
> requests, even if this results in more garbage collection overhead
> later.
>
> However, we've never done this because it wasn't clear the complexity
> was worth it --- and whenever you make changes to the block allocation
> algorithm, it's important to make sure performance and file
> fragmentation work well across a large number of workloads and a wide
> variety of different flash storage devices --- both when the file
> system is freshly formatted and after the equivalent of years of
> file system aging (that is, after long-term use). For more
> information, see [1][2][3].
>
> [1] https://www.cs.williams.edu/~jannen/teaching/s21/cs333/meetings/aging.html
> [2] https://www.usenix.org/conference/hotstorage19/presentation/conway
> [3] https://dl.acm.org/doi/10.1145/258612.258689
>
> So an SSD-specific allocation policy which encourages and embraces
> reuse of LBAs (and NOT avoiding reuse) has a lot more theoretical and
> principled support. But despite that, the questions of "is this
> really worth the extra complexity", and "can we make sure that it
> works well across a wide variety of workloads and with both new and
> aged file systems" haven't been answered satisfactorily yet.
>
> The way to answer these questions would require running benchmarks and
> file system aging tools, such as those described in [3], while
> creating prototype changes. Hand-waving is enough for the creation of
> prototypes and proof-of-concept patches. But it's not enough for
> something that we would merge into the upstream kernel.
>
> Cheers,
>
> - Ted
It seems your mind is set :-(
Do allow me to briefly reiterate the relevant points.
* The original allocator remains *intact and unchanged*. The proposed
  allocator employs a *simple tweak* (as recognised by a maintainer).

* Clear allocator separation *guarantees* that nothing in the regular
  allocator is disturbed or influenced by the proposal.

* The goal of the proposed allocator/policy is to prioritize sequential
  distribution (round-robin) over strict locality (acknowledged by
  another maintainer).

* The patch is implemented as an optional mount option "-o rotalloc"
  (disabled by default).

The patch consists of:
1) a trivial group counter "cursor",
2) a trivial vectored allocator,
3) a trivial proposed allocator that simply enforces the cursor.
*** To reset the semantics clearly:
This is an *optional* alternative allocator that trades locality
for distribution.
The intent is deterministic sequential distribution (aka round-robin)
across the full LBA space, *agnostic* of the underlying device behavior.
It operates at the block/group allocation level!
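The cursor policy above can be sketched in plain C. This is a simplified
userspace model for illustration only, not the actual patch: the names
(`rot_cursor`, `rot_next_group`) are hypothetical, and the real mballoc
path would additionally take group locks, check free extents, and fall
back when a group cannot satisfy the request.

```c
#include <assert.h>

/*
 * Simplified model of a rotating ("rotalloc") group cursor.
 *
 * Each allocation request starts at the group the cursor points to,
 * and the cursor then advances round-robin. Successive allocations
 * are therefore distributed sequentially across all block groups,
 * instead of clustering around a locality-derived goal group as the
 * regular allocator's policy does.
 */

static unsigned int rot_cursor;	/* next group to try; wraps at ngroups */

/* Return the group the next allocation should start from, then advance. */
static unsigned int rot_next_group(unsigned int ngroups)
{
	unsigned int group = rot_cursor % ngroups;

	rot_cursor = (rot_cursor + 1) % ngroups;
	return group;
}
```

With, say, 4 block groups, successive calls yield groups 0, 1, 2, 3, 0,
... which is the deterministic sequential distribution the patch aims
for, agnostic of the underlying device.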
Best regards,
Mario Lohajner