Message-ID: <f4451542-1804-4417-84a8-6e3630da9da7@rocketmail.com>
Date: Sun, 8 Feb 2026 05:11:14 +0100
From: Mario Lohajner <mario_lohajner@...ketmail.com>
To: Theodore Tso <tytso@....edu>
Cc: Baokun Li <libaokun1@...wei.com>, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
Yang Erkun <yangerkun@...wei.com>, libaokun9@...il.com
Subject: Re: [PATCH] ext4: add optional rotating block allocation policy
On 07. 02. 2026. 18:55, Theodore Tso wrote:
> On Sat, Feb 07, 2026 at 01:45:06PM +0100, Mario Lohajner wrote:
>> The pattern I keep referring to as “observable in practice” is about
>> repeated free -> reallocate cycles, allocator restart points, and reuse
>> bias - i.e., which regions of the address space are revisited most
>> frequently over time.
>
> But you haven't proved that this *matters*. You need to justify
> **why** we should care that portions of the address space are
> revisited more frequently. Why is it worth the code complexity and
> maintenance overhead?
>
> "Because" is not a sufficient answer.
>
>> The question I’m raising is much narrower: whether allocator
>> policy choices can unintentionally reinforce reuse patterns under
>> certain workloads - and whether offering an *alternative policy* is
>> reasonable (I dare to say; in some cases more optimal).
>
> Optimal WHY? You have yet to show anything other than wear leveling
> why reusing portions of the LBA space is problematic, and why avoiding
> said reuse might be worthwhile.
>
> In fact, there is an argument to be made that an SSD-specific
> allocation algorithm which aggressively tries to reuse recently
> deleted blocks would result in better performance. Why? Because it
> is an implicit discard --- overwriting the LBA tells the Flash
> Translation Layer that the previous contents of the flash associated
> with the LBA is no longer needed, without the overhead of sending an
> explicit discard request. Discards are expensive for the FTL, and so
> when they have a lot of I/O pressure, some FTL implementations will
> just ignore the discard request in favor of serving immediate I/O
> requests, even if this results in more garbage collection overhead
> later.
>
> However, we've never done this because it wasn't clear the complexity
> was worth it --- and whenever you make changes to the block allocation
> algorithm, it's important to make sure performance and file
> fragmentation work well across a large number of workloads and a wide
> variety of different flash storage devices --- both when the file
> system is freshly formatted and after the equivalent of years of
> file system aging (that is, after long-term use). For more
> information, see [1][2][3].
>
> [1] https://www.cs.williams.edu/~jannen/teaching/s21/cs333/meetings/aging.html
> [2] https://www.usenix.org/conference/hotstorage19/presentation/conway
> [3] https://dl.acm.org/doi/10.1145/258612.258689
>
> So an SSD-specific allocation policy which encourages and embraces
> reuse of LBAs (and NOT avoiding reuse) has a lot more theoretical and
> principled support. But despite that, the questions of "is this
> really worth the extra complexity", and "can we make sure that it
> works well across a wide variety of workloads and with both new and
> aged file systems" haven't been answered satisfactorily yet.
>
> The way to answer these questions would require running benchmarks and
> file system aging tools, such as those described in [3], while
> creating prototype changes. Hand-waving is enough for the creation of
> prototypes and proof-of-concept patches. But it's not enough for
> something that we would merge into the upstream kernel.
>
> Cheers,
>
> - Ted
It seems your mind is set :-(
Do allow me to briefly reiterate the relevant points.
* The original allocator remains *intact and unchanged*. The proposed
  allocator employs a *simple tweak* (as recognised by a maintainer).

* Clear allocator separation *guarantees* that nothing in the regular
  allocator is disturbed or influenced by the proposal.

* The goal of the proposed allocator/policy is to prioritize sequential
  distribution (round-robin) over strict locality (acknowledged by
  another maintainer).

* The patch is implemented as an optional mount option "-o rotalloc"
  (disabled by default).

The patch consists of:
1) a trivial group counter "cursor",
2) a trivial vectored allocator,
3) a trivial proposed allocator that simply enforces the cursor.
*** To reset the semantics clearly:
This is an *optional* alternative allocator that trades locality
for distribution.
The intent is deterministic sequential distribution (aka round-robin)
across the full LBA space, *agnostic* of the underlying device behavior.
It operates at the block/group allocation level!
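The cursor policy above can be sketched in plain C. This is a simplified
userspace model for illustration only, not the actual patch: the names
(`rot_cursor`, `rot_next_group`) are hypothetical, and the real mballoc
path would additionally take group locks, check free extents, and fall
back when a group cannot satisfy the request.

```c
#include <assert.h>

/*
 * Simplified model of a rotating ("rotalloc") group cursor.
 *
 * Each allocation request starts at the group the cursor points to,
 * and the cursor then advances round-robin. Successive allocations
 * are therefore distributed sequentially across all block groups,
 * instead of clustering around a locality-derived goal group as the
 * regular allocator's policy does.
 */

static unsigned int rot_cursor;	/* next group to try; wraps at ngroups */

/* Return the group the next allocation should start from, then advance. */
static unsigned int rot_next_group(unsigned int ngroups)
{
	unsigned int group = rot_cursor % ngroups;

	rot_cursor = (rot_cursor + 1) % ngroups;
	return group;
}
```

With, say, 4 block groups, successive calls yield groups 0, 1, 2, 3, 0,
... which is the deterministic sequential distribution the patch aims
for, agnostic of the underlying device.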
Best regards,
Mario Lohajner