Message-ID: <20260207175522.GB87551@macsyma.lan>
Date: Sat, 7 Feb 2026 12:55:22 -0500
From: "Theodore Tso" <tytso@....edu>
To: Mario Lohajner <mario_lohajner@...ketmail.com>
Cc: Baokun Li <libaokun1@...wei.com>, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
        Yang Erkun <yangerkun@...wei.com>, libaokun9@...il.com
Subject: Re: [PATCH] ext4: add optional rotating block allocation policy

On Sat, Feb 07, 2026 at 01:45:06PM +0100, Mario Lohajner wrote:
> The pattern I keep referring to as “observable in practice” is about
> repeated free -> reallocate cycles, allocator restart points, and reuse
> bias - i.e., which regions of the address space are revisited most
> frequently over time.

But you haven't proved that this *matters*.  You need to justify
**why** we should care that portions of the address space are
revisited more frequently.  Why is it worth the code complexity and
maintenance overhead?

"Because" is not an sufficient answer.

> The question I’m raising is much narrower: whether allocator
> policy choices can unintentionally reinforce reuse patterns under
> certain workloads - and whether offering an *alternative policy* is
> reasonable (I dare to say; in some cases more optimal).

Optimal WHY?  You have yet to show any reason, other than wear
leveling, why reusing portions of the LBA space is problematic, and
why avoiding said reuse might be worthwhile.

In fact, there is an argument to be made that an SSD-specific
allocation algorithm which aggressively tries to reuse recently
deleted blocks would result in better performance.  Why?  Because it
is an implicit discard --- overwriting the LBA tells the Flash
Translation Layer that the previous contents of the flash associated
with the LBA are no longer needed, without the overhead of sending an
explicit discard request.  Discards are expensive for the FTL, and so
when under a lot of I/O pressure, some FTL implementations will
simply ignore discard requests in favor of serving immediate I/O
requests, even if this results in more garbage collection overhead
later.
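
To make the distinction concrete, here is a rough userspace sketch of
the two paths (the device path is a placeholder, and of course a file
system issues discards from kernel context rather than through this
ioctl):

/*
 * Illustration only: explicit discard vs. the "implicit discard"
 * of an overwrite, against a raw block device.  BLKDISCARD is a
 * real Linux ioctl; /dev/sdX is a made-up placeholder.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>           /* BLKDISCARD */

int main(void)
{
        int fd = open("/dev/sdX", O_WRONLY);    /* placeholder device */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Explicit discard: a separate command the FTL may ignore
         * or deprioritize when it is busy serving real I/O. */
        uint64_t range[2] = { 0, 1048576 };     /* offset, length in bytes */
        if (ioctl(fd, BLKDISCARD, &range) < 0)
                perror("BLKDISCARD");

        /* Implicit discard: rewriting the same LBAs tells the FTL
         * that the old flash contents are dead, with no extra
         * command issued at all. */
        char buf[4096];
        memset(buf, 0, sizeof(buf));
        if (pwrite(fd, buf, sizeof(buf), 0) < 0)
                perror("pwrite");

        close(fd);
        return 0;
}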

However, we've never done this because it wasn't clear the complexity
was worth it --- and whenever you make changes to the block allocation
algorithm, it's important to make sure that performance and file
fragmentation hold up across a large number of workloads and a wide
variety of flash storage devices --- both when the file system is
freshly formatted and after the equivalent of years of file system
aging (that is, after long-term use).  For more information, see
[1][2][3].

[1] https://www.cs.williams.edu/~jannen/teaching/s21/cs333/meetings/aging.html
[2] https://www.usenix.org/conference/hotstorage19/presentation/conway
[3] https://dl.acm.org/doi/10.1145/258612.258689
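
To give a flavor of the measurement side, here is a rough sketch (my
own illustration, not one of the tools above) that counts a file's
extents via the FIEMAP ioctl, a crude fragmentation metric along the
lines of what filefrag(8) reports:

/* Count a file's extents as a rough fragmentation proxy. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>           /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct fiemap fm;
        memset(&fm, 0, sizeof(fm));
        fm.fm_start = 0;
        fm.fm_length = ~0ULL;           /* map the whole file */
        fm.fm_flags = FIEMAP_FLAG_SYNC; /* flush delalloc first */
        fm.fm_extent_count = 0;         /* 0 => just report the count */

        if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
                perror("FS_IOC_FIEMAP");
                return 1;
        }
        printf("%s: %u extents\n", argv[1], fm.fm_mapped_extents);
        close(fd);
        return 0;
}

Running something like this over the file population before and after
an aging run is the kind of evidence that would actually move the
discussion forward.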

So an SSD-specific allocation policy which encourages and embraces
reuse of LBAs (and NOT avoiding reuse) has a lot more theoretical and
principled support.  But despite that, the questions of "is this
really worth the extra complexity", and "can we make sure that it
works well across a wide variety of workloads and with both new and
aged file systems" haven't been answered satisfactorily yet.
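
If someone wanted to explore that direction, the core policy could
start as small as a LIFO cache of recently freed ranges consulted
ahead of the normal allocator.  A toy sketch follows --- purely
illustrative, and none of these names or structures come from ext4's
actual mballoc code:

/* Toy reuse-preferring allocator: a LIFO stack of recently freed
 * block ranges, consulted before the regular allocator so that the
 * overwrite acts as an implicit discard.  Illustration only. */
#include <stdbool.h>
#include <stdint.h>

#define HOT_SLOTS 64

struct freed_range {
        uint64_t start;         /* first block of the freed range */
        uint64_t len;           /* length in blocks */
};

static struct freed_range hot[HOT_SLOTS];
static int hot_top;             /* number of entries on the stack */

/* Free path: remember the range just released. */
static void note_freed(uint64_t start, uint64_t len)
{
        if (hot_top < HOT_SLOTS)
                hot[hot_top++] = (struct freed_range){ start, len };
        /* If full, silently drop: this is only a hint cache. */
}

/* Allocation path: prefer the most recently freed range that is
 * large enough; fall back to the regular allocator otherwise. */
static bool alloc_from_hot(uint64_t want, uint64_t *out_start)
{
        for (int i = hot_top - 1; i >= 0; i--) {
                if (hot[i].len >= want) {
                        *out_start = hot[i].start;
                        hot[i].start += want;
                        hot[i].len -= want;
                        if (hot[i].len == 0)
                                hot[i] = hot[--hot_top];
                        return true;
                }
        }
        return false;
}

Even a toy like this would still have to survive the aging and
benchmarking gauntlet described above before it could be taken
seriously.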

The way to answer these questions is to run benchmarks and file
system aging tools, such as those described in [3], while developing
prototype changes.  Hand-waving is enough to justify writing
prototypes and proof-of-concept patches, but it's not enough for
something that we would merge into the upstream kernel.

Cheers,

							- Ted
