Message-ID: <20260206014249.GH31420@macsyma.lan>
Date: Thu, 5 Feb 2026 20:42:49 -0500
From: "Theodore Tso" <tytso@....edu>
To: Mario Lohajner <mario_lohajner@...ketmail.com>
Cc: Baokun Li <libaokun1@...wei.com>, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
Yang Erkun <yangerkun@...wei.com>, libaokun9@...il.com
Subject: Re: [PATCH] ext4: add optional rotating block allocation policy
On Thu, Feb 05, 2026 at 01:23:18PM +0100, Mario Lohajner wrote:
> Let me briefly restate the intent, focusing on the fundamentals.
>
> Rotalloc is not wear leveling (and is intentionally not named as such).
> It is an allocation policy whose goal is to reduce allocation hotspots by
> enforcing mount-wide sequential allocation. Wear leveling, if any,
> remains a device/firmware concern and is explicitly out of scope.
> While WL motivated part of this work,
Yes, but *why* are you trying to reduce allocation hotspots? What
problem are you trying to solve? And actually, you are making
allocation hotspots *worse*, since with a global cursor there is, by
definition, a single super-hotspot. This will cause scalability issues
on a system with multiple CPUs trying to write in parallel.
> the main added value of this patch is allocator separation.
> The policy indirection (aka vectored allocator) allows allocation
> strategies that are orthogonal to the regular allocator to operate
> outside the hot path, preserving existing heuristics and improving
> maintainability.
Allocator separation is not necessarily an unalloyed good thing.
Having duplicated code means that if we need to make a change in
infrastructure code, we might now need to make it in multiple code
paths. It is also one more code path that we have to test and
maintain. So there is a real cost from the upstream maintenance
perspective.
Also, since having a single global allocation point (your "cursor")
is going to absolutely *trash* performance, especially for high-speed
NVMe devices on systems with high CPU counts, it's not clear to me why
performance is a relevant consideration for rotalloc.
> The rotating allocator itself is a working prototype.
> It was written with minimal diff and clarity in mind to make the policy
> reviewable. Refinements and simplifications are expected and welcome.
OK, so this sounds like it's not ready for prime time....
> Regarding discard/trim: while discard prepares blocks for reuse and
> signals that a block is free, it does not implement wear leveling by
> itself. Rotalloc operates at a higher layer; by promoting sequentiality,
> it reduces block/group allocation hotspots regardless of underlying
> device behavior.
> Since it is not in line with the current allocator goals, it is
> implemented as an optional policy.
Again, what is the high-level goal of rotalloc? What specific
hardware and workload are you trying to optimize for? If you want to
impose a maintenance overhead on upstream, you need to justify why the
maintenance overhead is worth it. And that means you need to be a
bit more explicit about what specific real-world problem you are
trying to solve....
- Ted