linux-ext4 - Re: [PATCH v3 4/7] ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87o6vvyqpe.fsf@gmail.com>
Date: Thu, 15 May 2025 00:17:41 +0530
From: Ritesh Harjani (IBM) <ritesh.list@...il.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>, John Garry <john.g.garry@...cle.com>, Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 4/7] ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS

"Darrick J. Wong" <djwong@...nel.org> writes:

> On Fri, May 09, 2025 at 02:20:34AM +0530, Ritesh Harjani (IBM) wrote:
>> There can be a case where there are contiguous extents on the adjacent
>> leaf nodes of on-disk extent trees. So when someone tries to write to
>> this contiguous range, ext4_map_blocks() call will split by returning
>> 1 extent at a time if this is not already cached in extent_status tree
>> cache (where if these extents when cached can get merged since they are
>> contiguous).
>> 
>> This is fine for a normal write however in case of atomic writes, it
>> can't afford to break the write into two. Now this is also something
>> that will only happen in the slow write case where we call
>> ext4_map_blocks() for each of these extents spread across different leaf
>> nodes. However, there is no guarantee that these extent status cache
>> cannot be reclaimed before the last call to ext4_map_blocks() in
>> ext4_map_blocks_atomic_write_slow().
>
> Can you have two physically and logically contiguous mappings within a
> single leaf node?

On disk extent tree can merge two such blocks if it is within the same
leaf node. But there can be a case where there are two logically and
physically contiguous mappings lying on two different leaf nodes. 
(since on disk extent tree does not merge extents across branches.)

In that case ext4_map_blocks() can only return only 1 mapping at a time
(unless it is cached in extent status cache).

> Or is the key idea here that the extent status tree
> will merge adjacent mappings from the same leaf block, just not between
> leaf blocks?
>

Yes, in memory extent status cache can still merge this. But there can
be a case (we can argue in this case it may practically never happen)
that, the extent status cache got pruned due to memory pressure and we
have to look over on-disk extent tree. In that case we will need to look
ahead in the adjacent leaf block to see if we have a contiguous mapping.
Otherwise the atomic write will always fail over such contiguous region
split across two leaf nodes.

>> Hence this patch adds support of EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
>> This flag checks if the requested range can be fully found in extent
>> status cache and return. If not, it looks up in on-disk extent
>> tree via ext4_map_query_blocks(). If the found extent is the last entry
>> in the leaf node, then it goes and queries the next lblk to see if there
>> is an adjacent contiguous extent in the adjacent leaf node of the
>> on-disk extent tree.
>> 
>> Even though there can be a case where there are multiple adjacent extent
>> entries spread across multiple leaf nodes. But we only read an adjacent
>> leaf block i.e. in total of 2 extent entries spread across 2 leaf nodes.
>> The reason for this is that we are mostly only going to support atomic
>> writes with upto 64KB or maybe max upto 1MB of atomic write support.
>> 
>> Co-developed-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
>> ---
>>  fs/ext4/ext4.h    | 18 ++++++++-
>>  fs/ext4/extents.c | 12 ++++++
>>  fs/ext4/inode.c   | 97 +++++++++++++++++++++++++++++++++++++++++------
>>  3 files changed, 115 insertions(+), 12 deletions(-)
>> 
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index e2b36a3c1b0f..b4bbe2837423 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -256,9 +256,19 @@ struct ext4_allocation_request {
>>  #define EXT4_MAP_UNWRITTEN	BIT(BH_Unwritten)
>>  #define EXT4_MAP_BOUNDARY	BIT(BH_Boundary)
>>  #define EXT4_MAP_DELAYED	BIT(BH_Delay)
>> +/*
>> + * This is for use in ext4_map_query_blocks() for a special case where we can
>> + * have a physically and logically contiguous blocks explit across two leaf
>
> s/explit/split/ ?

Thanks! Will fix it.

-ritesh

>
> --D
>