Message-ID: <65eb0ff3-07e6-422b-9c31-c5509fc0b2e7@oracle.com>
Date: Tue, 25 Feb 2025 15:36:22 -0500
From: Sidhartha Kumar <sidhartha.kumar@...cle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, linux-kernel@...r.kernel.org,
        maple-tree@...ts.infradead.org, linux-mm@...ck.org,
        akpm@...ux-foundation.org, richard.weiyang@...il.com
Subject: Re: [PATCH v2 5/6] maple_tree: add sufficient height

On 2/25/25 11:02 AM, Liam R. Howlett wrote:
> * Sidhartha Kumar <sidhartha.kumar@...cle.com> [250221 11:36]:
>> If a parent node is vacant but holds mt_min_slots + 1 entries,
>> rebalancing with a leaf node could cause this parent node to become
>> insufficient. This will lead to another level of rebalancing in the tree
>> and requires more node allocations. Therefore, we also have to track the
>> level at which there is a node with > mt_min_slots entries. We can use
>> this as the worst case for the spanning and rebalancing stores.
> 
> This may not explain the situation fully; We also have to track the last
> level at which there is a node that will not become insufficient.  We
> know that during rebalance, the number of entries in a non-leaf node may
> decrease by one.  Tracking the last node that will remain sufficient and
> stop the cascading operation can be used to reduce the number of nodes
> preallocated for the operation.
> 
> Note that this can happen at any level of an operation and not just a
> node containing leaves.
> 
> The spanning store operation can also be treated the same because the
> walk down the tree stops when it is detected.  That means the location
> of the walk that detects the spanning store may be reduced to be
> insufficient and will be rebalanced or may be split and need to absorb
> up to two entries.
> 
> I think this commit needs some more text explaining these changes.
> 

Does this commit message work better?


Using vacant height to reduce the worst case maple node allocation count
can lead to a shortage of nodes in the following scenarios.

For rebalancing writes, when a leaf node becomes insufficient, we push
its remaining entries into a sibling node. This means that the parent
node, which holds an entry for each of these children, will lose one
entry. If this parent node held only the minimum number of entries
needed to be sufficient, losing one entry will make it insufficient.
This leads to a cascading operation of rebalancing at different levels
and can lead to more node allocations than simply using vacant height
can return.
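
To illustrate the cascade described above, here is a minimal sketch in
C. It is a simplified model, not code from lib/maple_tree.c: the
function name cascade_levels(), the entries[] array and the min_entries
parameter are all hypothetical, and a node is assumed to be sufficient
whenever it holds at least min_entries entries.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not kernel code): entries[0] is the
 * entry count of the parent of the rebalanced leaf, entries[n - 1] is
 * the ancestor closest to the root. Returns how many ancestor levels
 * are rewritten when each touched node loses exactly one entry.
 */
static size_t cascade_levels(const unsigned char *entries, size_t n,
			     unsigned char min_entries)
{
	size_t levels = 0;
	size_t i;

	assert(entries != NULL || n == 0);

	for (i = 0; i < n; i++) {
		levels++;	/* this ancestor loses one entry */
		if (entries[i] - 1 >= min_entries)
			break;	/* still sufficient: the cascade stops here */
	}

	return levels;
}
```

In this model, ancestors holding exactly min_entries entries keep
passing the rebalance upwards, and the first ancestor with at least one
spare entry ends it. That stopping level is what the sufficient height
is meant to record.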

For spanning writes, a similar situation occurs. At the location at
which a spanning write is detected, the number of ancestor nodes may
similarly need to be rebalanced into a smaller number of nodes and the
same cascading situation could occur.

To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient. This means even if a rebalance occurs to a
child of this node, it currently has enough entries that losing one
entry will not cause this node to be insufficient. This field is stored
in the maple write state as sufficient height. In mas_prealloc_calc(),
when figuring out how many nodes to allocate, we check if the vacant
node is lower in the tree than a sufficient node (has a larger value).
If it is, we cannot use the vacant height and must use the difference
between the height and the sufficient height as the basis for the number
of nodes needed.
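
The check described above can be sketched as a standalone C function
that mirrors the wr_rebalance and wr_spanning_store cases in the quoted
diff. The struct wr_heights type, the prealloc_estimate() name and the
nodes_per_level parameter are hypothetical stand-ins for illustration.
The sketch also assumes that delta is computed as height minus
vacant_height, which is not shown in this patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the two ma_wr_state fields used here. */
struct wr_heights {
	unsigned char vacant_height;	 /* depth of lowest node with free space */
	unsigned char sufficient_height; /* depth of lowest node that can lose an entry */
};

/*
 * Estimate the worst case node count for a rebalance
 * (nodes_per_level = 2) or a spanning store (nodes_per_level = 3),
 * following the shape of the calculation in the diff.
 */
static int prealloc_estimate(const struct wr_heights *h, int height,
			     int nodes_per_level)
{
	/* Assumption: delta is the distance from the vacant node down. */
	int delta = height - h->vacant_height;

	assert(height >= h->vacant_height);
	assert(height >= h->sufficient_height);

	/* Vacant node sits below the sufficient node: the cascade may pass it. */
	if (h->sufficient_height < h->vacant_height)
		return (height - h->sufficient_height) * nodes_per_level + 1;

	return delta * nodes_per_level + 1;
}
```

For example, with height 4, vacant_height 3 and sufficient_height 2, a
rebalance is sized from the sufficient node, (4 - 2) * 2 + 1 = 5 nodes,
instead of the 3 nodes that vacant height alone would give.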




> 
>>
>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@...cle.com>
>> ---
>>   include/linux/maple_tree.h       |  4 +++-
>>   lib/maple_tree.c                 | 17 +++++++++++++++--
>>   tools/testing/radix-tree/maple.c | 28 ++++++++++++++++++++++++++++
>>   3 files changed, 46 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index 7d777aa2d9ed..37dc9525dff6 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -464,6 +464,7 @@ struct ma_wr_state {
>>   	void *entry;			/* The entry to write */
>>   	void *content;			/* The existing entry that is being overwritten */
>>   	unsigned char vacant_height;	/* Depth of lowest node with free space */
>> +	unsigned char sufficient_height;/* Depth of lowest node with min sufficiency + 1 nodes */
>>   };
>>   
>>   #define mas_lock(mas)           spin_lock(&((mas)->tree->ma_lock))
>> @@ -499,7 +500,8 @@ struct ma_wr_state {
>>   		.mas = ma_state,					\
>>   		.content = NULL,					\
>>   		.entry = wr_entry,					\
>> -		.vacant_height = 0					\
>> +		.vacant_height = 0,					\
>> +		.sufficient_height = 0					\
>>   	}
>>   
>>   #define MA_TOPIARY(name, tree)						\
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 4de257003251..8fdd3f477198 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -3558,6 +3558,13 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>>   		if (mas->end < mt_slots[wr_mas->type] - 1)
>>   			wr_mas->vacant_height = mas->depth + 1;
>>   
>> +		if (ma_is_root(mas_mn(mas))) {
>> +			/* root needs more than 2 entries to be sufficient + 1 */
>> +			if (mas->end > 2)
>> +				wr_mas->sufficient_height = 1;
>> +		} else if (mas->end > mt_min_slots[wr_mas->type] + 1)
>> +			wr_mas->sufficient_height = mas->depth + 1;
>> +
>>   		mas_wr_walk_traverse(wr_mas);
>>   	}
>>   
>> @@ -4193,13 +4200,19 @@ static inline int mas_prealloc_calc(struct ma_wr_state *wr_mas, void *entry)
>>   			ret = 0;
>>   		break;
>>   	case wr_spanning_store:
>> -		ret = height * 3 + 1;
>> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
>> +			ret = (height - wr_mas->sufficient_height) * 3 + 1;
>> +		else
>> +			ret = delta * 3 + 1;
> 
> Ah, ret was short lived.  Okay.
> 
> I still think this stuff needs some more context in the commit message.
> 
>>   		break;
>>   	case wr_split_store:
>>   		ret = delta * 2 + 1;
>>   		break;
>>   	case wr_rebalance:
>> -		ret = height * 2 + 1;
>> +		if (wr_mas->sufficient_height < wr_mas->vacant_height)
>> +			ret = (height - wr_mas->sufficient_height) * 2 + 1;
>> +		else
>> +			ret = delta * 2 + 1;
>>   		break;
>>   	case wr_node_store:
>>   		ret = mt_in_rcu(mas->tree) ? 1 : 0;
>> diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
>> index d22c1008dffe..d40f70671cb8 100644
>> --- a/tools/testing/radix-tree/maple.c
>> +++ b/tools/testing/radix-tree/maple.c
>> @@ -36334,6 +36334,30 @@ static noinline void __init check_mtree_dup(struct maple_tree *mt)
>>   
>>   extern void test_kmem_cache_bulk(void);
>>   
>> +/*
>> + * Test to check the path of a spanning rebalance which results in
>> + * a collapse where the rebalancing of the child node leads to
>> + * insufficiency in the parent node.
>> + */
>> +static void check_collapsing_rebalance(struct maple_tree *mt)
>> +{
>> +	int i = 0;
>> +	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
>> +
>> +	/* create a height 4 tree */
>> +	while (mt_height(mt) < 4) {
>> +		mtree_store_range(mt, i, i + 10, xa_mk_value(i), GFP_KERNEL);
>> +		i += 9;
>> +	}
>> +
>> +	/* delete all entries one at a time, starting from the right */
>> +	do {
>> +		mas_erase(&mas);
>> +	} while (mas_prev(&mas, 0) != NULL);
>> +
>> +	mtree_unlock(mt);
>> +}
>> +
>>   /* callback function used for check_nomem_writer_race() */
>>   static void writer2(void *maple_tree)
>>   {
>> @@ -36500,6 +36524,10 @@ void farmer_tests(void)
>>   	check_spanning_write(&tree);
>>   	mtree_destroy(&tree);
>>   
>> +	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
>> +	check_collapsing_rebalance(&tree);
>> +	mtree_destroy(&tree);
>> +
>>   	mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
>>   	check_null_expand(&tree);
>>   	mtree_destroy(&tree);
>> -- 
>> 2.43.0
>>

