lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b93c383-3442-4202-a6c7-068e0ddafe9d@suse.cz>
Date: Mon, 7 Oct 2024 14:14:22 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>
Cc: Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Sidhartha Kumar <sidhartha.kumar@...cle.com>,
 Bert Karwatzki <spasswolf@....de>,
 Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
 maple-tree@...ts.infradead.org
Subject: Re: [PATCH v2 hotfix 6.12 1/2] maple_tree: correct tree corruption on
 spanning store



On 10/6/24 4:31 PM, Lorenzo Stoakes wrote:
> There has been a subtle bug present in the maple tree implementation from
> its inception.
> 
> This arises from how stores are performed - when a store occurs, it will
> overwrite overlapping ranges and adjust the tree as necessary to
> accommodate this.
> 
> A range may always ultimately span two leaf nodes. In this instance we walk
> the two leaf nodes, determine which elements are not overwritten to the
> left and to the right of the start and end of the ranges respectively and
> then rebalance the tree to contain these entries and the newly inserted
> one.
> 
> This kind of store is dubbed a 'spanning store' and is implemented by
> mas_wr_spanning_store().
> 
> In order to reach this stage, mas_store_gfp() invokes mas_wr_preallocate(),
> mas_wr_store_type() and mas_wr_walk() in turn to walk the tree and update
> the object (mas) to traverse to the location where the write should be
> performed, determining its store type.
> 
> When a spanning store is required, this function returns false stopping at
> the parent node which contains the target range, and mas_wr_store_type()
> marks the mas->store_type as wr_spanning_store to denote this fact.
> 
> When we go to perform the store in mas_wr_spanning_store(), we first
> determine the elements AFTER the END of the range we wish to store (that
> is, to the right of the entry to be inserted) - we do this by walking to
> the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
> have just determined contains the range over which we intend to write.
> 
> We then turn our attention to the entries to the left of the entry we are
> inserting, whose state is represented by l_mas, and copy these into a 'big
> node', which is a special node which contains enough slots to contain two
> leaf node's worth of data.
> 
> We then copy the entry we wish to store immediately after this - the copy
> and the insertion of the new entry is performed by mas_store_b_node().
> 
> After this we copy the elements to the right of the end of the range which
> we are inserting, if we have not exceeded the length of the node
> (i.e. r_mas.offset <= r_mas.end).
> 
> Herein lies the bug - under very specific circumstances, this logic can
> break and corrupt the maple tree.
> 
> Consider the following tree:
> 
> Height
>   0                             Root Node
>                                  /      \
>                  pivot = 0xffff /        \ pivot = ULONG_MAX
>                                /          \
>   1                       A [-----]       ...
>                              /   \
>              pivot = 0x4fff /     \ pivot = 0xffff
>                            /       \
>   2 (LEAVES)          B [-----]  [-----] C
>                                       ^--- Last pivot 0xffff.
> 
> Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
> that all ranges expressed in maple tree code are inclusive):
> 
> 1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
>    determines that this is a spanning store across nodes B and C. The mas
>    state is set such that the current node from which we traverse further
>    is node A.
> 
> 2. In mas_wr_spanning_store() we try to find elements to the right of pivot
>    0xffff by searching for an index of 0x10000:
> 
>     - mas_wr_walk_index() invokes mas_wr_walk_descend() and
>       mas_wr_node_walk() in turn.
> 
>         - mas_wr_node_walk() loops over entries in node A until EITHER it
>           finds an entry whose pivot equals or exceeds 0x10000 OR it
>           reaches the final entry.
> 
>         - Since no entry has a pivot equal to or exceeding 0x10000, pivot
>           0xffff is selected, leading to node C.
> 
>     - mas_wr_walk_traverse() resets the mas state to traverse node C. We
>       loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
>       in turn once again.
> 
>          - Again, we reach the last entry in node C, which has a pivot of
>            0xffff.
> 
> 3. We then copy the elements to the left of 0x4000 in node B to the big
>    node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
>    too.
> 
> 4. We determine whether we have any entries to copy from the right of the
>    end of the range via - and with r_mas set up at the entry at pivot
>    0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
>    pivot 0xffff.
> 
> 5. BUG! The maple tree is corrupted with a duplicate entry.
> 
> This requires a very specific set of circumstances - we must be spanning
> the last element in a leaf node, which is the last element in the parent
> node.
> 
> spanning store across two leaf nodes with a range that ends at that shared
> pivot.

This looks like missing something to form a complete sentence?

> A potential solution to this problem would simply be to reset the walk each
> time we traverse r_mas, however given the rarity of this situation it seems
> that would be rather inefficient.
> 
> Instead, this patch detects if the right hand node is populated, i.e. has
> anything we need to copy. We can do this easily in mas_wr_walk_index() by
> detecting if the pivot is either 0 (shorthand for the end of the range) or
> the required index is less than or equal to the last encountered pivot.
> 
> This change is made in mas_wr_walk_index() which is only used by the
> spanning store so it has minimal impact.
> 
> The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree
> in mmap_region()") seems to have made the probability of this event much
> more likely, which is the point at which reports started to be submitted
> concerning this bug.
> 
> The motivation for this change arose from Bert Karwatzki's report of
> encountering mm instability after the release of kernel v6.12-rc1 which,
> after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
> options, was identified as maple tree corruption.
> 
> After Bert very generously provided his time and ability to reproduce this
> event consistently, I was able to finally identify that the issue discussed
> in this commit message was occurring for him.
> 
> Reported-and-tested-by: Bert Karwatzki <spasswolf@....de>
> Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
> Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>

Acked-by: Vlastimil Babka <vbabka@...e.cz>

> ---
>  lib/maple_tree.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 20990ecba2dd..f72e1a5a4dfa 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -2196,6 +2196,8 @@ static inline void mas_node_or_none(struct ma_state *mas,
> 
>  /*
>   * mas_wr_node_walk() - Find the correct offset for the index in the @mas.
> + *                      If @mas->index cannot be found within the containing
> + *                      node, we traverse to the last entry in the node.
>   * @wr_mas: The maple write state
>   *
>   * Uses mas_slot_locked() and does not need to worry about dead nodes.
> @@ -3532,6 +3534,12 @@ static bool mas_wr_walk(struct ma_wr_state *wr_mas)
>  	return true;
>  }
> 
> +/*
> + * Traverse the maple tree until the offset of mas->index is reached.
> + *
> + * Return: Is this node actually populated with entries possessing pivots equal
> + *         to or greater than mas->index?
> + */
>  static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
>  {
>  	struct ma_state *mas = wr_mas->mas;
> @@ -3540,8 +3548,11 @@ static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
>  		mas_wr_walk_descend(wr_mas);
>  		wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
>  						  mas->offset);
> -		if (ma_is_leaf(wr_mas->type))
> -			return true;
> +		if (ma_is_leaf(wr_mas->type)) {
> +			unsigned long pivot = wr_mas->pivots[mas->offset];
> +
> +			return pivot == 0 || mas->index <= pivot;
> +		}
>  		mas_wr_walk_traverse(wr_mas);
> 
>  	}
> @@ -3701,6 +3712,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  	struct maple_big_node b_node;
>  	struct ma_state *mas;
>  	unsigned char height;
> +	bool r_populated;
> 
>  	/* Left and Right side of spanning store */
>  	MA_STATE(l_mas, NULL, 0, 0);
> @@ -3742,7 +3754,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  		r_mas.last++;
> 
>  	r_mas.index = r_mas.last;
> -	mas_wr_walk_index(&r_wr_mas);
> +	r_populated = mas_wr_walk_index(&r_wr_mas);
>  	r_mas.last = r_mas.index = mas->last;
> 
>  	/* Set up left side. */
> @@ -3766,7 +3778,7 @@ static noinline void mas_wr_spanning_store(struct ma_wr_state *wr_mas)
>  	/* Copy l_mas and store the value in b_node. */
>  	mas_store_b_node(&l_wr_mas, &b_node, l_mas.end);
>  	/* Copy r_mas into b_node. */
> -	if (r_mas.offset <= r_mas.end)
> +	if (r_populated && r_mas.offset <= r_mas.end)
>  		mas_mab_cp(&r_mas, r_mas.offset, r_mas.end,
>  			   &b_node, b_node.b_end + 1);
>  	else
> --
> 2.46.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ