lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 28 Sep 2023 15:59:26 -0400
From:   "Liam R. Howlett" <Liam.Howlett@...cle.com>
To:     Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
Cc:     maple-tree@...ts.infradead.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: BUG: maple_tree: KCSAN: data-race in mas_topiary_replace /
 mtree_range_walk

* Mirsad Todorovac <mirsad.todorovac@....unizg.hr> [230923 03:27]:
> On 9/22/23 15:51, Liam R. Howlett wrote:

...

> > > [ 6413.367326] ==================================================================
> > > [ 6413.367349] BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
> > > 
> > > [ 6413.367375] write to 0xffff8883a0c5db00 of 8 bytes by task 5475 on cpu 24:
> > > [ 6413.367386] mas_topiary_replace (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:491 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:1701 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2590)
> > > [ 6413.367399] mas_spanning_rebalance.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2664 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2961)
> > > [ 6413.367413] mas_wr_spanning_store.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:3894)
> > > [ 6413.367428] mas_wr_store_entry.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4242)
> > > [ 6413.367442] mas_store_prealloc (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:256 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:5408)
> > > [ 6413.367455] vma_merge (/home/marvin/linux/kernel/torvalds2/mm/internal.h:1127 /home/marvin/linux/kernel/torvalds2/mm/mmap.c:1005)
> > > [ 6413.367466] mprotect_fixup (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:632)
> > > [ 6413.367480] do_mprotect_pkey (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:819)
> > > [ 6413.367494] __x64_sys_mprotect (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:837)
> > > [ 6413.367503] do_syscall_64 (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:80)
> > > [ 6413.367517] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:120)
> > > 
> > > [ 6413.367534] read to 0xffff8883a0c5db00 of 8 bytes by task 5558 on cpu 11:
> > > [ 6413.367545] mtree_range_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:539 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2789)
> > > [ 6413.367556] mas_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:251 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4844)
> > > [ 6413.367567] lock_vma_under_rcu (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5436)
> > > [ 6413.367579] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1357)
> > > [ 6413.367593] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
> > > [ 6413.367602] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)
> > > 
> > > [ 6413.367617] value changed: 0xffff888341d43116 -> 0xffff888340f92116
> > > 
> > > [ 6413.367632] Reported by Kernel Concurrency Sanitizer on:
> > > [ 6413.367640] CPU: 11 PID: 5558 Comm: ThreadPoolForeg Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
> > > [ 6413.367653] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> > > [ 6413.367660] ==================================================================
> > > 
> > > For your convenience, took the trouble of finding those suspicious lines of code:
> > > 
> > > The write side:
> > > 
> > > lib/maple_tree.c:491
> > > --------------------
> > >    456 /*
> > >    457  * mas_set_parent() - Set the parent node and encode the slot
> > >    458  * @enode: The encoded maple node.
> > >    459  * @parent: The encoded maple node that is the parent of @enode.
> > >    460  * @slot: The slot that @enode resides in @parent.
> > >    461  *
> > >    462  * Slot number is encoded in the enode->parent bit 3-6 or 2-6, depending on the
> > >    463  * parent type.
> > >    464  */
> > >    465 static inline
> > >    466 void mas_set_parent(struct ma_state *mas, struct maple_enode *enode,
> > >    467                     const struct maple_enode *parent, unsigned char slot)
> > >    468 {
> > >    469         unsigned long val = (unsigned long)parent;
> > >    470         unsigned long shift;
> > >    471         unsigned long type;
> > >    472         enum maple_type p_type = mte_node_type(parent);
> > >    473
> > >    474         MAS_BUG_ON(mas, p_type == maple_dense);
> > >    475         MAS_BUG_ON(mas, p_type == maple_leaf_64);
> > >    476
> > >    477         switch (p_type) {
> > >    478         case maple_range_64:
> > >    479         case maple_arange_64:
> > >    480                 shift = MAPLE_PARENT_SLOT_SHIFT;
> > >    481                 type = MAPLE_PARENT_RANGE64;
> > >    482                 break;
> > >    483         default:
> > >    484         case maple_dense:
> > >    485         case maple_leaf_64:
> > >    486                 shift = type = 0;
> > >    487                 break;
> > >    488         }
> > >    489
> > >    490         val &= ~MAPLE_NODE_MASK; /* Clear all node metadata in parent */
> > > → 491         val |= (slot << shift) | type;
> > >    492         mte_to_node(enode)->parent = ma_parent_ptr(val);
> > >    493 }

This should probably be 492, not 491.  I know what is racing here and it
shouldn't cause issue.

...
> > > The read side:
> > > 
> > >     527 /*
> > >     528  * ma_dead_node() - check if the @enode is dead.
> > >     529  * @enode: The encoded maple node
> > >     530  *
> > >     531  * Return: true if dead, false otherwise.
> > >     532  */
> > >     533 static inline bool ma_dead_node(const struct maple_node *node)
> > >     534 {
> > >     535         struct maple_node *parent;
> > >     536
> > >     537         /* Do not reorder reads from the node prior to the parent check */
> > >     538         smp_rmb();
> > > →  539         parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
> > >     540         return (parent == node);
> > >     541 }

This is the correct line.

...
> > > 
> > > as above, but the smb_rmb() protection is here, so the KCSAN warning should be double-checked for
> > > validity.
> > > 
> > > But I do not really understand maple trees to their depth, I am only reporting the symptomatic
> > > outlook of the reported data-race.
> > 
> > This is the most complex part of the maple tree, and unique to the
> > ability to store a range over multiple existing entries.  I recently
> > rewrote this particular section to avoid a potential live-lock issue.
> 
> I see.
> 
> > I am confident that the parent pointer will not be set to the node
> > pointer as checked by ma_dead_node().  But in an abundance of caution
> > and to resolve this potential false-positive, I am looking at this in
> > more detail.
> > 
> > This race is to see if the walk down the tree into unchanged data will
> > be seen before it is placed in the new subtree with its data also
> > unchanged.  Since we know the parent can never be the node, I feel this
> > is safe - but I'm not willing to live with the complaints from kasan.
> 
> I cannot analyse a couple of thousand lines at such a short notice.

Don't worry, I will :)

> 
> It looks suspicious because val in line 491 in a local variable and thus
> isn't available elsewhere.

It is used in the node->parent, as described above.  It is a race, but
it doesn't matter who wins.

> 
> Maybe the compiler (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0) did something
> to the code?

Probably off-by-one line.

> 
> > > This is all-in-all a very interested subject, and I wish there was a way to just slurp all those
> > > interesting kernel intrinsics into the brain, but it just ain't that easy. Forgive me for being
> > > open ...
> > 
> > I appreciate that and your detailed analysis with code pointers of where
> > this happens.  Is this easy to recreate?  If so, how?  Can you attach
> > your kernel config?
> 
> Got that attached first, so I do not forget. :-/
> 
> Recreate? Actually it happened quite a number of times on my configuration (480+?).

I'm having issues recreating it because I hit a lot of races before this
one in my test setup.  I will keep working on reproducing this race, but
in the mean time can you test the attached patch with your setup?

...

Thanks,
Liam

View attachment "race_fix.patch" of type "text/x-diff" (382 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ