Date:	Mon, 2 Mar 2015 08:46:12 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Michel Lespinasse <walken@...gle.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, rusty@...tcorp.com.au,
	mathieu.desnoyers@...icios.com, oleg@...hat.com,
	paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	andi@...stfloor.org, rostedt@...dmis.org, tglx@...utronix.de,
	Andrea Arcangeli <aarcange@...hat.com>,
	David Woodhouse <David.Woodhouse@...el.com>,
	Rik van Riel <riel@...hat.com>,
	Josh Triplett <josh@...htriplett.org>
Subject: Re: [RFC][PATCH 5/9] rbtree: Make lockless searches non-fatal


* Michel Lespinasse <walken@...gle.com> wrote:

> On Sat, Feb 28, 2015 at 1:24 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> > Change the insert and erase code such that lockless searches are
> > non-fatal.
> >
> > In and of itself an rbtree cannot be correctly searched while it is
> > being modified; we can, however, provide weaker guarantees that allow
> > the rbtree to be used in conjunction with other techniques, such as
> > latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").
> >
> > For this to work we need the following guarantees from the rbtree
> > code:
> >
> >  1) a lockless reader must not see partial stores; these would allow
> >     it to observe pointers to invalid memory.
> >
> >  2) there must not be (temporary) loops in the tree structure in the
> >     modifier's program order; such a loop would cause a lookup that
> >     interrupts the modifier to get stuck indefinitely.
> >
> > For 1) we must use WRITE_ONCE() for all updates to the tree structure;
> > in particular this patch only does rb_{left,right} as those are the
> > only elements required for simple searches.
> >
> > It generates slightly worse code, probably because gcc stinks at
> > volatile. But in pointer-chasing-heavy code a few extra instructions
> > should not matter.
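
Concretely, the semantics boil down to something like the sketch below.
This is illustrative only, not the actual patch hunks; cmp() is a
hypothetical comparator supplied by the caller:

	#include <linux/rbtree.h>
	#include <linux/compiler.h>	/* READ_ONCE() / WRITE_ONCE() */

	/*
	 * Lockless search: READ_ONCE() keeps the compiler from tearing
	 * or re-loading the child pointers (guarantee 1), and because
	 * the modifier never creates a loop in its program order
	 * (guarantee 2), this loop always terminates.
	 */
	static struct rb_node *lockless_search(struct rb_root *root,
					       const void *key)
	{
		struct rb_node *node = READ_ONCE(root->rb_node);

		while (node) {
			int c = cmp(key, node); /* hypothetical comparator */

			if (c < 0)
				node = READ_ONCE(node->rb_left);
			else if (c > 0)
				node = READ_ONCE(node->rb_right);
			else
				return node;
		}
		/* May be a false negative if a rotation raced with us. */
		return NULL;
	}

	/*
	 * Modifier side, under the tree lock: every update of a child
	 * pointer must be one single store, never a partial one.
	 */
	static void set_left_child(struct rb_node *parent,
				   struct rb_node *child)
	{
		WRITE_ONCE(parent->rb_left, child);
	}

On its own this only guarantees that a racing search terminates and
dereferences valid nodes; it can still miss an entry that is being
rotated, which is why the latch from 9b0fd802e8c0 is layered on top to
detect and retry such races.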
> 
> So, I was worried that this would penalize all rbtree users, for the 
> benefit of just the one you're adding later in this series. We have 
> several rbtrees where we care about performance a lot, such as the 
> ones used in the scheduler or for indexing vmas.
> 
> That said, I checked with the compiler we are using here (a gcc 4.7 
> variant) and I didn't see any change in the generated code. So, no 
> issue here for me.

Yeah, I had that worry too. With gcc 4.9.1 on x86-64 defconfig I get 
this comparison:

lib/rbtree.o:

   text    data     bss     dec     hex filename
   3299       0       0    3299     ce3 rbtree.o.before
   3308       0       0    3308     cec rbtree.o.after

+9 bytes.

Most of the difference is minimal; it comes down to an extra register 
move that saves an address in a register and then uses it from there:

  449:  4c 8b 60 10             mov    0x10(%rax),%r12
- 44d:  49 39 fc                cmp    %rdi,%r12
- 450:  0f 84 aa 00 00 00       je     500 <__rb_insert_augmented+0x1a0>
- 456:  49 89 c6                mov    %rax,%r14
- 459:  4d 85 e4                test   %r12,%r12
- 45c:  4c 89 63 08             mov    %r12,0x8(%rbx)
- 460:  48 89 58 10             mov    %rbx,0x10(%rax)
- 464:  74 0b                   je     471 <__rb_insert_augmented+0x111>


  449:  4c 8b 60 10             mov    0x10(%rax),%r12
+ 44d:  48 89 c6                mov    %rax,%rsi
+ 450:  49 39 fc                cmp    %rdi,%r12
+ 453:  0f 84 97 00 00 00       je     4f0 <__rb_insert_augmented+0x190>
+ 459:  49 89 c6                mov    %rax,%r14
+ 45c:  4d 85 e4                test   %r12,%r12
+ 45f:  4c 89 63 08             mov    %r12,0x8(%rbx)
+ 463:  48 89 5e 10             mov    %rbx,0x10(%rsi)
+ 467:  74 0b                   je     474 <__rb_insert_augmented+0x114>

So here instead of using 0x10(%rax) again, GCC moved %rax into %rsi 
and used 0x10(%rsi). That seems to be plain compiler stupidity (that 
move to %rsi is senseless with or without volatile), and gcc might 
improve in the future.
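
For reference, WRITE_ONCE() here boils down to a volatile store. A 
simplified, single-word version (the real macro in this era dispatches 
on the access size via __write_once_size()):

	/*
	 * Simplified sketch of WRITE_ONCE(); the kernel's version
	 * handles 1/2/4/8-byte accesses via __write_once_size().
	 * The volatile access is what forbids the compiler from
	 * tearing, duplicating or reordering the store -- and this
	 * volatile handling is what the codegen grumbling above
	 * refers to.
	 */
	#define WRITE_ONCE(x, val) \
		(*(volatile typeof(x) *)&(x) = (val))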

In my (admittedly quick) comparison I saw no serious changes like 
extra memory loads or stack spills.
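
As an aside, the read side of the latch scheme from 9b0fd802e8c0 that 
these semantics are meant to serve looks roughly like the pattern the 
timekeeping fast path already uses. A sketch, with my_latch and 
do_search() being hypothetical names:

	#include <linux/seqlock.h>
	#include <linux/rbtree.h>

	/*
	 * Two copies of the tree; the sequence count selects the one
	 * that is currently stable.
	 */
	struct my_latch {
		seqcount_t	seq;
		struct rb_root	tree[2];
	};

	static struct rb_node *latch_search(struct my_latch *l,
					    const void *key)
	{
		struct rb_node *node;
		unsigned int seq;

		do {
			seq  = raw_read_seqcount(&l->seq);
			/* do_search() is the lockless search sketched
			 * earlier, run on the stable copy. */
			node = do_search(&l->tree[seq & 1], key);
		} while (read_seqcount_retry(&l->seq, seq));

		return node;
	}

The writer flips the latch with raw_write_seqcount_latch() before 
updating each copy, so a reader either sees a consistent copy or 
retries.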

Thanks,

	Ingo