linux-kernel - Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers

Message-ID: <ZvY2zBiluLkqRSkc@boqun-archlinux>
Date: Thu, 26 Sep 2024 21:38:36 -0700
From: Boqun Feng <boqun.feng@...il.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
	linux-mm@...ck.org, lkmm@...ts.linux.dev,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Joel Fernandes <joel@...lfernandes.org>,
	Josh Triplett <josh@...htriplett.org>,
	Uladzislau Rezki <urezki@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
	Waiman Long <longman@...hat.com>,
	Mark Rutland <mark.rutland@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Kent Overstreet <kent.overstreet@...il.com>,
	Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com,
	Neeraj Upadhyay <neeraj.upadhyay@....com>
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard
 pointers

On Fri, Sep 27, 2024 at 03:20:40AM +0200, Mathieu Desnoyers wrote:
> On 2024-09-26 18:12, Linus Torvalds wrote:
> > On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> > <jonas.oberhauser@...weicloud.com> wrote:
> > > 
> > > No, the issue introduced by the compiler optimization (or by your
> > > original patch) is that the CPU can speculatively load from the first
> > > pointer as soon as it has completed the load of that pointer:
> > 
> > You mean the compiler can do it. The inline asm has no impact on what
> > the CPU does. The conditional isn't a barrier for the actual hardware.
> > But once the compiler doesn't try to do it, the data dependency on the
> > address does end up being an ordering constraint on the hardware too
> > (I'm happy to say that I haven't heard from the crazies that want
> > value prediction in a long time).
> > 
> > Just use a barrier.  Or make sure to use the proper ordered memory
> > accesses when possible. Don't use an inline asm for the compare - we
> > don't even have anything insane like that as a portable helper, and we
> > shouldn't have it.
> 
> How does the compiler barrier help in any way here ?
> 
> I am concerned about the compiler SSA GVN (Global Value Numbering)
> optimizations, and I don't think a compiler barrier solves anything.
> (or I'm missing something obvious)

I think you're right that a compiler barrier doesn't help here:

	head = READ_ONCE(p);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(p);
	if (ptr != head) {
		...
	} else {
		barrier();
		return ptr;
	}

Compilers can replace 'ptr' with 'head' because of the equality, and
even putting barrier() here cannot prevent the compiler from rewriting
the else branch into:

	else {
		barrier();
		return head;
	}

because that just picks a different register for the return value and
has nothing to do with memory accesses, which are all that barrier()
constrains.
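
To make the point concrete, here is a minimal userspace sketch of both
forms. Everything in it is an assumption for illustration: the macros
are simplified GCC/Clang stand-ins for the kernel ones, the function
names and struct node are made up, 'p' is passed by address instead of
being a global, and the failure path (elided as "..." above) is filled
in with a hypothetical "clear the slot and return NULL". It only shows
that the two forms are indistinguishable as far as the C abstract
machine is concerned, which is why the substitution is legal for the
compiler; it does not show what any particular compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros (assumptions). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define barrier()        __asm__ __volatile__("" ::: "memory")
#define smp_mb()         __sync_synchronize()

struct node { int value; };	/* hypothetical payload type */

/* The pattern as written: return the result of the second load. */
static struct node *protect_return_ptr(struct node **p, struct node **slot)
{
	struct node *head, *ptr;

	assert(p && slot);
	head = READ_ONCE(*p);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(*p);
	if (ptr != head) {
		WRITE_ONCE(*slot, NULL);	/* hypothetical failure path */
		return NULL;
	}
	barrier();	/* orders memory accesses only */
	return ptr;	/* known to be equal to head on this path */
}

/* What the compiler is allowed to turn it into: return the first load. */
static struct node *protect_return_head(struct node **p, struct node **slot)
{
	struct node *head, *ptr;

	assert(p && slot);
	head = READ_ONCE(*p);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(*p);
	if (ptr != head) {
		WRITE_ONCE(*slot, NULL);	/* hypothetical failure path */
		return NULL;
	}
	barrier();	/* same barrier, same memory accesses */
	return head;	/* different register, same value */
}
```

Both functions perform the same two loads and the same store in the
same order, so barrier() has nothing to object to; the only difference
is which register the returned pointer comes from, and therefore which
load a later dereference depends on at the hardware level.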

Jonas, am I missing something subtle? Or is this different from what
you proposed?

Regards,
Boqun

> 
> I was concerned about the suggestion from Jonas to use "node2"
> rather than "node" after the equality check as a way to ensure
> the intended register is used to return the pointer, because after
> the SSA GVN optimization pass, AFAIU this won't help in any way.
> I have a set of examples below that show gcc using the result of the
> first load, and clang using the result of the second load (on
> both x86-64 and aarch64). Likewise when a load-acquire is used as
> the second load, which I find odd. Hopefully mixing this optimization
> from gcc with speculation still abides by the memory model.
> 
> Only the asm goto approach ensures that gcc uses the result from
> the second load.
> 
[...]