netdev - Re: [PATCHv5 2/2] memory barrier: adding smp_mb__after


Date:	Tue, 7 Jul 2009 11:04:06 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	fbl@...hat.com, nhorman@...hat.com, davem@...hat.com,
	htejun@...il.com, jarkao2@...il.com, davidel@...ilserver.org
Subject: Re: [PATCHv5 2/2] memory barrier: adding smp_mb__after_lock

* Oleg Nesterov (oleg@...hat.com) wrote:
> On 07/07, Mathieu Desnoyers wrote:
> >
> > As with any optimization (and this is one that adds a semantic that will
> > just grow the memory barrier/locking rule complexity), it should come
> > with performance benchmarks showing real-life improvements.
> 
> Well, the same applies to smp_mb__xxx_atomic_yyy or smp_mb__before_clear_bit.
> 
> Imho the new helper is not worse, and it could be also used by
> try_to_wake_up(), __pollwake(), insert_work() at least.

It's basically related to Amdahl's law. If the smp_mb is a small portion
of the overall read_lock cost, then it may not be worth removing.
On the contrary, if the mb is a big portion of set/clear bit, then it's
worth it. We also have to consider the frequency at which these
operations are done to figure out the overall performance impact.
Also, locks imply cache-line bouncing, which is typically costly.
clear/set bit does not imply this as much. So the tradeoffs are very
different there.

So it's not as simple as "we do this for set/clear bit, we should
therefore do this for locks".
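
To make the Amdahl argument concrete, here is a minimal sketch in C. The
function name and the fractions used in the note below are hypothetical,
chosen only for illustration; they are not measurements of read_lock or
of set/clear bit.

```c
#include <assert.h>

/*
 * Amdahl's law for the limiting case where a component is removed
 * entirely: if the barrier accounts for a fraction `f` of the total
 * cost of an operation, removing it gives an overall speedup of
 * 1 / (1 - f) for that operation.
 *
 * `f` is an assumed input, not something this code measures.
 */
static double amdahl_speedup_removed(double f)
{
	assert(f >= 0.0 && f < 1.0);
	return 1.0 / (1.0 - f);
}
```

With an assumed f = 0.05 (barrier is a small part of a cache-line
bouncing lock operation) the speedup of that operation is about 1.05x;
with an assumed f = 0.5 (barrier dominates a cheap bit operation) it is
2x. The gain for the whole workload is smaller still, scaled by how
often the operation runs.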

> 
> > Otherwise I'd recommend sticking to smp_mb() if this execution path is
> > not that critical, or to move to RCU if it's _that_ critical.
> >
> > A valid argument would be if the data structures protected are so
> > complex that RCU is out of the question but still the few cycles saved by
> > removing a memory barrier are really significant.
> 
> Not sure I understand how RCU can help,
> 

Changing a read_lock to an rcu_read_lock would save the whole atomic
cache-line bouncing operation on that fast path. But it may imply a data
structure redesign. So it is more for future development than for
current kernel releases.
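
As a rough userspace illustration of the reader-side difference, the
sketch below uses C11 atomics. It is not kernel code and not a real RCU
implementation: all names are hypothetical, writer exclusion on the
counted lock is omitted, and so is the grace period a writer must wait
for before freeing an old version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

struct config {
	int value;
};

/* Lock-style reader count (hypothetical model, writer side omitted). */
struct counted_lock {
	atomic_int readers;
};

/*
 * Lock-based read: every reader does two atomic read-modify-write
 * operations on the shared counter, so its cache line bounces between
 * the CPUs that take the lock.
 */
static int read_with_lock(struct counted_lock *l, const struct config *c)
{
	int v;

	atomic_fetch_add_explicit(&l->readers, 1, memory_order_acquire);
	v = c->value;
	atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);
	return v;
}

/*
 * RCU-style read: the reader only loads the currently published
 * pointer and writes nothing shared.
 */
static int read_rcu_style(_Atomic(struct config *) *slot)
{
	const struct config *c =
		atomic_load_explicit(slot, memory_order_acquire);

	assert(c != NULL);
	return c->value;
}

/*
 * Writer side: publish a fully initialized new version and return the
 * old one. Reclaiming the old version safely needs a grace period,
 * which this sketch does not implement.
 */
static struct config *publish(_Atomic(struct config *) *slot,
			      struct config *newc)
{
	return atomic_exchange_explicit(slot, newc, memory_order_acq_rel);
}
```

The redesign cost mentioned above shows up in publish(): updates have to
be expressed as "build a new version, then swap the pointer" instead of
modifying the structure in place under a write lock.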

> > And even then, the
> > proper solution would be more something like a
> > __read_lock()+smp_mb+smp_mb+__read_unlock(), so we get the performance
> > improvements on architectures other than x86 as well.
> 
> Hmm. could you explain what you mean?
> 

Actually, thinking about it more, to appropriately support x86, as well
as powerpc, arm and mips, we would need something like:

read_lock_smp_mb()

Which would be a read_lock with an included memory barrier.
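
A userspace model of that idea, using C11 atomics, could look like the
sketch below. Everything prefixed with model_ is hypothetical and is not
the kernel's rwlock_t or an existing kernel API; writer exclusion is
left out. The assumption encoded here is that on x86 the lock-prefixed
read-modify-write already acts as a full barrier, so only compiler
reordering needs to be stopped there, while other architectures issue an
explicit full fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reader-count model; writer side omitted for brevity. */
struct model_rwlock {
	atomic_int readers;
};

#if defined(__x86_64__) || defined(__i386__)
/* Assumed: the locked RMW is already a full barrier on x86. */
#define model_mb_after_rmw()	atomic_signal_fence(memory_order_seq_cst)
#else
/* Elsewhere, pay for an explicit full memory barrier. */
#define model_mb_after_rmw()	atomic_thread_fence(memory_order_seq_cst)
#endif

static inline void model_rwlock_init(struct model_rwlock *l)
{
	atomic_init(&l->readers, 0);
}

/* Read lock with the full barrier folded into the primitive. */
static inline void model_read_lock_smp_mb(struct model_rwlock *l)
{
	atomic_fetch_add_explicit(&l->readers, 1, memory_order_seq_cst);
	model_mb_after_rmw();
}

static inline void model_read_unlock(struct model_rwlock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->readers, 1,
					     memory_order_release);

	/* Catch an unlock without a matching lock. */
	assert(prev > 0);
	(void)prev;
}
```

Keeping the barrier inside one primitive lets each architecture pick the
cheapest sequence that gives lock plus full-barrier semantics, instead
of exposing a separate "after lock" barrier that is only free on x86.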

Mathieu

> Oleg.
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68