linux-kernel - Re: [PATCH v2] netlink: Replace rhash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150922161056.GB9761@mtj.duckdns.org>
Date:	Tue, 22 Sep 2015 12:10:56 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	David Miller <davem@...emloft.net>, cwang@...pensource.com,
	tom@...bertland.com, kafai@...com, kernel-team@...com,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	torvalds@...ux-foundation.org, jiri@...nulli.us,
	nicolas.dichtel@...nd.com, tgraf@...g.ch, sfeldma@...il.com
Subject: Re: [PATCH v2] netlink: Replace rhash_portid with bound

Hello, Herbert.

On Tue, Sep 22, 2015 at 11:38:56AM +0800, Herbert Xu wrote:
> On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
> > store_release and load_acquire are different from the usual memory
> > barriers and can't be paired this way.  You have to pair store_release
> > and load_acquire.  Besides, it isn't a particularly good idea to
> 
> OK I've decided to drop the acquire/release helpers as they don't
> help us at all and simply pessimises the code by using full memory
> barriers (on some architectures) where only a write or read barrier
> is needed.

That's a pentium pro era errata.  Virtually no working machine is
affected by that anymore and nobody builds kernel with that option.
In most cases, store_release and load_acquire are cheaper as they're
more specific.  On x86, store_release and load_acquire boil down to
compiler reordering barriers.  You're running in the opposite
direction.

> > There's no reason to be overly smart here.  This isn't a crazy hot
> > path, write barriers tend to be very cheap, store_release more so.
> > Please just do smp_store_release() and note what it's paired with.
> 
> It's not about being overly smart.  It's about actually understanding
> what's going on with the code.  I've seen too many instances of
> people simply sprinkling synchronisation primitives around without
> any knowledge of what is happening underneath, which is just a recipe
> for creating hard-to-debug races.

I mean, read this thread.  It's one subtle breakage after another, one
confusion after another.  The problematic usages of memory barriers
are usually of two types.

1. Misunderstand what barriers do and fail to, most frequently, pair
   the barriers correctly.  This leads to things like lone smp_wmb()s
   which don't do anything but provide false sense of security.

2. The usage itself is correct but not properly documented and it's
   not clear what's being synchronized.  Because there's nothing
   inherently pairing the matching barrier pairs, w/o proper
   documentation, it can be very challenging to track down what is
   being synchronized making it difficult to tell this case from 1.
   Note that this is another reason smp_store_release() and
   smp_load_acquire() are just better.  Their specificity not only
   makes them lighter but also makes it a lot easier to track down
   what's going on.

So, when using barriers, we want to first pick the most specific
barrier pairs that suit the use case and clearly identify the matching
pairs and describe any subtlties if there's any.  Here, it's a
straight-forward writer to reader interlocking.

Once we meet the above criteria, we do want to be as simple as the use
case allows it to be.  e.g. if there are multiples readers,
encapsulate them and document them centrally.  If there are different
classes of readers and it's worthwhile to apply different
synchronization schemes to them, use separate helpers or clearly mark
and document exceptions with rationale.

> > I don't think you can skip load_acquire here just because this is the
> > second deref of the variable.  That doesn't change anything.  Race
> > condition could still happen between the first and second tests and
> > skipping the second would lead to the same kind of bug.
> 
> The reason this one is OK is because we do not use nlk->portid or
> try to get nlk from the hash table before we return to user-space.

What happens if somebody later adds code below that which happens to
use portid?  You're creating a booby trap and the patch isn't even
properly documenting what's going on.

> ---8<---
> The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
> Fix autobind race condition that leads to zero port ID") created
> some new races that can occur due to inconcsistencies between the
> two port IDs.
> 
> Tejun is right that a barrier is unavoidable.  Therefore I am
> reverting to the original patch that used a boolean to indicate
> that a user netlink socket has been bound.
> 
> Barriers have been added where necessary to ensure that a valid
> portid and the hashed socket is visible.
> 
> I have also changed netlink_insert to only return EBUSY if the
> socket is bound to a portid different to the requested one.  This
> combined with only reading nlk->bound once in netlink_bind fixes
> a race where two threads that bind the socket at the same time
> with different port IDs may both succeed.
> 
> Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
> Reported-by: Tejun Heo <tj@...nel.org>
> Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: Herbert Xu <herbert@...dor.apana.org.au>

This is a pretty misguided use of barriers.

Nacked-by: Tejun Heo <tj@...nel.org>

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/