Date:   Fri, 22 Dec 2017 13:30:27 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, eric.dumazet@...il.com
Subject: Re: [PATCH 1/3] net: Fix possible race in peernet2id_alloc()

Hi, Eric,

On 21.12.2017 20:39, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
> 
>> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
>> under net->nsid_lock does not guarantee that peer is alive:
>>
>> rcu_read_lock()
>> peernet2id_alloc()                            ..
>>   spin_lock_bh(&net->nsid_lock)               ..
>>   atomic_read(&peer->count) == 1              ..
>>   ..                                          put_net()
>>   ..                                            cleanup_net()
>>   ..                                              for_each_net(tmp)
>>   ..                                                spin_lock_bh(&tmp->nsid_lock)
>>   ..                                                __peernet2id(tmp, net) == -1
>>   ..                                                    ..
>>   ..                                                    ..
>>     __peernet2id_alloc(alloc == true)                   ..
>>   ..                                                    ..
>> rcu_read_unlock()                                       ..
>> ..                                                synchronize_rcu()
>> ..                                                kmem_cache_free(net)
>>
>> After the above situation, net::netns_id contains id pointing to freed memory,
>> and any other dereferencing by the id will operate with this freed memory.
>>
>> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
>> ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
>> is a generic interface, and it is better to fix it before someone really
>> starts using it in the wrong context.
> 
> So it comes down to this piece of code from ovs and just let me say ick.
> 	if (!net_eq(net, dev_net(vport->dev))) {
> 		int id = peernet2id_alloc(net, dev_net(vport->dev));
> 
> 		if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
> 			goto nla_put_failure;
> 	}
> 
> Without the rtnl lock dev_net can change between the test and the
> call of peernet2id_alloc.

Places like this are usually protected via a netdevice notifier. There is
ovs_dp_device_notifier in openvswitch, and it seems to be the handler for
the situation above. But I'm not sure it's safe, and it's not clear to
me why it does not take ovs_mutex to synchronize with ovs_vport_cmd_fill_info(),
or whether there is some other synchronization.
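
For reference, the generic shape of such a handler (a minimal sketch of the
netdevice notifier pattern, not the actual ovs_dp_device_notifier code; the
names below are made up, only the notifier API itself is real) is roughly:

	/* Sketch only: hypothetical names, real notifier API. */
	static int my_notifier_event(struct notifier_block *nb,
				     unsigned long event, void *ptr)
	{
		struct net_device *dev = netdev_notifier_info_to_dev(ptr);

		if (event == NETDEV_UNREGISTER)
			pr_debug("%s going away\n", dev->name);	/* tear down per-device state */

		return NOTIFY_DONE;
	}

	static struct notifier_block my_nb = {
		.notifier_call = my_notifier_event,
	};

	/* registered once with register_netdevice_notifier(&my_nb);
	 * netdev notifiers are called under rtnl_lock().
	 */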

> At first glance it looks like the bug is that we are running a control
> path of the networking stack without the rtnl lock. So it may be that
> ASSERT_RTNL() is the better fix.

Do you mean inserting ASSERT_RTNL() into peernet2id_alloc()?
If so, it's not needed, as all nsids are protected via a separate lock starting from

commit 95f38411df05 "netns: use a spin_lock to protect nsid management"

and commit de133464c9e7 "netns: make nsid_lock per net".

The nsid primitives are meant to be safe irrespective of rtnl_lock().
My patch just adds some sanity checks to make them safer.
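
Just to illustrate the point (a simplified sketch of the nsid locking pattern,
not a verbatim copy of net/core/net_namespace.c): every lookup of
net->netns_ids is serialized by net->nsid_lock, not by rtnl_lock():

	/* Simplified sketch: the id <-> peer mapping lives in the
	 * net->netns_ids IDR and is only touched under net->nsid_lock.
	 */
	int id;

	spin_lock_bh(&net->nsid_lock);
	id = __peernet2id(net, peer);		/* walks net->netns_ids */
	spin_unlock_bh(&net->nsid_lock);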
 
> Given that it would be nice to reduce the scope of the rtnl lock this
> might not be a bad direction.  Let me see.
> Is rtnl_notify safe without the rtnl lock?

It seems it has to be safe, as it only sends an skb to netlink.
rtnl_notify() is used without rtnl_lock() in most other places in the net code.

The only reason rtnl_lock() is needed in cleanup_net() is that we unlink the net from the list.
The rtnl_lock() around the nsid cleanup is a leftover from before nsids were protected by a
separate lock. So we may relax rtnl_lock() in cleanup_net(), and I'm going to do that
in one of the next patches (not yet sent).
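
For context, the notify path reduces to building a message and multicasting it
(a rough sketch; the fill step is elided and the function name is made up):

	static void notify_nsid(struct net *net, int cmd, int id)	/* hypothetical */
	{
		struct sk_buff *msg;

		msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
		if (!msg)
			return;

		/* ... put cmd/id into an RTM_{NEW,DEL}NSID message ... */

		/* multicast to RTNLGRP_NSID listeners; no rtnl_lock() here */
		rtnl_notify(msg, net, 0, RTNLGRP_NSID, NULL, 0);
	}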

>>
>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
>> ---
>>  net/core/net_namespace.c |   23 +++++++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 60a71be75aea..6a4eab438221 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
>>   */
>>  int peernet2id_alloc(struct net *net, struct net *peer)
>>  {
>> -	bool alloc;
>> +	bool alloc = false, alive = false;
>>  	int id;
> 
>         ^^^ Perhaps we want "ASSERT_RTNL();" here?
>>  
>> -	if (atomic_read(&net->count) == 0)
>> -		return NETNSA_NSID_NOT_ASSIGNED;
> 
> Moving this hunk is of no benefit.  The code must be called with a valid
> reference to net.  Which means testing net->count is a fancy way of checking
> whether the code is in cleanup_net.  In all other cases net->count should
> be non-zero, and it should remain that way because our caller must
> keep a reference.

Let me discuss this.
This interface is exported to drivers, and they don't always follow the rules
about what is allowed or disallowed. This hunk adds sanity checks, which
protect us from bad driver code etc. Some people want to use peernet2id_alloc()
in generic functions, which may be called in different contexts (also for a net
obtained from an RCU list). Instead of every user making such a test in its own
function, this hunk introduces a generic validity check, which saves people from
having to insert n+1 checks of their own.

Also, the unlocked check gives no performance advantage, as we take the lock
*almost always* anyway. So my suggestion is to make peernet2id_alloc() safer for the
end user.
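
To illustrate: without the check inside the helper, every caller that gets the
peer from an RCU list would have to open-code something like the following
(hypothetical caller sketch; some_rcu_lookup() is just a placeholder):

	int id = NETNSA_NSID_NOT_ASSIGNED;
	struct net *peer;

	rcu_read_lock();
	peer = some_rcu_lookup();			/* placeholder */
	if (peer && maybe_get_net(peer)) {		/* pin peer while we use it */
		id = peernet2id_alloc(net, peer);
		put_net(peer);
	}
	rcu_read_unlock();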
 
>>  	spin_lock_bh(&net->nsid_lock);
>> -	alloc = atomic_read(&peer->count) == 0 ? false : true;
>> +	/* Spinlock guarantees we never hash a peer to net->netns_ids
>> +	 * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
>> +	 */
>> +	if (atomic_read(&net->count) == 0) {
>> +		id = NETNSA_NSID_NOT_ASSIGNED;
>> +		goto unlock;
>> +	}
>> +	/*
>> +	 * When peer is obtained from RCU lists, we may race with
>> +	 * its cleanup. Check whether it's alive, and this guarantees
>> +	 * we never hash a peer back to net->netns_ids, after it has
>> +	 * just been idr_remove()'d from there in cleanup_net().
>> +	 */
>> +	if (maybe_get_net(peer))
>> +		alive = alloc = true;
> 
> Yes this does seem reasonable.  The more obvious-looking code, which
> would return NETNSA_NSID_NOT_ASSIGNED if the peer has a count of 0, is
> silly, as it would make it appear that a peer is momentarily outside
> of a network namespace when the peer is in fact moving from one network
> namespace to another.
>         
>>  	id = __peernet2id_alloc(net, peer, &alloc);
>> +unlock:
>>  	spin_unlock_bh(&net->nsid_lock);
>>  	if (alloc && id >= 0)
>>  		rtnl_net_notifyid(net, RTM_NEWNSID, id);
>                 ^^^^^^
>                 Is this safe without the rtnl lock?

Yes, it seems to be safe. Please see above.

>> +	if (alive)
>> +		put_net(peer);
>>  	return id;
>>  }
>>  EXPORT_SYMBOL_GPL(peernet2id_alloc);

Kirill
