linux-kernel - Re: [RFC PATCH linux-next] ns: do not allocate a new nsproxy at each call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8738o1ovdi.fsf@xmission.com>
Date:	Wed, 16 Oct 2013 12:42:33 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Guillaume Gaudonville <guillaume.gaudonville@...nd.com>
Cc:	Guillaume Gaudonville <gaudonville@...nd.com>,
	linux-kernel@...r.kernel.org, serge.hallyn@...onical.com,
	akpm@...ux-foundation.org, viro@...iv.linux.org.uk,
	davem@...emloft.net, cmetcalf@...era.com
Subject: Re: [RFC PATCH linux-next] ns: do not allocate a new nsproxy at each call

Guillaume Gaudonville <guillaume.gaudonville@...nd.com> writes:

> On 10/15/2013 08:40 PM, Eric W. Biederman wrote:
>> "Guillaume Gaudonville" <gaudonville@...nd.com> writes:
>>
>>> Currently, at each call of setns system call a new nsproxy is allocated,
>>> the old nsproxy namespaces are copied into the new one and the old nsproxy
>>> is freed if the task was the only one to use it.
>>>
>>> It can creates large delays on hardware with large number of cpus since
>>> to free a nsproxy a synchronize_rcu() call is done.
>>>
>>> When a task is the only one to use a nsproxy, only the task can do an action
>>> that will make this nsproxy to be shared by another task or thread (fork,...).
>>> So when the refcount of the nsproxy is equal to 1, we can simply update the
>>> current nsproxy field without allocating a new one and freeing the old one.
>>>
>>> The install operations of each kind of namespace cannot fails, so there's no
>>> need to check for an error and calling ops->install().
>>>
>>> Tested on TileGX (36 cores) and Intel (32 cores).
>> This may be worth doing (I am a little scared of a design that has setns
>> on a fast path) but right now this isn't safe.
>>
>> Currently pidns_install ends with:
>>
>> 	put_pid_ns(nsproxy->pid_ns_for_children);
>> 	nsproxy->pid_ns_for_children = get_pid_ns(new);
>> 	return 0;
>>
>>
>> And netns_install ends with:
>>
>> 	put_net(nsproxy->net_ns);
>> 	nsproxy->net_ns = get_net(net);
>> 	return 0;
>>
>> The put before the set is not atomic and is not safe unless
>> the nsproxy is private.  I think this is fixable but it requires a more
>> indepth look at the code than you have done.

> My expectation was that nobody else but the task itself could increase the
> nsproxy refcount (ie. use it), if the refcount was equal to 1. So there
> was no possible races while playing with nsproxy in that case.
> Do you mean that someone else than the task itself could play with the
> nsproxy,
> even if its refcount is equal to 1?

It is not required to increase the nsproxy refcount to use nsproxy.  It
is possible to find a living task and look at it's nsproxy using just
rcu protection.  get_net_ns_by_pid is one example of a place where we do
that.

If the refcount was all that was protecting nsproxy the problematic
synchronize_rcu call would be unnecessary.

Look for task_nsproxy if you want to find other readers of the nsproxy
that don't increase the reference count.

>> Mind if I ask where this comes up?
> The issue has been seen on a daemon that is performing monitoring and
> configuration tasks in different netns.

Interesting...  In practice I would think that daemon by opening a few
choice netlink sockets would not need to change network namespaces at
all frequently, and you can also see the net information in /proc and
/sys without changing your default net.  So I am wondering about the
daemon.

That said if we can optimize things without getting ourselves into a
maintenance nightmare I am happy to see a patch optimizing things.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/