Message-ID: <9879eb2d-a8eb-cfc0-45b4-3cb28e97e156@virtuozzo.com>
Date: Fri, 17 Nov 2017 23:16:53 +0300
From: Kirill Tkhai <ktkhai@...tuozzo.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>,
David Miller <davem@...emloft.net>, vyasevic@...hat.com,
kstewart@...uxfoundation.org, pombredanne@...b.com,
Vladislav Yasevich <vyasevich@...il.com>, mark.rutland@....com,
Greg KH <gregkh@...uxfoundation.org>,
Alexey Dobriyan <adobriyan@...il.com>,
Florian Westphal <fw@...len.de>,
Nicolas Dichtel <nicolas.dichtel@...nd.com>,
roman.kapl@...go.com, Paul Moore <paul@...l-moore.com>,
David Ahern <dsahern@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
lucien xin <lucien.xin@...il.com>,
Matthias Schiffer <mschiffer@...verse-factory.net>,
rshearma@...cade.com, LKML <linux-kernel@...r.kernel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
avagin@...tuozzo.com, gorcunov@...tuozzo.com
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it
on net->init/->exit
On 17.11.2017 21:52, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>
>> On 15.11.2017 19:31, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>>>
>>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>>>>>>
>>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@...tuozzo.com> wrote:
>>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>
>>>>>>>>> get_user_ns(user_ns);
>>>>>>>>>
>>>>>>>>> - rv = mutex_lock_killable(&net_mutex);
>>>>>>>>> + rv = down_read_killable(&net_sem);
>>>>>>>>> if (rv < 0) {
>>>>>>>>> net_free(net);
>>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>> list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>> rtnl_unlock();
>>>>>>>>> }
>>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>>> + up_read(&net_sem);
>>>>>>>>> if (rv < 0) {
>>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>>> put_user_ns(user_ns);
>>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>> list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>> spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>>
>>>>>>>>> - mutex_lock(&net_mutex);
>>>>>>>>> + down_read(&net_sem);
>>>>>>>>>
>>>>>>>>> /* Don't let anyone else find us. */
>>>>>>>>> rtnl_lock();
>>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>> ops_free_list(ops, &net_exit_list);
>>>>>>>>>
>>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>>> + up_read(&net_sem);
>>>>>>>>
>>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>>
>>>>>>> Yes, ops->init() for one net namespace may now run concurrently with
>>>>>>> ops->exit() for another one. I haven't found any pernet operations that
>>>>>>> have a problem with that. If they exist, they are hidden and not easy to see.
>>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>>> If it turns out that one does, KASAN should show it after a while.
>>>>>>
>>>>>> Certainly the use of hash tables shared between multiple network
>>>>>> namespaces would count. I don't remember how many of these we have but
>>>>>> there used to be quite a few.
>>>>>
>>>>> Could you please give an example of the hash tables you mean?
>>>>
>>>> Ah, I see, it's dccp_hashinfo etc.
>>
>> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>>
>>>
>>> The big one used to be the route cache. With resizable hash tables
>>> things may be getting better in that regard.
>>
>> I've checked some fib-related code and wasn't able to find such a table.
>> Could you please clarify whether this is an assumption, or whether
>> you know of a specific problematic hash table? If so, could you please
>> point me to it more precisely?
>
> Two things.
> 1) Hash tables are one case I know where we access data from multiple
> network namespaces. As such it cannot be asserted that there is no
> possibility of problems.
>
> 2) The responsible way to handle this is one patch for each set of
> methods explaining why those methods are safe to run in parallel.
>
> That ensures there is opportunity for review and people are going
> slowly enough that they actually look at these issues.
>
> The reason I want to see this broken up is that at 200ish sets of
> methods it is too much to review all at once.

OK, it's possible to split the changes into 400 patches, but there is
a problem with three-state (not compiled, module, built-in) drivers:
git bisect won't work anyway. Please see the description of the problem
in the cover message "[PATCH RFC 00/25] Replacing net_mutex with rw_semaphore"
that I sent today.

> I completely agree that odds are that this can be made safe and that it
> is most likely already safe in practically every instance. My guess
> would be that if there are problems that need to be addressed they
> happen in one or two places and we need to find them. If possible I
> don't want to find them after the code has shipped in a stable release.

Kirill