Message-ID: <9879eb2d-a8eb-cfc0-45b4-3cb28e97e156@virtuozzo.com>
Date:   Fri, 17 Nov 2017 23:16:53 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        David Miller <davem@...emloft.net>, vyasevic@...hat.com,
        kstewart@...uxfoundation.org, pombredanne@...b.com,
        Vladislav Yasevich <vyasevich@...il.com>, mark.rutland@....com,
        Greg KH <gregkh@...uxfoundation.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Florian Westphal <fw@...len.de>,
        Nicolas Dichtel <nicolas.dichtel@...nd.com>,
        roman.kapl@...go.com, Paul Moore <paul@...l-moore.com>,
        David Ahern <dsahern@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        lucien xin <lucien.xin@...il.com>,
        Matthias Schiffer <mschiffer@...verse-factory.net>,
        rshearma@...cade.com, LKML <linux-kernel@...r.kernel.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        avagin@...tuozzo.com, gorcunov@...tuozzo.com
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it
 on net->init/->exit

On 17.11.2017 21:52, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
> 
>> On 15.11.2017 19:31, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>>>
>>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
>>>>>>
>>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@...tuozzo.com> wrote:
>>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>
>>>>>>>>>         get_user_ns(user_ns);
>>>>>>>>>
>>>>>>>>> -       rv = mutex_lock_killable(&net_mutex);
>>>>>>>>> +       rv = down_read_killable(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 net_free(net);
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>                 list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>>                 rtnl_unlock();
>>>>>>>>>         }
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>>         if (rv < 0) {
>>>>>>>>>                 dec_net_namespaces(ucounts);
>>>>>>>>>                 put_user_ns(user_ns);
>>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>         list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>>         spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>>
>>>>>>>>> -       mutex_lock(&net_mutex);
>>>>>>>>> +       down_read(&net_sem);
>>>>>>>>>
>>>>>>>>>         /* Don't let anyone else find us. */
>>>>>>>>>         rtnl_lock();
>>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>>         list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>>                 ops_free_list(ops, &net_exit_list);
>>>>>>>>>
>>>>>>>>> -       mutex_unlock(&net_mutex);
>>>>>>>>> +       up_read(&net_sem);
>>>>>>>>
>>>>>>>> After your patch setup_net() could run concurrently with cleanup_net().
>>>>>>>> Given that ops_exit_list() is called on the error path of setup_net() too,
>>>>>>>> it means ops->exit() could now run concurrently if it doesn't have its
>>>>>>>> own lock. Not sure if this breaks any existing user.
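
For reference, the two paths can overlap simply because both of them now take
the rw_semaphore for reading, and readers do not exclude each other; only a
writer would. A minimal sketch of that semantics (the setup_path() and
cleanup_path() helpers below are purely illustrative, not functions from the
patch):

#include <linux/rwsem.h>

static DECLARE_RWSEM(net_sem);

/*
 * Both paths take the lock for reading, so they may hold it at the
 * same time; only a writer (e.g. someone registering pernet operations
 * under down_write()) would exclude them.
 */
static void setup_path(void)
{
	down_read(&net_sem);	/* shared: does not block other readers */
	/* ops->init() for the new netns would run here */
	up_read(&net_sem);
}

static void cleanup_path(void)
{
	down_read(&net_sem);	/* can be held concurrently with setup_path() */
	/* ops->exit() for dying netns(es) would run here */
	up_read(&net_sem);
}
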
>>>>>>>
>>>>>>> Yes, it will be possible for ops->init() for one net namespace to run
>>>>>>> concurrently with ops->exit() for another one. I haven't found pernet
>>>>>>> operations which have a problem with that. If they exist, they are hidden
>>>>>>> and not clearly visible. Pernet operations in general do not touch someone
>>>>>>> else's memory. If it suddenly turns out there is one that does, KASAN
>>>>>>> should show it after a while.
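
To illustrate what "do not touch someone else's memory" means in practice, a
typical pernet_operations of that kind looks roughly like the hypothetical
foo_net below (not a real subsystem): init/exit only operate on state reached
through the struct net they are passed, so running them for two different
namespaces at the same time cannot race:

#include <linux/bug.h>
#include <linux/list.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>

struct foo_net {
	struct list_head entries;	/* private to this netns */
};

static unsigned int foo_net_id __read_mostly;

static int __net_init foo_net_init(struct net *net)
{
	struct foo_net *fn = net_generic(net, foo_net_id);

	INIT_LIST_HEAD(&fn->entries);	/* touches only this netns's memory */
	return 0;
}

static void __net_exit foo_net_exit(struct net *net)
{
	struct foo_net *fn = net_generic(net, foo_net_id);

	WARN_ON(!list_empty(&fn->entries));	/* again, only per-net state */
}

static struct pernet_operations foo_net_ops = {
	.init = foo_net_init,
	.exit = foo_net_exit,
	.id   = &foo_net_id,
	.size = sizeof(struct foo_net),
};
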
>>>>>>
>>>>>> Certainly the use of hash tables shared between multiple network
>>>>>> namespaces would count.  I don't remember how many of these we have, but
>>>>>> there used to be quite a few.
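
The pattern being referred to here is a single table shared by all namespaces,
where the netns pointer is just part of the key; whether the exit path is safe
to run concurrently then depends on the table's own lock, not on
net_mutex/net_sem. A hypothetical sketch of such an exit method (shared_hash
and friends are made-up names, not an existing table):

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <net/net_namespace.h>

struct shared_entry {
	struct hlist_node node;
	struct net *net;	/* owning namespace, part of the key */
	u32 key;
};

static DEFINE_SPINLOCK(shared_hash_lock);
static struct hlist_head shared_hash[256];

static void __net_exit shared_net_exit(struct net *net)
{
	struct shared_entry *e;
	struct hlist_node *tmp;
	int i;

	/*
	 * Safe against a concurrent init/exit for another netns only
	 * because of shared_hash_lock, not because of the netns lock.
	 */
	spin_lock_bh(&shared_hash_lock);
	for (i = 0; i < ARRAY_SIZE(shared_hash); i++) {
		hlist_for_each_entry_safe(e, tmp, &shared_hash[i], node) {
			if (e->net == net) {
				hlist_del(&e->node);
				kfree(e);
			}
		}
	}
	spin_unlock_bh(&shared_hash_lock);
}
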
>>>>>
>>>>> Could you please provide an example of the hash tables you mean?
>>>>
>>>> Ah, I see, it's dccp_hashinfo etc.
>>
>> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>>
>>>
>>> The big one used to be the route cache.  With resizable hash tables
>>> things may be getting better in that regard.
>>
>> I've checked some fib-related things and wasn't able to find a problem there.
>> Excuse me, could you please clarify whether this is an assumption, or whether
>> there is a specific problematic hash table you know of? If there is, could
>> you please point me to it more exactly?
> 
> Two things.
> 1) Hash tables are one case I know of where we access data from multiple
>    network namespaces.  As such it cannot be asserted that there is no
>    possibility of problems.
> 
> 2) The responsible way to handle this is one patch for each set of
>    methods explaining why those methods are safe to run in parallel.
> 
>    That ensures there is an opportunity for review and that people go
>    slowly enough to actually look at these issues.
> 
> The reason I want to see this broken up is that at 200ish sets of
> methods it is too much to review all at once.

Ok, it's possible to split the changes into 400 patches, but there is
a problem with three-state (not compiled / module / built-in) drivers:
git bisect won't work anyway. Please see the description of the problem
in the cover message "[PATCH RFC 00/25] Replacing net_mutex with rw_semaphore"
I sent today.
 
> I completely agree that odds are that this can be made safe and that it
> is most likely already safe in practically every instance.  My guess
> would be that if there are problems that need to be addressed they
> happen in one or two places and we need to find them.  If possible I
> don't want to find them after the code has shipped in a stable release.

Kirill
