[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8eba464937d34d8330a82332ebd672eb@codeaurora.org>
Date: Mon, 29 Jun 2020 14:18:06 -0600
From: stranche@...eaurora.org
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: David Miller <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Pravin B Shelar <pshelar@....org>,
Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>,
Johannes Berg <johannes.berg@...el.com>
Subject: Re: [PATCH net] genetlink: take netlink table lock when
(un)registering
On 2020-06-27 12:55, Cong Wang wrote:
> On Fri, Jun 26, 2020 at 5:32 PM Sean Tranchetti
> <stranche@...eaurora.org> wrote:
>>
>> A potential deadlock can occur during registering or unregistering a
>> new
>> generic netlink family between the main nl_table_lock and the cb_lock
>> where
>> each thread wants the lock held by the other, as demonstrated below.
>>
>> 1) Thread 1 is performing a netlink_bind() operation on a socket. As
>> part
>> of this call, it will call netlink_lock_table(), incrementing the
>> nl_table_users count to 1.
>> 2) Thread 2 is registering (or unregistering) a genl_family via the
>> genl_(un)register_family() API. The cb_lock semaphore will be taken
>> for
>> writing.
>> 3) Thread 1 will call genl_bind() as part of the bind operation to
>> handle
>> subscribing to GENL multicast groups at the request of the user. It
>> will
>> attempt to take the cb_lock semaphore for reading, but it will fail
>> and
>> be scheduled away, waiting for Thread 2 to finish the write.
>> 4) Thread 2 will call netlink_table_grab() during the (un)registration
>> call. However, as Thread 1 has incremented nl_table_users, it will
>> not
>> be able to proceed, and both threads will be stuck waiting for the
>> other.
>>
>> To avoid this scenario, the locks should be acquired in the same order
>> by
>> both threads. Since both the register and unregister functions need to
>> take
>> the nl_table_lock in their processing, it makes sense to explicitly
>> acquire
>> them before they lock the genl_mutex and the cb_lock. In
>> unregistering, no
>> other change is needed aside from this locking change.
>
> Like the kernel test robot reported, you can not call genl_lock_all
> while
> holding netlink_table_grab() which is effectively a write lock.
>
> To me, it seems genl_bind() can be just removed as there is no one
> in-tree uses family->mcast_bind(). Can you test the attached patch?
> It seems sufficient to fix this deadlock.
>
> Thanks.
Thanks Cong. Yes, removing the genl_bind()/genl_unbind() functions
eliminates the
potential for this deadlock. Adding Johannes here to comment on removing
these,
as the family->mcast_bind() capability added by commit c380d9a7afff
("genetlink: pass multicast bind/unbind to families") would be lost.
Powered by blists - more mailing lists