Message-ID: <ecdad6a5-d766-4ff2-a8ad-b605ebb3811c@redhat.com>
Date: Tue, 12 Nov 2024 15:41:09 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
Gilad Naaman <gnaaman@...venets.com>, Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, dsahern@...nel.org, horms@...nel.org,
kuba@...nel.org, kuniyu@...zon.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown

On 11/11/24 13:07, Vadim Fedorenko wrote:
> On 11/11/2024 05:21, Gilad Naaman wrote:
>>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>>> - spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>>> + list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>>> + addrconf_del_dad_work(ifa);
>>>>>> +
>>>>>> + /* combined flag + permanent flag decide if
>>>>>> + * address is retained on a down event
>>>>>> + */
>>>>>> + if (!keep_addr ||
>>>>>> + !(ifa->flags & IFA_F_PERMANENT) ||
>>>>>> + addr_is_local(&ifa->addr))
>>>>>> + hlist_del_init_rcu(&ifa->addr_lst);
>>>>>> }
>>>>>>
>>>>>> + spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>>> + read_unlock_bh(&idev->lock);
>>>>>
>>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>>> block any RCU grace period to happen, so we can safely traverse
>>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>>
>>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>>> although it seems obvious in retrospect.
>>>>
>>>>>> +
>>>>>> write_lock_bh(&idev->lock);
>>>>>
>>>>> if we are trying to protect idev->addr_list against addition, then we
>>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>>> thread will grab write lock between read_unlock and write_lock.
>>>>>
>>>>> Am I missing something?
>>>>
>>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>>> the same way it is done immediately afterwards;
>>>> No particular reason not to extend the existing lock, I just didn't think
>>>> about it.
>>>>
>>>> For what it's worth, the original code didn't have this protection either,
>>>> since the another thread could have grabbed the lock between
>>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>>> and the `write_lock`.
>>>>
>>>> Should I extend the write_lock upwards, or just leave it off?
>>>
>>> Well, you are doing write manipulation with the list, which is protected
>>> by read-write lock. I would expect this lock to be held in write mode.
>>> And you have to protect hash map at the same time. So yes, write_lock
>>> and spin_lock altogether, I believe.
>>>
>>
>> Note that within the changed lines, the list itself is only iterated-on,
>> not manipulated.
>> The changes are to the `addr_lst` list, which is the hashtable, not the
>> list this lock protects.
>>
>> I'll send v3 with the write-lock extended.
>> Thank you!
>
> Reading it one more time, I'm not quite sure that locking hashmap
> spinlock under idev->lock in write mode is a good idea... We have to
> think more about it, maybe ask for another opinion. Looks like RTNL
> should protect idev->addr_list from modification while idev->lock is
> more about changes to idev, not only about addr_list.
>
> @Eric could you please shed some light on the locking schema here?

AFAICS idev->addr_list is (write) protected by write_lock(&idev->lock),
while net->ipv6.inet6_addr_lst is protected by
spin_lock_bh(&net->ipv6.addrconf_hash_lock).

Extending the write_lock() scope will create a lock dependency between
the hashtable lock and the list lock, which in turn could cause more
problems in the future.

Note that idev->addr_list locking looks a bit fuzzy, as it is traversed in
several places under the RCU lock only. I suggest finishing the conversion
of idev->addr_list to RCU and doing this additional traversal under RCU, too.
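
To make the dependency concern concrete, here is a rough userspace model, not kernel code. It only records which lock class is held when another one is taken, in the spirit of what lockdep tracks. All names (model_lock(), ifdown_nested(), ifdown_rcu_walk(), the enum values) are hypothetical and exist only for this illustration, and the RCU read side is deliberately not modeled as a lock, on the assumption that it adds no ordering against either lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for idev->lock and addrconf_hash_lock. */
enum lock_class { IDEV_LOCK, HASH_LOCK, NR_LOCKS };

static bool held[NR_LOCKS];
/* dep[a][b] is set when lock b was acquired while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

static void model_reset(void)
{
	memset(held, 0, sizeof(held));
	memset(dep, 0, sizeof(dep));
}

static void model_lock(enum lock_class l)
{
	assert(!held[l]);
	/* Record an ordering edge from every lock already held. */
	for (int i = 0; i < NR_LOCKS; i++)
		if (held[i])
			dep[i][l] = true;
	held[l] = true;
}

static void model_unlock(enum lock_class l)
{
	assert(held[l]);
	held[l] = false;
}

/* Variant A: write_lock scope extended over the hash-lock section. */
static void ifdown_nested(void)
{
	model_lock(IDEV_LOCK);
	model_lock(HASH_LOCK);
	/* walk addr_list, unhash the entries that are not kept */
	model_unlock(HASH_LOCK);
	/* remaining addr_list manipulation */
	model_unlock(IDEV_LOCK);
}

/* Variant B: the walk relies on RCU, so the two locks never nest. */
static void ifdown_rcu_walk(void)
{
	model_lock(HASH_LOCK);
	/* walk addr_list under RCU, unhash the entries that are not kept */
	model_unlock(HASH_LOCK);

	model_lock(IDEV_LOCK);
	/* remaining addr_list manipulation */
	model_unlock(IDEV_LOCK);
}

/* True if any ordering edge between two lock classes was recorded. */
static bool has_dependency(void)
{
	for (int a = 0; a < NR_LOCKS; a++)
		for (int b = 0; b < NR_LOCKS; b++)
			if (dep[a][b])
				return true;
	return false;
}
```

Calling model_reset() and then ifdown_nested() leaves dep[IDEV_LOCK][HASH_LOCK] set, which is the ordering every future user of both locks would have to respect. Calling model_reset() and then ifdown_rcu_walk() records no edge at all.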
Cheers,
Paolo