Message-ID: <ea009a4a-c9f2-4843-b84d-e6b72982228e@linux.dev>
Date: Sat, 9 Nov 2024 15:00:55 +0000
From: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
To: Gilad Naaman <gnaaman@...venets.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, "David S. Miller"
<davem@...emloft.net>, David Ahern <dsahern@...nel.org>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
On 08/11/2024 05:25, Gilad Naaman wrote:
> struct inet6_dev already has a list of addresses owned by the device,
> enabling us to traverse this much shorter list, instead of scanning
> the entire hash-table.
>
> Signed-off-by: Gilad Naaman <gnaaman@...venets.com>
> ---
> Changes in v2:
> - Remove double BH sections
> - Styling fixes (extra {}, extra newline)
> ---
> net/ipv6/addrconf.c | 38 +++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 21 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index d0a99710d65d..c6fbd634912a 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3846,12 +3846,12 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
> {
> unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
> struct net *net = dev_net(dev);
> - struct inet6_dev *idev;
> struct inet6_ifaddr *ifa;
> LIST_HEAD(tmp_addr_list);
> + struct inet6_dev *idev;
> bool keep_addr = false;
> bool was_ready;
> - int state, i;
> + int state;
>
> ASSERT_RTNL();
>
> @@ -3890,28 +3890,24 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
> }
>
> /* Step 2: clear hash table */
> - for (i = 0; i < IN6_ADDR_HSIZE; i++) {
> - struct hlist_head *h = &net->ipv6.inet6_addr_lst[i];
> + read_lock_bh(&idev->lock);
> + spin_lock(&net->ipv6.addrconf_hash_lock);
>
> - spin_lock_bh(&net->ipv6.addrconf_hash_lock);
> -restart:
> - hlist_for_each_entry_rcu(ifa, h, addr_lst) {
> - if (ifa->idev == idev) {
> - addrconf_del_dad_work(ifa);
> - /* combined flag + permanent flag decide if
> - * address is retained on a down event
> - */
> - if (!keep_addr ||
> - !(ifa->flags & IFA_F_PERMANENT) ||
> - addr_is_local(&ifa->addr)) {
> - hlist_del_init_rcu(&ifa->addr_lst);
> - goto restart;
> - }
> - }
> - }
> - spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
> + list_for_each_entry(ifa, &idev->addr_list, if_list) {
> + addrconf_del_dad_work(ifa);
> +
> + /* combined flag + permanent flag decide if
> + * address is retained on a down event
> + */
> + if (!keep_addr ||
> + !(ifa->flags & IFA_F_PERMANENT) ||
> + addr_is_local(&ifa->addr))
> + hlist_del_init_rcu(&ifa->addr_lst);
> }
>
> + spin_unlock(&net->ipv6.addrconf_hash_lock);
> + read_unlock_bh(&idev->lock);
Why is this read lock needed here? Holding the addrconf_hash_lock
spinlock prevents any RCU grace period from completing, so we can safely
traverse idev->addr_list with list_for_each_entry_rcu()...
> +
> write_lock_bh(&idev->lock);
If we are trying to protect idev->addr_list against additions, then we
have to extend the write_lock scope. Otherwise another thread may grab
the write lock between read_unlock and write_lock.
Am I missing something?