[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <80fbd73f-ce75-44bf-a444-116217a50c91@linux.alibaba.com>
Date: Fri, 25 Oct 2024 11:50:42 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
antony.antony@...unet.com, steffen.klassert@...unet.com,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
jakub@...udflare.com, fred.cc@...baba-inc.com,
yubing.qiuyubing@...baba-inc.com
Subject: Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected
socket
On 2024/10/24 23:01, Paolo Abeni wrote:
> On 10/18/24 13:45, Philo Lu wrote:
> [...]
>> +/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
>> +static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
>> +{
>> + struct udp_hslot *hslot4, *nhslot4;
>> +
>> + hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>> + nhslot4 = udp_hashslot4(udptable, newhash4);
>> + udp_sk(sk)->udp_lrpa_hash = newhash4;
>> +
>> + if (hslot4 != nhslot4) {
>> + spin_lock_bh(&hslot4->lock);
>> + hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>> + hslot4->count--;
>> + spin_unlock_bh(&hslot4->lock);
>> +
>> + synchronize_rcu();
>
> This deserve a comment explaining why it's needed. I had to dig in past
> revision to understand it.
>
Got it. And a short explanation here (see [1] for detail):
Here, we move a node from a hlist to another new one, i.e., update
node->next from the old hlist to the new hlist. For readers traversing
the old hlist, if we update node->next just when readers move onto the
moved node, then the readers also move to the new hlist. This is unexpected.
Reader(lookup) Writer(rehash)
----------------- ---------------
1. rcu_read_lock()
2. pos = sk;
3. hlist_del_init_rcu(sk, old_slot)
4. hlist_add_head_rcu(sk, new_slot)
5. pos = pos->next; <=
6. rcu_read_unlock()
[1]
https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@linux.alibaba.com/
>> +
>> + spin_lock_bh(&nhslot4->lock);
>> + hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
>> + nhslot4->count++;
>> + spin_unlock_bh(&nhslot4->lock);
>> + }
>> +}
>> +
>> +static void udp4_unhash4(struct udp_table *udptable, struct sock *sk)
>> +{
>> + struct udp_hslot *hslot2, *hslot4;
>> +
>> + if (udp_hashed4(sk)) {
>> + hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
>> + hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>> +
>> + spin_lock(&hslot4->lock);
>> + hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>> + hslot4->count--;
>> + spin_unlock(&hslot4->lock);
>> +
>> + spin_lock(&hslot2->lock);
>> + udp_hash4_dec(hslot2);
>> + spin_unlock(&hslot2->lock);
>> + }
>> +}
>> +
>> +/* call with sock lock */
>> +static void udp4_hash4(struct sock *sk)
>> +{
>> + struct udp_hslot *hslot, *hslot2, *hslot4;
>> + struct net *net = sock_net(sk);
>> + struct udp_table *udptable;
>> + unsigned int hash;
>> +
>> + if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
>> + return;
>> +
>> + hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
>> + inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
>> +
>> + udptable = net->ipv4.udp_table;
>> + if (udp_hashed4(sk)) {
>> + udp4_rehash4(udptable, sk, hash);
>
> It's unclear to me how we can enter this branch. Also it's unclear why
> here you don't need to call udp_hash4_inc()udp_hash4_dec, too. Why such
> accounting can't be placed in udp4_rehash4()?
>
It's possible that a connected udp socket _re-connect_ to another remote
address. Then, because the local address is not changed, hash2 and its
hash4_cnt keep unchanged. But rehash4 need to be done.
I'll also add a comment here.
> [...]
>> @@ -2031,6 +2180,19 @@ void udp_lib_rehash(struct sock *sk, u16 newhash)
>> spin_unlock(&nhslot2->lock);
>> }
>>
>> + if (udp_hashed4(sk)) {
>> + udp4_rehash4(udptable, sk, newhash4);
>> +
>> + if (hslot2 != nhslot2) {
>> + spin_lock(&hslot2->lock);
>> + udp_hash4_dec(hslot2);
>> + spin_unlock(&hslot2->lock);
>> +
>> + spin_lock(&nhslot2->lock);
>> + udp_hash4_inc(nhslot2);
>> + spin_unlock(&nhslot2->lock);
>> + }
>> + }
>> spin_unlock_bh(&hslot->lock);
>
> The udp4_rehash4() call above is in atomic context and could end-up
> calling synchronize_rcu() which is a blocking function. You must avoid that.
>
I see, synchronize_rcu() cannot be used with spinlock. However, I don't
have a good idea to solve it by now. Do you have any thoughts or
suggestions?
> Cheers,
>
> Paolo
Thanks for your reviewing, Paolo. I'll address all your concerns in the
next version.
--
Philo
Powered by blists - more mailing lists