Message-ID: <ad78a2bb-9dc4-4f80-9011-b49fd721a425@redhat.com>
Date: Fri, 25 Oct 2024 11:02:13 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Philo Lu <lulie@...ux.alibaba.com>, netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
antony.antony@...unet.com, steffen.klassert@...unet.com,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
jakub@...udflare.com, fred.cc@...baba-inc.com,
yubing.qiuyubing@...baba-inc.com
Subject: Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected
socket
On 10/25/24 05:50, Philo Lu wrote:
> On 2024/10/24 23:01, Paolo Abeni wrote:
>> On 10/18/24 13:45, Philo Lu wrote:
>> [...]
>>> +/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
>>> +static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
>>> +{
>>> +	struct udp_hslot *hslot4, *nhslot4;
>>> +
>>> +	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>>> +	nhslot4 = udp_hashslot4(udptable, newhash4);
>>> +	udp_sk(sk)->udp_lrpa_hash = newhash4;
>>> +
>>> +	if (hslot4 != nhslot4) {
>>> +		spin_lock_bh(&hslot4->lock);
>>> +		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>>> +		hslot4->count--;
>>> +		spin_unlock_bh(&hslot4->lock);
>>> +
>>> +		synchronize_rcu();
>>
>> This deserves a comment explaining why it's needed. I had to dig into
>> past revisions to understand it.
>>
>
> Got it. And a short explanation here (see [1] for detail):
>
> Here, we move a node from a hlist to another new one, i.e., update
> node->next from the old hlist to the new hlist. For readers traversing
> the old hlist, if we update node->next just when readers move onto the
> moved node, then the readers also move to the new hlist. This is unexpected.
>
> Reader(lookup)              Writer(rehash)
> -----------------           ---------------
> 1. rcu_read_lock()
> 2. pos = sk;
> 3.                          hlist_del_init_rcu(sk, old_slot)
> 4.                          hlist_add_head_rcu(sk, new_slot)
> 5. pos = pos->next; <=
> 6. rcu_read_unlock()
>
> [1]
> https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@linux.alibaba.com/

Thanks. AFAICS the problem this can cause is a lookup failure for a
socket positioned later in the same chain, when an earlier entry is
moved to a different slot during a concurrent lookup.

I think that could be solved the same way TCP handles this scenario:
use an hlist_nulls RCU list for the hash4 bucket, check that a failed
lookup ends in the same bucket where it started and, if it does not,
restart the lookup from the original bucket.

Have a look at __inet_lookup_established() for a more descriptive
reference, especially:

https://elixir.bootlin.com/linux/v6.12-rc4/source/net/ipv4/inet_hashtables.c#L528
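
To make the suggestion concrete, here is a minimal single-threaded
userspace sketch of the idea, not kernel code. All names in it (struct
node, make_nulls(), lookup(), demo_lookup(), ...) are made up for
illustration; the kernel counterparts are the hlist_nulls helpers such
as is_a_nulls() and get_nulls_value(). The end-of-chain marker encodes
the bucket index, and the "writer" callback stands in for a rehash that
runs while the reader is standing on the moved node:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 2

/* Hypothetical node; the kernel uses struct hlist_nulls_node instead. */
struct node {
	uintptr_t next;	/* next node, or a tagged end-of-chain marker */
	int key;
};

static uintptr_t heads[NBUCKETS];
static struct node node_a = { 0, 1 }, node_b = { 0, 2 };

/* The end marker has the low bit set and carries the bucket index. */
static uintptr_t make_nulls(unsigned int b) { return ((uintptr_t)b << 1) | 1; }
static int is_nulls(uintptr_t p) { return (int)(p & 1); }
static unsigned int nulls_bucket(uintptr_t p) { return (unsigned int)(p >> 1); }

static void table_init(void)
{
	for (unsigned int b = 0; b < NBUCKETS; b++)
		heads[b] = make_nulls(b);
}

static void add_head(struct node *n, unsigned int b)
{
	assert(((uintptr_t)n & 1) == 0);	/* low bit is reserved for the marker */
	n->next = heads[b];
	heads[b] = (uintptr_t)n;
}

/* Unlink n from bucket b; n->next is left intact, as with RCU deletion. */
static void del(struct node *n, unsigned int b)
{
	for (uintptr_t *pp = &heads[b]; !is_nulls(*pp);
	     pp = &((struct node *)*pp)->next) {
		if ((struct node *)*pp == n) {
			*pp = n->next;
			return;
		}
	}
}

/* Simulated rehash: move node_a from bucket 0 to bucket 1. */
static void move_a_to_bucket1(void)
{
	del(&node_a, 0);
	add_head(&node_a, 1);
}

/*
 * Walk bucket b looking for key. writer() runs once while the reader
 * stands on pause_at. With verify set, a walk that ends on another
 * bucket's marker is restarted from the original bucket.
 */
static struct node *lookup(unsigned int b, int key, struct node *pause_at,
			   void (*writer)(void), int verify)
{
	uintptr_t p;

begin:
	for (p = heads[b]; !is_nulls(p); p = ((struct node *)p)->next) {
		struct node *n = (struct node *)p;

		if (n->key == key)
			return n;
		if (n == pause_at && writer) {
			writer();
			writer = NULL;
		}
	}
	if (verify && nulls_bucket(p) != b)
		goto begin;
	return NULL;
}

/* Returns 1 if key 2 is found in bucket 0 despite the concurrent move. */
int demo_lookup(int verify)
{
	table_init();
	add_head(&node_b, 0);
	add_head(&node_a, 0);	/* bucket 0: node_a -> node_b */
	return lookup(0, 2, &node_a, move_a_to_bucket1, verify) == &node_b;
}
```

In this sketch demo_lookup(0) misses node_b because the walk follows
node_a into bucket 1, while demo_lookup(1) notices the foreign end
marker, restarts from bucket 0 and finds it. No synchronize_rcu() is
needed on the writer side with this approach.
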
>>> +
>>> +		spin_lock_bh(&nhslot4->lock);
>>> +		hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
>>> +		nhslot4->count++;
>>> +		spin_unlock_bh(&nhslot4->lock);
>>> +	}
>>> +}
>>> +
>>> +static void udp4_unhash4(struct udp_table *udptable, struct sock *sk)
>>> +{
>>> +	struct udp_hslot *hslot2, *hslot4;
>>> +
>>> +	if (udp_hashed4(sk)) {
>>> +		hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
>>> +		hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>>> +
>>> +		spin_lock(&hslot4->lock);
>>> +		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>>> +		hslot4->count--;
>>> +		spin_unlock(&hslot4->lock);
>>> +
>>> +		spin_lock(&hslot2->lock);
>>> +		udp_hash4_dec(hslot2);
>>> +		spin_unlock(&hslot2->lock);
>>> +	}
>>> +}
>>> +
>>> +/* call with sock lock */
>>> +static void udp4_hash4(struct sock *sk)
>>> +{
>>> +	struct udp_hslot *hslot, *hslot2, *hslot4;
>>> +	struct net *net = sock_net(sk);
>>> +	struct udp_table *udptable;
>>> +	unsigned int hash;
>>> +
>>> +	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
>>> +		return;
>>> +
>>> +	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
>>> +			    inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
>>> +
>>> +	udptable = net->ipv4.udp_table;
>>> +	if (udp_hashed4(sk)) {
>>> +		udp4_rehash4(udptable, sk, hash);
>>
>> It's unclear to me how we can enter this branch. Also it's unclear why
>> here you don't need to call udp_hash4_inc()/udp_hash4_dec(), too. Why
>> can't such accounting be placed in udp4_rehash4()?
>>
>
> It's possible for a connected udp socket to _re-connect_ to another remote
> address. Then, because the local address is not changed, hash2 and its
> hash4_cnt remain unchanged. But rehash4 still needs to be done.
> I'll also add a comment here.

Right, a UDP socket can actually connect() successfully twice in a row
without a disconnect in between...
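
For reference, the re-connect case can be seen from userspace with
something like the sketch below. It assumes a Linux host with the
loopback interface up; the helper names and the port numbers are
arbitrary and only for illustration. The local port is kept across the
second connect() while the remote port changes, which matches the
"hash2 unchanged, hash4 must be recomputed" situation:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Point fd at 127.0.0.1:port; returns 0 on success, -1 on error. */
static int connect_loopback(int fd, unsigned short port)
{
	struct sockaddr_in peer;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(port);
	peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	return connect(fd, (struct sockaddr *)&peer, sizeof(peer));
}

/* Local port currently bound to fd (host byte order), or -1 on error. */
static int local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	return ntohs(addr.sin_port);
}

/*
 * connect() the same UDP socket twice without disconnecting in between.
 * Returns the final peer port if both calls succeed and the local port
 * stays the same, -1 otherwise.
 */
int udp_connect_twice(unsigned short first, unsigned short second)
{
	struct sockaddr_in peer;
	socklen_t len = sizeof(peer);
	int fd, lport, result = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (connect_loopback(fd, first) < 0)
		goto out;
	lport = local_port(fd);		/* autobound by the first connect() */

	if (connect_loopback(fd, second) < 0)
		goto out;
	if (lport < 0 || local_port(fd) != lport)
		goto out;

	if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
		goto out;
	assert(peer.sin_family == AF_INET);
	result = ntohs(peer.sin_port);
out:
	close(fd);
	return result;
}
```

Calling udp_connect_twice(9000, 9001) should report 9001 as the peer
port; no datagram is sent, so nothing needs to listen on either port.
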

I almost missed the point that the ipv6 implementation is planned to
land afterwards.

I'm sorry, but I think that would be problematic: if ipv4 support
lands in 6.13 but ipv6 does not make it, due to time constraints,
we will have at least one release with inconsistent behavior between
ipv4 and ipv6. I think it would be better to bundle these changes
together.

Thanks,
Paolo