Message-ID: <c1eca766-d5e7-4fd8-8ffa-9301f060d6c9@linux.alibaba.com>
Date: Sat, 26 Oct 2024 09:39:57 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
antony.antony@...unet.com, steffen.klassert@...unet.com,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
jakub@...udflare.com, fred.cc@...baba-inc.com,
yubing.qiuyubing@...baba-inc.com
Subject: Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected
socket
On 2024/10/25 17:02, Paolo Abeni wrote:
> On 10/25/24 05:50, Philo Lu wrote:
>> On 2024/10/24 23:01, Paolo Abeni wrote:
>>> On 10/18/24 13:45, Philo Lu wrote:
>>> [...]
>>>> +/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
>>>> +static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
>>>> +{
>>>> +	struct udp_hslot *hslot4, *nhslot4;
>>>> +
>>>> +	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>>>> +	nhslot4 = udp_hashslot4(udptable, newhash4);
>>>> +	udp_sk(sk)->udp_lrpa_hash = newhash4;
>>>> +
>>>> +	if (hslot4 != nhslot4) {
>>>> +		spin_lock_bh(&hslot4->lock);
>>>> +		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>>>> +		hslot4->count--;
>>>> +		spin_unlock_bh(&hslot4->lock);
>>>> +
>>>> +		synchronize_rcu();
>>>
>>> This deserves a comment explaining why it's needed. I had to dig into
>>> past revisions to understand it.
>>>
>>
>> Got it. Here is a short explanation (see [1] for details):
>>
>> Here we move a node from one hlist to another, i.e., node->next is
>> updated to point into the new hlist. If that update happens while a
>> reader traversing the old hlist is positioned on the moved node, the
>> reader follows node->next into the new hlist and skips the rest of the
>> old one. This is unexpected.
>>
>>    Reader(lookup)         Writer(rehash)
>>    -----------------      ---------------
>> 1. rcu_read_lock()
>> 2. pos = sk;
>> 3.                        hlist_del_init_rcu(sk, old_slot)
>> 4.                        hlist_add_head_rcu(sk, new_slot)
>> 5. pos = pos->next; <=
>> 6. rcu_read_unlock()
>>
>> [1]
>> https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@linux.alibaba.com/
>
> Thanks. AFAICS the problem that such thing could cause is a lookup
> failure for a socket positioned later in the same chain when a previous
> entry is moved on a different slot during a concurrent lookup.
>
Yes, you're right.
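To make the failure concrete, here is a minimal userspace sketch that replays the interleaving from the diagram in a single thread. It is not kernel code: struct node, chain_add_head(), chain_del() and demo_missed_lookup() are hypothetical stand-ins, and it assumes (as with hlist_del_init_rcu()) that the removed node keeps its ->next pointer until it is re-added.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a hash chain entry; not kernel code. */
struct node {
	int key;
	struct node *next;
};

/* Push n onto the front of the chain at *head. */
static void chain_add_head(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* Unlink n from the chain at *head, leaving n->next untouched. */
static void chain_del(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
}

/* Plain lookup with no concurrent writer. Returns 1 if key is found. */
static int chain_lookup(struct node *head, int key)
{
	for (; head; head = head->next)
		if (head->key == key)
			return 1;
	return 0;
}

/*
 * Old chain: 1 -> 2 -> 3, new chain: 4. The reader looks for key 3 and is
 * parked on node 2 when the writer moves node 2 to the new chain.
 */
void demo_missed_lookup(int *found_during_move, int *found_afterwards)
{
	struct node a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
	struct node d = { 4, NULL };
	struct node *old_head = NULL, *new_head = NULL;
	struct node *pos;

	chain_add_head(&old_head, &c);
	chain_add_head(&old_head, &b);
	chain_add_head(&old_head, &a);
	chain_add_head(&new_head, &d);

	pos = &b;				/* step 2: reader sits on sk */
	chain_del(&old_head, &b);		/* step 3: writer unlinks sk */
	chain_add_head(&new_head, &b);		/* step 4: writer re-adds sk */

	*found_during_move = 0;
	for (pos = pos->next; pos; pos = pos->next)	/* step 5 */
		if (pos->key == 3)
			*found_during_move = 1;

	/* Node 3 never left the old chain, so a fresh lookup finds it. */
	*found_afterwards = chain_lookup(old_head, 3);
}
```

In this replay the reader walks 2 -> 4 and reaches the end of the new chain, so it reports key 3 as missing even though node 3 is still in the old chain.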
> I think that could be solved the same way TCP handles such a scenario:
> using an hlist_nulls RCU list for the hash4 bucket, checking that a
> failed lookup ends in the same bucket where it started, and otherwise
> restarting from the original bucket.
>
> Have a look at __inet_lookup_established() for a more descriptive
> reference, especially:
>
> https://elixir.bootlin.com/linux/v6.12-rc4/source/net/ipv4/inet_hashtables.c#L528
>
Thank you! I'll try it in the next version.
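As a rough userspace sketch of that idea (not the kernel's hlist_nulls API and not the code I will post): each chain ends in a tagged marker that encodes its bucket index, and a failed lookup restarts if the marker it reached belongs to a different bucket. All names below (struct nnode, NULLS_MARKER, struct move_ctx, nulls_lookup) are hypothetical, and the concurrent rehash is simulated in-line through move_ctx.

```c
#include <stddef.h>
#include <stdint.h>

/* next holds either a node pointer (low bit 0) or an end marker (low bit 1). */
struct nnode {
	int key;
	uintptr_t next;
};

#define NULLS_MARKER(slot)	((((uintptr_t)(slot)) << 1) | 1u)

static int is_nulls(uintptr_t p)         { return (int)(p & 1u); }
static unsigned nulls_value(uintptr_t p) { return (unsigned)(p >> 1); }

static void nulls_add_head(uintptr_t *head, struct nnode *n)
{
	n->next = *head;
	*head = (uintptr_t)n;
}

static void nulls_del(uintptr_t *head, struct nnode *n)
{
	uintptr_t *pp = head;

	while (!is_nulls(*pp) && (struct nnode *)*pp != n)
		pp = &((struct nnode *)*pp)->next;
	if (!is_nulls(*pp))
		*pp = n->next;
}

/* Simulated writer: moves victim between buckets once the reader is on it. */
struct move_ctx {
	struct nnode *victim;
	unsigned from, to;
	int done;
	int restarts;
};

static struct nnode *nulls_lookup(uintptr_t *heads, unsigned slot, int key,
				  struct move_ctx *ctx)
{
	uintptr_t p;

begin:
	for (p = heads[slot]; !is_nulls(p); p = ((struct nnode *)p)->next) {
		struct nnode *n = (struct nnode *)p;

		if (n->key == key)
			return n;
		if (ctx && !ctx->done && n == ctx->victim) {
			nulls_del(&heads[ctx->from], n);
			nulls_add_head(&heads[ctx->to], n);
			ctx->done = 1;
		}
	}
	/* Ended on another bucket's marker: we were moved, so start over. */
	if (nulls_value(p) != slot) {
		if (ctx)
			ctx->restarts++;
		goto begin;
	}
	return NULL;
}

/* Bucket 0: 1 -> 2 -> 3, bucket 1: 4. Node 2 is moved during the lookup. */
int demo_nulls_lookup(int *restarts)
{
	uintptr_t heads[2] = { NULLS_MARKER(0), NULLS_MARKER(1) };
	struct nnode a = { 1, 0 }, b = { 2, 0 }, c = { 3, 0 }, d = { 4, 0 };
	struct move_ctx ctx = { &b, 0, 1, 0, 0 };
	struct nnode *hit;

	nulls_add_head(&heads[0], &c);
	nulls_add_head(&heads[0], &b);
	nulls_add_head(&heads[0], &a);
	nulls_add_head(&heads[1], &d);

	hit = nulls_lookup(heads, 0, 3, &ctx);
	*restarts = ctx.restarts;
	return hit ? hit->key : -1;
}
```

Here the reader is carried into bucket 1 by the moved node, notices that the end marker is not the one for bucket 0, restarts once, and then finds key 3. The real lookup in __inet_lookup_established() does more than this sketch (for example, it takes a reference and re-checks the match), so this only illustrates the end-marker check.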
>>>> +
...
>>>> +
>>>> +/* call with sock lock */
>>>> +static void udp4_hash4(struct sock *sk)
>>>> +{
>>>> +	struct udp_hslot *hslot, *hslot2, *hslot4;
>>>> +	struct net *net = sock_net(sk);
>>>> +	struct udp_table *udptable;
>>>> +	unsigned int hash;
>>>> +
>>>> +	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
>>>> +		return;
>>>> +
>>>> +	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
>>>> +			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
>>>> +
>>>> +	udptable = net->ipv4.udp_table;
>>>> +	if (udp_hashed4(sk)) {
>>>> +		udp4_rehash4(udptable, sk, hash);
>>>
>>> It's unclear to me how we can enter this branch. It's also unclear why
>>> you don't need to call udp_hash4_inc()/udp_hash4_dec() here, too. Why
>>> can't such accounting be placed in udp4_rehash4()?
>>>
>>
>> It's possible for a connected UDP socket to _re-connect_ to another
>> remote address. In that case the local address is unchanged, so hash2
>> and its hash4_cnt remain unchanged, but rehash4 still needs to be done.
>> I'll also add a comment here.
>
> Right, a UDP socket can actually connect() successfully twice in a row
> without a disconnect in between...
>
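For reference, the re-connect case can be seen from userspace with a small sketch like the one below. reconnect_udp() and connect_loopback() are hypothetical helper names, and the sketch assumes the loopback interface is up; connect() on a UDP socket sends no packets, so nothing needs to listen on the chosen ports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect fd to 127.0.0.1:port. Returns 0 on success, -1 on error. */
static int connect_loopback(int fd, unsigned short port)
{
	struct sockaddr_in dst;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	return connect(fd, (struct sockaddr *)&dst, sizeof(dst));
}

/*
 * connect() the same UDP socket twice without disconnecting in between.
 * Returns the peer port seen after the second connect(), or -1 on error.
 * *local_port_kept is set to 1 if the local port did not change.
 */
int reconnect_udp(unsigned short first, unsigned short second,
		  int *local_port_kept)
{
	struct sockaddr_in before, after, peer;
	socklen_t len;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (connect_loopback(fd, first) < 0)
		goto fail;
	len = sizeof(before);
	if (getsockname(fd, (struct sockaddr *)&before, &len) < 0)
		goto fail;

	/* Second connect(): only the remote half of the 4-tuple changes. */
	if (connect_loopback(fd, second) < 0)
		goto fail;
	len = sizeof(after);
	if (getsockname(fd, (struct sockaddr *)&after, &len) < 0)
		goto fail;
	len = sizeof(peer);
	if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
		goto fail;

	*local_port_kept = (before.sin_port == after.sin_port);
	close(fd);
	return ntohs(peer.sin_port);
fail:
	close(fd);
	return -1;
}
```

A call such as reconnect_udp(5000, 5001, &kept) should report the second port as the peer while the local port stays the same, which is the situation where only the 4-tuple hash has to be recomputed.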
> I almost missed the point that the ipv6 implementation is planned to
> land afterwards.
>
> I'm sorry, but I think that would be problematic - i.e. if ipv4 support
> lands in 6.13 but ipv6 does not make it due to time constraints, we
> will have at least one release with inconsistent behavior between ipv4
> and ipv6. I think it will be better to bundle such changes together.
>
No problem. I can add ipv6 support in the next version too.
Thanks.
--
Philo