netdev - Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket

Message-ID: <ad78a2bb-9dc4-4f80-9011-b49fd721a425@redhat.com>
Date: Fri, 25 Oct 2024 11:02:13 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Philo Lu <lulie@...ux.alibaba.com>, netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
 antony.antony@...unet.com, steffen.klassert@...unet.com,
 linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
 jakub@...udflare.com, fred.cc@...baba-inc.com,
 yubing.qiuyubing@...baba-inc.com
Subject: Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected
 socket

On 10/25/24 05:50, Philo Lu wrote:
> On 2024/10/24 23:01, Paolo Abeni wrote:
>> On 10/18/24 13:45, Philo Lu wrote:
>> [...]
>>> +/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
>>> +static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
>>> +{
>>> +	struct udp_hslot *hslot4, *nhslot4;
>>> +
>>> +	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>>> +	nhslot4 = udp_hashslot4(udptable, newhash4);
>>> +	udp_sk(sk)->udp_lrpa_hash = newhash4;
>>> +
>>> +	if (hslot4 != nhslot4) {
>>> +		spin_lock_bh(&hslot4->lock);
>>> +		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>>> +		hslot4->count--;
>>> +		spin_unlock_bh(&hslot4->lock);
>>> +
>>> +		synchronize_rcu();
>>
>> This deserves a comment explaining why it's needed. I had to dig in
>> past revisions to understand it.
>>
> 
> Got it. And a short explanation here (see [1] for detail):
> 
> Here, we move a node from one hlist to another, i.e., node->next is
> updated to point into the new hlist. If that update happens just as a
> reader traversing the old hlist reaches the moved node, the reader
> follows node->next into the new hlist and never visits the rest of the
> old one. This is unexpected.
> 
>      Reader(lookup)     Writer(rehash)
>      -----------------  ---------------
> 1. rcu_read_lock()
> 2. pos = sk;
> 3.                     hlist_del_init_rcu(sk, old_slot)
> 4.                     hlist_add_head_rcu(sk, new_slot)
> 5. pos = pos->next; <=
> 6. rcu_read_unlock()
> 
> [1]
> https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@linux.alibaba.com/

Thanks. AFAICS the problem this can cause is a lookup failure for a
socket positioned later in the same chain, when an earlier entry is
moved to a different slot during a concurrent lookup.

I think that could be solved the same way TCP handles this scenario:
use an hlist_nulls RCU list for the hash4 bucket, check that a failed
lookup ends in the same bucket where it started and, if it does not,
restart the lookup from the original bucket.

Have a look at __inet_lookup_established() for a more descriptive
reference, especially:

https://elixir.bootlin.com/linux/v6.12-rc4/source/net/ipv4/inet_hashtables.c#L528
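
As a rough illustration of the idea (not the kernel code), here is a
single-threaded userspace model of a nulls-terminated chain. All names
in it (struct node, make_nulls(), lookup(), the step hook that stands in
for the concurrent writer) are made up for this sketch; the kernel uses
hlist_nulls_for_each_entry_rcu(), is_a_nulls() and get_nulls_value(),
and additionally rechecks the match after taking a reference. The end
marker of each chain encodes the bucket number, so a reader that was
dragged into another bucket can notice it and restart:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4

struct node {
	struct node *next;	/* real node, or a nulls marker (low bit set) */
	int key;
};

static struct node *bucket[NBUCKETS];

/* End-of-chain marker: odd value carrying the bucket number. */
static struct node *make_nulls(unsigned int slot)
{
	return (struct node *)(((uintptr_t)slot << 1) | 1);
}

static bool is_nulls(const struct node *p)
{
	return (uintptr_t)p & 1;
}

static unsigned int nulls_value(const struct node *p)
{
	return (unsigned int)((uintptr_t)p >> 1);
}

static void table_init(void)
{
	for (unsigned int i = 0; i < NBUCKETS; i++)
		bucket[i] = make_nulls(i);
}

static void add_head(struct node *n, unsigned int slot)
{
	n->next = bucket[slot];
	bucket[slot] = n;
}

/* Unlink n from bucket[slot]; n must currently be on that chain. */
static void del_node(struct node *n, unsigned int slot)
{
	struct node **pp = &bucket[slot];

	while (*pp != n) {
		assert(!is_nulls(*pp));
		pp = &(*pp)->next;
	}
	*pp = n->next;
}

typedef void (*step_hook)(struct node *pos);

/* Walk bucket[slot]; restart if the walk ended on a foreign nulls marker.
 * 'hook' models a writer running while the reader sits on 'pos'. */
static struct node *lookup(unsigned int slot, int key, step_hook hook,
			   int *restarts)
{
	struct node *pos;

begin:
	for (pos = bucket[slot]; !is_nulls(pos); pos = pos->next) {
		if (pos->key == key)
			return pos;
		if (hook)
			hook(pos);
	}
	if (nulls_value(pos) != slot) {
		(*restarts)++;
		goto begin;
	}
	return NULL;
}

static struct node a = { .key = 1 }, b = { .key = 2 };
static bool moved;

/* "Writer": move a from bucket 0 to bucket 1 while the reader is on it. */
static void move_a_once(struct node *pos)
{
	if (pos == &a && !moved) {
		del_node(&a, 0);
		add_head(&a, 1);
		moved = true;
	}
}

/* Returns 1 if b is still found in bucket 0 despite the move of a. */
int nulls_demo(int *restarts)
{
	table_init();
	moved = false;
	*restarts = 0;
	add_head(&b, 0);
	add_head(&a, 0);	/* chain 0: a -> b -> nulls(0) */
	return lookup(0, 2, move_a_once, restarts) == &b;
}
```

In this model the reader follows a->next into bucket 1, hits the
nulls(1) marker, sees that it does not match bucket 0 and walks bucket 0
again, where it finds b. With a plain NULL terminator the first walk
would simply have reported "not found".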

>>> +
>>> +		spin_lock_bh(&nhslot4->lock);
>>> +		hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
>>> +		nhslot4->count++;
>>> +		spin_unlock_bh(&nhslot4->lock);
>>> +	}
>>> +}
>>> +
>>> +static void udp4_unhash4(struct udp_table *udptable, struct sock *sk)
>>> +{
>>> +	struct udp_hslot *hslot2, *hslot4;
>>> +
>>> +	if (udp_hashed4(sk)) {
>>> +		hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
>>> +		hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
>>> +
>>> +		spin_lock(&hslot4->lock);
>>> +		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
>>> +		hslot4->count--;
>>> +		spin_unlock(&hslot4->lock);
>>> +
>>> +		spin_lock(&hslot2->lock);
>>> +		udp_hash4_dec(hslot2);
>>> +		spin_unlock(&hslot2->lock);
>>> +	}
>>> +}
>>> +
>>> +/* call with sock lock */
>>> +static void udp4_hash4(struct sock *sk)
>>> +{
>>> +	struct udp_hslot *hslot, *hslot2, *hslot4;
>>> +	struct net *net = sock_net(sk);
>>> +	struct udp_table *udptable;
>>> +	unsigned int hash;
>>> +
>>> +	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
>>> +		return;
>>> +
>>> +	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
>>> +			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
>>> +
>>> +	udptable = net->ipv4.udp_table;
>>> +	if (udp_hashed4(sk)) {
>>> +		udp4_rehash4(udptable, sk, hash);
>>
>> It's unclear to me how we can enter this branch. Also it's unclear why
>> you don't need to call udp_hash4_inc()/udp_hash4_dec() here, too. Why
>> can't such accounting be placed in udp4_rehash4()?
>>
> 
> It's possible for a connected udp socket to _re-connect_ to another
> remote address. Then, because the local address is not changed, hash2
> and its hash4_cnt stay unchanged, but rehash4 still needs to be done.
> I'll also add a comment here.

Right, UDP socket could actually connect() successfully twice in a row
without a disconnect in between...
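
For reference, a minimal userspace sketch of that case, assuming a
working loopback interface. The helper names (connect_loopback(),
reconnect_peer_port()) and the port numbers used by a caller are
arbitrary choices for this example, not part of the patch. The socket is
connected twice with no disconnect in between, and getpeername() then
reports the second peer:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect fd to 127.0.0.1:port; returns 0 on success, -1 on error. */
static int connect_loopback(int fd, unsigned short port)
{
	struct sockaddr_in dst;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	return connect(fd, (struct sockaddr *)&dst, sizeof(dst));
}

/* connect() a UDP socket to two peers back to back.
 * Returns the peer port seen afterwards, or -1 on any failure. */
int reconnect_peer_port(unsigned short first, unsigned short second)
{
	struct sockaddr_in peer;
	socklen_t len = sizeof(peer);
	int fd, ret = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (connect_loopback(fd, first) == 0 &&
	    connect_loopback(fd, second) == 0 &&
	    getpeername(fd, (struct sockaddr *)&peer, &len) == 0) {
		assert(peer.sin_family == AF_INET);
		ret = ntohs(peer.sin_port);
	}

	close(fd);
	return ret;
}
```

The local address and port are picked by the first connect() and stay
the same across the second one, so only the remote half of the 4-tuple
changes, which is the rehash4-only case discussed above.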

I almost missed the point that the ipv6 implementation is planned to
land afterwards.

I'm sorry, but I think that would be problematic: if ipv4 support lands
in 6.13 but ipv6 does not make it, e.g. due to time constraints, we
will have at least one release with inconsistent behavior between ipv4
and ipv6. I think it would be better to bundle such changes together.

Thanks,

Paolo