linux-kernel - Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <10799332-b349-451b-a12b-efed43683357@openvpn.net>
Date: Mon, 2 Dec 2024 00:39:56 +0100
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Donald Hunter <donald.hunter@...il.com>,
 Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
 Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

On 29/11/2024 17:10, Sabrina Dubroca wrote:
> 2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
>> On 15.11.2024 17:02, Antonio Quartulli wrote:
>>> On 11/11/2024 02:54, Sergey Ryazanov wrote:
>>> [...]
>>>>> +    skb_reset_transport_header(skb);
>>>>> +    skb_probe_transport_header(skb);
>>>>> +    skb_reset_inner_headers(skb);
>>>>> +
>>>>> +    memset(skb->cb, 0, sizeof(skb->cb));
>>>>
>>>> Why do we need to zero the control buffer here?
>>>
>>> To avoid the next layer to assume the cb is clean while it is not.
>>> Other drivers do the same as well.
>>
>> AFAIR, there is no convention to clean the control buffer before the handing
>> over. The common practice is a bit opposite, programmer shall not assume
>> that the control buffer has been zeroed.
>>
>> Not a big deal to clean it here, we just can save some CPU cycles avoiding
>> it.
>>
>>> I think this was recommended by Sabrina as well.
>>
>> Curious. It's macsec that does not zero it, or I've not understood how it
>> was done.
> 
> I only remember discussing a case [1] where one function within ovpn
> was expecting a cleared skb->cb to behave correctly but the caller did
> not clear it. In general, as you said, clearing cb "to be nice to
> other layers" is not expected. Sorry if some comments I made were
> confusing.

No problem at all.
I misunderstood some statement and went the wrong route.
Thanks a lot Sergey for pointing this out.

I am only clearing the cb before usage as required by internal assumptions.

> 
> [1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
> 
> 
>>>>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
>>>>> +{
>>>>> +    struct ovpn_socket *ovpn_sock;
>>>>> +
>>>>> +    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) !=
>>>>> UDP_ENCAP_OVPNINUDP))
>>>>> +        return NULL;
>>>>> +
>>>>> +    ovpn_sock = rcu_dereference_sk_user_data(sk);
>>>>> +    if (unlikely(!ovpn_sock))
>>>>> +        return NULL;
>>>>> +
>>>>> +    /* make sure that sk matches our stored transport socket */
>>>>> +    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
>>>>> +        return NULL;
>>>>> +
>>>>> +    return ovpn_sock->ovpn;
>>>>
>>>> Now, returning of this pointer is safe. But the following TCP
>>>> transport support calls the socket release via a scheduled work.
>>>> What extends socket lifetime and makes it possible to receive a UDP
>>>> packet way after the interface private data release. Is it correct
>>>> assumption?
>>>
>>> Sorry you lost me when sayng "following *TCP* transp[ort support calls".
>>> This function is invoked only in UDP context.
>>> Was that a typ0?
>>
>> Yeah, you are right. The question sounds like a riddle. I should eventually
>> stop composing emails at midnight. Let me paraphrase it.
>>
>> The potential issue is tricky since we create it patch-by-patch.
>>
>> Up to this patch the socket releasing procedure looks solid and reliable.
>> E.g. the P2P netdev destroying:
>>
>>    ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>>      ovpn_peer_release_p2p
>>        ovpn_peer_del_p2p
>>          ovpn_peer_put
>>            ovpn_peer_release_kref
>>              ovpn_peer_release
>>                ovpn_socket_put
>>                  ovpn_socket_release_kref
>>                    ovpn_socket_detach
>>                      ovpn_udp_socket_detach
>>                        setup_udp_tunnel_sock
>>    netdev_run_todo
>>      rcu_barrier  <- no running ovpn_udp_encap_recv after this point
> 
> It's more the synchronize_net in unregister_netdevice_many_notify?
> rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu
> waits for rcu_read_lock sections (see the comments for rcu_barrier and
> synchronize_rcu in kernel/rcu/tree.c).
> 
>>      free_netdev
>>
>> After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be
>> spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will
>> be done. All good.
>>
>> Then, the following patch 'ovpn: implement TCP transport' disjoin
>> ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket
>> detach function call:
>>
>>    ovpn_socket_release_kref
>>      ovpn_socket_schedule_release
>>        schedule_work(&sock->work)
>>
>> And long time after the socket will be actually detached:
>>
>>    ovpn_socket_release_work
>>      ovpn_socket_detach
>>        ovpn_udp_socket_detach
>>          setup_udp_tunnel_sock
>>
>> And until this detaching will take a place, UDP handler can call
>> ovpn_udp_encap_recv() whatever number of times.
>>
>> So, we can end up with this scenario:
>>
>>    ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
>>      ovpn_peer_release_p2p
>>        ovpn_peer_del_p2p
>>          ovpn_peer_put
>>            ovpn_peer_release_kref
>>              ovpn_peer_release
>>                ovpn_socket_put
>>                  ovpn_socket_release_kref
>>                    ovpn_socket_schedule_release
>>                      schedule_work(&sock->work)
>>    netdev_run_todo
>>      rcu_barrier
>>      free_netdev
>>
>>    ovpn_udp_encap_recv  <- called for an incoming UDP packet
>>      ovpn_from_udp_sock <- returns pointer to freed memory
>>      // Any access to ovpn pointer is the use-after-free
>>
>>    ovpn_socket_release_work  <- kernel finally ivoke the work
>>      ovpn_socket_detach
>>        ovpn_udp_socket_detach
>>          setup_udp_tunnel_sock
>>
>> To address the issue, I see two possible solutions:
>> 1. flush the workqueue somewhere before the netdev release
>> 2. set ovpn_sock->ovpn = NULL before scheduling the socket detach
> 
> Going with #2, we could fully split detach into a synchronous part and
> async part (with async not needed for UDP). detach_sync clears the
> pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so
> that no more packets will be sent through the ovpn driver.
> 
> Related to that topic, I'm not sure what's keeping a reference on the
> peer to guarantee it doesn't get freed before we're done with
> peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this
> tcp stuff should move from the peer to ovpn_socket?

Good point.
It may make sense to move everything to ovpn_socket and avoid this extra 
dependency on the peer, while it is not needed at all.

I will play with it and see what comes out.

Thanks!

Regards,



-- 
Antonio Quartulli
OpenVPN Inc.