Message-ID: <6de315a7-8ef1-4b5d-8adc-fcfae26f6f88@openvpn.net>
Date: Wed, 15 May 2024 14:54:49 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport

On 15/05/2024 12:19, Sabrina Dubroca wrote:
> 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
>> On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>>> The UDP code differentiates "socket already owned by this interface"
>>>>> from "already taken by other user". That doesn't apply to TCP?
>>>>
>>>> This makes me wonder: how safe is it to interpret the user data as an object
>>>> of type ovpn_socket?
>>>>
>>>> When we find the user data already assigned, we don't know what was really
>>>> stored in there, right?
>>>> Technically this socket could have gone through another module which
>>>> assigned its own state.
>>>>
>>>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>>>> *)user_data)->ovpn ] is probably not safe. Would you agree?
>>>
>>> Hmmm, yeah, I think you're right. If you checked encap_type ==
>>> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
>>> really your data. Basically call ovpn_from_udp_sock during attach if
>>> you want to check something beyond EBUSY.
>>
>> right. Maybe we can live with simply reporting EBUSY and be done with it,
>> without adding extra checks and what not.
>
> I don't know. What was the reason for the EALREADY handling in udp.c
> and the corresponding refcount increase in ovpn_socket_new?

It's just me: I like to be verbose when reporting errors.
In the end, though, the exact error code is ignored and we release the reference.

From netlink.c:

    peer->sock = ovpn_socket_new(sock, peer);
    if (IS_ERR(peer->sock)) {
        sockfd_put(sock);
        peer->sock = NULL;
        ret = -ENOTSOCK;

So there is no added value in distinguishing the two cases: the caller turns any failure into -ENOTSOCK anyway.
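
For illustration, the "just report EBUSY" variant could look roughly like the userspace model below. This is only a sketch: struct model_sock and model_attach() are hypothetical names, not the ovpn code. The point it shows is that the existing user data pointer is tested but never cast or dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the field relevant here. */
struct model_sock {
    void *user_data; /* opaque: may have been set by any module */
};

/* Attach our state only if nobody has claimed the socket yet.
 * An existing pointer is never cast or dereferenced, because we cannot
 * know which module stored it. Every "already taken" case is -EBUSY.
 */
int model_attach(struct model_sock *sk, void *our_state)
{
    if (sk->user_data)
        return -EBUSY;

    sk->user_data = our_state;
    return 0;
}
```

With this shape, a second attach by the same owner and an attach on a socket taken by someone else are indistinguishable, which matches what the caller does with the error today.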
>
>
>>>>>> +int __init ovpn_tcp_init(void)
>>>>>> +{
>>>>>> + /* We need to substitute the recvmsg and the sock_is_readable
>>>>>> + * callbacks in the sk_prot member of the sock object for TCP
>>>>>> + * sockets.
>>>>>> + *
>>>>>> + * However sock->sk_prot is a pointer to a static variable and
>>>>>> + * therefore we can't directly modify it, otherwise every socket
>>>>>> + * pointing to it will be affected.
>>>>>> + *
>>>>>> + * For this reason we create our own static copy and modify what
>>>>>> + * we need. Then we make sk_prot point to this copy
>>>>>> + * (in ovpn_tcp_socket_attach())
>>>>>> + */
>>>>>> + ovpn_tcp_prot = tcp_prot;
>>>>>
>>>>> Don't you need a separate variant for IPv6, like TLS does?
>>>>
>>>> Never did so far.
>>>>
>>>> My wild wild wild guess: for the time this socket is owned by ovpn, we only
>>>> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
>>>> difference.
>>>> When this socket is released, we reassign the original prot.
>>>
>>> That seems a bit suspicious to me. For example, tcpv6_prot has a
>>> different backlog_rcv. And you don't control if the socket is detached
>>> before being closed, or which callbacks are needed. Your userspace
>>> client doesn't use them, but someone else's might.
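
A rough userspace model of what "one copy per address family" would mean, in case it helps the discussion. Everything here is hypothetical and heavily reduced (struct model_proto has two callbacks only, and none of the names are real kernel symbols); it only shows that copying from the matching base keeps the family-specific callbacks intact while the one we care about is overridden.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of struct proto: two callbacks only. */
struct model_proto {
    int (*recvmsg)(void);
    int (*backlog_rcv)(void);
};

int base_recvmsg(void)       { return 0; }
int v4_backlog_rcv(void)     { return 4; }
int v6_backlog_rcv(void)     { return 6; }
int ovpn_model_recvmsg(void) { return 1; }

/* Stand-ins for the two static base tables, which differ in backlog_rcv. */
const struct model_proto model_tcp_prot   = { base_recvmsg, v4_backlog_rcv };
const struct model_proto model_tcpv6_prot = { base_recvmsg, v6_backlog_rcv };

/* Our private copies, one per family. */
struct model_proto ovpn_model_prot;
struct model_proto ovpn_model_prot6;

/* Copy the base table, then override only what we need. */
void model_build_prot(struct model_proto *dst, const struct model_proto *base)
{
    *dst = *base;
    dst->recvmsg = ovpn_model_recvmsg;
}

void model_init(void)
{
    model_build_prot(&ovpn_model_prot, &model_tcp_prot);
    model_build_prot(&ovpn_model_prot6, &model_tcpv6_prot);
}
```

At attach time the socket would then get the copy matching its family, so an IPv6 socket keeps the v6 backlog_rcv instead of silently inheriting the v4 one.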
>>>
>>>>>> + ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
>>>>>
>>>>> You don't need to replace ->sendmsg as well? The userspace client is
>>>>> not expected to send messages?
>>>>
>>>> It is, but my assumption is that those packets will just go through the
>>>> socket as usual. No need to be handled by ovpn (those packets are not
>>>> encrypted/decrypted, like data traffic is).
>>>> And this is how it has worked so far.
>>>>
>>>> Makes sense?
>>>
>>> Two things come to mind:
>>>
>>> - userspace is expected to prefix the messages it inserts on the
>>> stream with the 2-byte length field? otherwise, the peer won't be
>>> able to parse them out of the stream
>>
>> correct. userspace sends those packets as if ovpn is not running, therefore
>> this happens naturally.
>
> ok.
>
>
>>> - I'm not convinced this would be safe wrt kernel writing partial
>>> messages. if ovpn_tcp_send_one doesn't send the full message, you
>>> could interleave two messages:
>>>
>>> +------+-------------------+------+--------+----------------+
>>> | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
>>> +------+-------------------+------+--------+----------------+
>>>
>>> and the RX side would parse that as:
>>>
>>> +------+-----------------------------------+------+---------
>>> | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
>>> +------+-------------------+---------------+------+---------
>>>
>>> and try to interpret some random bytes out of either msg1 or msg2 as
>>> a length prefix, resulting in a broken stream.
>>
>> hm you are correct. if multiple sendmsg can overlap, then we might be in
>> trouble, but are we sure this can truly happen?
>
> What would prevent this? The kernel_sendmsg call in ovpn_tcp_send_one
> could send a partial message, and then what would stop userspace from
> sending its own message during the cond_resched from ovpn_tcp_tx_work?

I was under the impression that ovpn_tcp_send_one() would always send an
entire packet, but this may not be the case, so you're definitely right:
we may end up with interleaved sendmsg calls from kernelspace and userspace.

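
To make the failure concrete, here is a small userspace model of the framing (not kernel code). It assumes a 2-byte big-endian length prefix; parse_stream(), build_clean() and build_interleaved() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct parse_result {
    size_t frames;      /* complete frames found */
    size_t pending_len; /* length claimed by a trailing incomplete frame, 0 if none */
};

/* Walk a buffer of [2-byte big-endian length][payload] frames. */
struct parse_result parse_stream(const uint8_t *buf, size_t len)
{
    struct parse_result res = { 0, 0 };
    size_t off = 0;

    while (len - off >= 2) {
        size_t plen = ((size_t)buf[off] << 8) | buf[off + 1];

        if (plen > len - off - 2) {
            /* not enough bytes yet: the receiver would wait for more */
            res.pending_len = plen;
            break;
        }
        off += 2 + plen;
        res.frames++;
    }
    return res;
}

/* msg1 = "AAAAAA", msg2 = "BB", each frame written in one piece. */
size_t build_clean(uint8_t *out)
{
    static const uint8_t s[] = { 0, 6, 'A', 'A', 'A', 'A', 'A', 'A',
                                 0, 2, 'B', 'B' };

    memcpy(out, s, sizeof(s));
    return sizeof(s);
}

/* Same frames, but msg2 is written after only 3 payload bytes of msg1. */
size_t build_interleaved(uint8_t *out)
{
    static const uint8_t s[] = { 0, 6, 'A', 'A', 'A',
                                 0, 2, 'B', 'B',
                                 'A', 'A', 'A' };

    memcpy(out, s, sizeof(s));
    return sizeof(s);
}
```

The clean stream parses as two frames. The interleaved one yields a single 6-byte frame whose payload swallows msg2's length prefix, and the parser then reads the bytes 'B' 'A' as a length of 0x4241 and waits for data that will never match a real frame boundary.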
>
>>> The stream format looks identical to ESP in TCP [1] (2B length prefix
>>> followed by the actual message), so I think the espintcp code (both tx
>>> and rx, except for actual protocol parsing) should look very
>>> similar. The problems that need to be solved for both protocols are
>>> pretty much the same.
>>
>> ok, will have a look. maybe this will simplify the code even more and we
>> will get rid of some of the issues we were discussing above.
>
> I doubt dealing with possible interleaving will make the code simpler,
> but I think it has to be done.

Yep.

Thanks a lot for pointing this out and for the pointers you gave me.
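
One possible shape for the fix, sketched as a userspace model so the idea can be checked in isolation. This is not taken from espintcp or from the ovpn patch: struct model_tx, tx_send(), tx_flush() and the sink are all hypothetical. The assumption is that every sender (kernel TX path and the userspace sendmsg path) goes through the same per-socket state, and that a new message is refused while an earlier frame is only partially written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_FRAME 64

/* Hypothetical byte sink standing in for the TCP socket: each write
 * accepts at most 'budget' bytes, which models a partial send.
 */
struct model_sink {
    uint8_t data[256];
    size_t len;
    size_t budget;
};

size_t sink_write(struct model_sink *s, const uint8_t *buf, size_t len)
{
    size_t n = len < s->budget ? len : s->budget;

    if (n > sizeof(s->data) - s->len)
        n = sizeof(s->data) - s->len;
    memcpy(s->data + s->len, buf, n);
    s->len += n;
    return n;
}

/* Per-socket TX state shared by all senders. */
struct model_tx {
    uint8_t frame[2 + MODEL_MAX_FRAME];
    size_t len; /* bytes in the in-flight frame, 0 when idle */
    size_t off; /* bytes already handed to the sink */
};

/* Push the rest of the in-flight frame.
 * Returns 0 when nothing is left, -EAGAIN if the sink took only part.
 */
int tx_flush(struct model_tx *tx, struct model_sink *s)
{
    if (tx->len == 0)
        return 0;

    tx->off += sink_write(s, tx->frame + tx->off, tx->len - tx->off);
    if (tx->off < tx->len)
        return -EAGAIN;

    tx->len = 0;
    tx->off = 0;
    return 0;
}

/* Accept a new message only when no earlier frame is still partial.
 * Returns 0 once the message is accepted (it may still need tx_flush()),
 * -EAGAIN if an earlier frame is still in flight.
 */
int tx_send(struct model_tx *tx, struct model_sink *s,
            const uint8_t *msg, size_t len)
{
    int ret;

    if (len > MODEL_MAX_FRAME)
        return -EMSGSIZE;

    ret = tx_flush(tx, s);
    if (ret)
        return ret;

    tx->frame[0] = (uint8_t)(len >> 8);
    tx->frame[1] = (uint8_t)(len & 0xff);
    memcpy(tx->frame + 2, msg, len);
    tx->len = 2 + len;
    tx->off = 0;
    tx_flush(tx, s); /* best effort; the caller flushes the remainder later */
    return 0;
}
```

In the real code the shared state would of course need locking, and the userspace path would have to block or return an error instead of writing to the socket directly. The model only captures the ordering rule: nothing new starts until the current frame is complete.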
>
--
Antonio Quartulli
OpenVPN Inc.