netdev - Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport

Message-ID: <6de315a7-8ef1-4b5d-8adc-fcfae26f6f88@openvpn.net>
Date: Wed, 15 May 2024 14:54:49 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
 Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport

On 15/05/2024 12:19, Sabrina Dubroca wrote:
> 2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
>> On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>>> The UDP code differentiates "socket already owned by this interface"
>>>>> from "already taken by other user". That doesn't apply to TCP?
>>>>
>>>> This makes me wonder: how safe is it to interpret the user data as an object
>>>> of type ovpn_socket?
>>>>
>>>> When we find the user data already assigned, we don't know what was really
>>>> stored in there, right?
>>>> Technically this socket could have gone through another module which
>>>> assigned its own state.
>>>>
>>>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>>>> *)user_data)->ovpn ] is probably not safe. Would you agree?
>>>
>>> Hmmm, yeah, I think you're right. If you checked encap_type ==
>>> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
>>> really your data. Basically call ovpn_from_udp_sock during attach if
>>> you want to check something beyond EBUSY.
>>
>> right. Maybe we can live with simply reporting EBUSY and be done with it,
>> without adding extra checks and what not.
> 
> I don't know. What was the reason for the EALREADY handling in udp.c
> and the corresponding refcount increase in ovpn_socket_new?

It's just that I like to be verbose when reporting errors.
In the end, though, the caller ignores the exact error and releases the
reference. From netlink.c:

		peer->sock = ovpn_socket_new(sock, peer);
		if (IS_ERR(peer->sock)) {
			sockfd_put(sock);
			peer->sock = NULL;
			ret = -ENOTSOCK;

So there is no added value in distinguishing the two cases.

> 
> 
>>>>>> +int __init ovpn_tcp_init(void)
>>>>>> +{
>>>>>> +	/* We need to substitute the recvmsg and the sock_is_readable
>>>>>> +	 * callbacks in the sk_prot member of the sock object for TCP
>>>>>> +	 * sockets.
>>>>>> +	 *
>>>>>> +	 * However sock->sk_prot is a pointer to a static variable and
>>>>>> +	 * therefore we can't directly modify it, otherwise every socket
>>>>>> +	 * pointing to it will be affected.
>>>>>> +	 *
>>>>>> +	 * For this reason we create our own static copy and modify what
>>>>>> +	 * we need. Then we make sk_prot point to this copy
>>>>>> +	 * (in ovpn_tcp_socket_attach())
>>>>>> +	 */
>>>>>> +	ovpn_tcp_prot = tcp_prot;
>>>>>
>>>>> Don't you need a separate variant for IPv6, like TLS does?
>>>>
>>>> Never did so far.
>>>>
>>>> My wild wild wild guess: for the time this socket is owned by ovpn, we only
>>>> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
>>>> difference.
>>>> When this socket is released, we reassign the original prot.
>>>
>>> That seems a bit suspicious to me. For example, tcpv6_prot has a
>>> different backlog_rcv. And you don't control if the socket is detached
>>> before being closed, or which callbacks are needed. Your userspace
>>> client doesn't use them, but someone else's might.
>>>
>>>>>> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
>>>>>
>>>>> You don't need to replace ->sendmsg as well? The userspace client is
>>>>> not expected to send messages?
>>>>
>>>> It is, but my assumption is that those packets will just go through the
>>>> socket as usual. No need to be handled by ovpn (those packets are not
>>>> encrypted/decrypted, like data traffic is).
>>>> And this is how it has worked so far.
>>>>
>>>> Makes sense?
>>>
>>> Two things come to mind:
>>>
>>> - userspace is expected to prefix the messages it inserts on the
>>>     stream with the 2-byte length field? otherwise, the peer won't be
>>>     able to parse them out of the stream
>>
>> correct. userspace sends those packets as if ovpn is not running, therefore
>> this happens naturally.
> 
> ok.
> 
> 
>>> - I'm not convinced this would be safe wrt kernel writing partial
>>>     messages. if ovpn_tcp_send_one doesn't send the full message, you
>>>     could interleave two messages:
>>>
>>>     +------+-------------------+------+--------+----------------+
>>>     | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
>>>     +------+-------------------+------+--------+----------------+
>>>
>>>     and the RX side would parse that as:
>>>
>>>     +------+-----------------------------------+------+---------
>>>     | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
>>>     +------+-------------------+---------------+------+---------
>>>
>>>     and try to interpret some random bytes out of either msg1 or msg2 as
>>>     a length prefix, resulting in a broken stream.
>>
>> hm you are correct. if multiple sendmsg can overlap, then we might be in
>> trouble, but are we sure this can truly happen?
> 
> What would prevent this? The kernel_sendmsg call in ovpn_tcp_send_one
> could send a partial message, and then what would stop userspace from
> sending its own message during the cond_resched from ovpn_tcp_tx_work?

I was under the impression that ovpn_tcp_send_one() would always send an
entire packet, but this may not be the case. So you're definitely right.

We may end up with sendmsg calls from kernelspace and userspace
interleaving on the same stream.
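
To make the failure mode concrete, here is a minimal userspace sketch (not
ovpn code) of the framing discussed above. It assumes the 2-byte length
prefix is in network byte order; put_frame(), build_interleaved() and
parse_frames() are hypothetical helper names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one frame (2-byte big-endian length + payload) at offset off.
 * Returns the offset just past the frame.
 */
static size_t put_frame(uint8_t *buf, size_t off, const uint8_t *msg,
			uint16_t len)
{
	buf[off++] = (uint8_t)(len >> 8);
	buf[off++] = (uint8_t)(len & 0xff);
	memcpy(buf + off, msg, len);
	return off + len;
}

/* Build the broken stream from the diagram: the prefix and the first
 * sent1 bytes of msg1, then a complete msg2 frame, then the rest of msg1.
 */
static size_t build_interleaved(uint8_t *buf, const uint8_t *msg1,
				uint16_t len1, size_t sent1,
				const uint8_t *msg2, uint16_t len2)
{
	size_t off = 0;

	assert(sent1 <= len1);
	buf[off++] = (uint8_t)(len1 >> 8);
	buf[off++] = (uint8_t)(len1 & 0xff);
	memcpy(buf + off, msg1, sent1);
	off += sent1;
	off = put_frame(buf, off, msg2, len2);
	memcpy(buf + off, msg1 + sent1, len1 - sent1);
	return off + (len1 - sent1);
}

/* RX side: split the stream into frames. Stores each payload length in
 * lens[], returns the number of complete frames and reports in *leftover
 * how many trailing bytes did not form a complete frame.
 */
static size_t parse_frames(const uint8_t *buf, size_t total, uint16_t *lens,
			   size_t max, size_t *leftover)
{
	size_t off = 0, n = 0;

	while (n < max && total - off >= 2) {
		uint16_t len = (uint16_t)((buf[off] << 8) | buf[off + 1]);

		if (total - off - 2 < len)
			break;	/* incomplete frame, or a bogus length */
		lens[n++] = len;
		off += 2 + (size_t)len;
	}
	*leftover = total - off;
	return n;
}
```

With an 8-byte msg1 and a 4-byte msg2, two back-to-back put_frame() calls
parse as two frames with nothing left over. If only 3 bytes of msg1 go out
before msg2, the parser swallows len2 and part of msg2 as the payload of
msg1 and then reads payload bytes as the next length prefix.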

> 
>>> The stream format looks identical to ESP in TCP [1] (2B length prefix
>>> followed by the actual message), so I think the espintcp code (both tx
>>> and rx, except for actual protocol parsing) should look very
>>> similar. The problems that need to be solved for both protocols are
>>> pretty much the same.
>>
>> ok, will have a look. maybe this will simplify the code even more and we
>> will get rid of some of the issues we were discussing above.
> 
> I doubt dealing with possible interleaving will make the code simpler,
> but I think it has to be done.

Yap.
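
As a rough illustration of the direction (a single-threaded userspace
sketch, not espintcp or ovpn code): keep the partially sent frame in a
shared TX context and make every sender flush it before queueing anything
new. struct wire, struct tx_ctx, tx_flush() and tx_send() are hypothetical
names; in the kernel the socket lock would have to provide the mutual
exclusion that this sketch gets for free by being single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIRE_MAX 256
#define MSG_MAX  64

/* Stand-in for the TCP socket: accepts at most `room` more bytes. */
struct wire {
	uint8_t data[WIRE_MAX];
	size_t len;
	size_t room;
};

/* TX state shared by every sender on the stream. */
struct tx_ctx {
	uint8_t frame[2 + MSG_MAX];	/* length prefix + payload */
	size_t frame_len;		/* 0 when nothing is pending */
	size_t sent;			/* bytes of frame already written */
};

/* Write as much as the wire accepts; may be a partial write. */
static size_t wire_write(struct wire *w, const uint8_t *p, size_t n)
{
	size_t k = n < w->room ? n : w->room;

	assert(w->len + k <= WIRE_MAX);
	memcpy(w->data + w->len, p, k);
	w->len += k;
	w->room -= k;
	return k;
}

/* Push the rest of the pending frame.
 * Returns 0 once it is fully written, -EAGAIN if some bytes remain.
 */
static int tx_flush(struct tx_ctx *tx, struct wire *w)
{
	tx->sent += wire_write(w, tx->frame + tx->sent,
			       tx->frame_len - tx->sent);
	if (tx->sent < tx->frame_len)
		return -EAGAIN;
	tx->frame_len = 0;
	tx->sent = 0;
	return 0;
}

/* Queue one message. A new frame is accepted only after the previous
 * one is completely on the wire, so frames can never interleave.
 */
static int tx_send(struct tx_ctx *tx, struct wire *w, const uint8_t *msg,
		   uint16_t len)
{
	int ret;

	assert(len <= MSG_MAX);
	if (tx->frame_len) {
		ret = tx_flush(tx, w);
		if (ret)
			return ret;	/* previous frame still incomplete */
	}
	tx->frame[0] = (uint8_t)(len >> 8);
	tx->frame[1] = (uint8_t)(len & 0xff);
	memcpy(tx->frame + 2, msg, len);
	tx->frame_len = 2 + (size_t)len;
	tx->sent = 0;
	tx_flush(tx, w);	/* partial is fine: the remainder stays queued */
	return 0;
}
```

Whichever path calls tx_send() next, kernel or userspace, it completes the
pending frame first or backs off with -EAGAIN, so the receiver always sees
whole frames.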

Thanks a lot for pointing this out and for the pointers you gave me.

> 

-- 
Antonio Quartulli
OpenVPN Inc.