Message-ID: <ZkSMPeSSS4VZxHrf@hog>
Date: Wed, 15 May 2024 12:19:41 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
Sergey Ryazanov <ryazanov.s.a@...il.com>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
Andrew Lunn <andrew@...n.ch>, Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
2024-05-15, 00:11:28 +0200, Antonio Quartulli wrote:
> On 14/05/2024 10:58, Sabrina Dubroca wrote:
> > > > The UDP code differentiates "socket already owned by this interface"
> > > > from "already taken by other user". That doesn't apply to TCP?
> > >
> > > This makes me wonder: how safe is it to interpret the user data as an object
> > > of type ovpn_socket?
> > >
> > > When we find the user data already assigned, we don't know what was really
> > > stored in there, right?
> > > Technically this socket could have gone through another module which
> > > assigned its own state.
> > >
> > > Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> > > *)user_data)->ovpn ] is probably not safe. Would you agree?
> >
> > Hmmm, yeah, I think you're right. If you checked encap_type ==
> > UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> > really your data. Basically call ovpn_from_udp_sock during attach if
> > you want to check something beyond EBUSY.
>
> right. Maybe we can live with simply reporting EBUSY and be done with it,
> without adding extra checks and what not.

I don't know. What was the reason for the EALREADY handling in udp.c
and the corresponding refcount increase in ovpn_socket_new?

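To make the check discussed above concrete, here is a rough userspace sketch, not the ovpn code. Every name in it (fake_sock, encap_kind, ovpn_sock_state, attach_check) is a hypothetical stand-in: the encap field plays the role of encap_type (or sk_prot for TCP), and user_data is only dereferenced once that tag shows the data is ours.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects discussed in the thread. */
enum encap_kind { ENCAP_NONE = 0, ENCAP_OTHER = 1, ENCAP_OVPN = 2 };

struct iface { int id; };

struct ovpn_sock_state { struct iface *owner; };

struct fake_sock {
	enum encap_kind encap;	/* plays the role of encap_type / sk_prot */
	void *user_data;	/* opaque: may belong to any module */
};

/* Returns 0 if the socket is free, -EALREADY if this interface already
 * owns it, and -EBUSY if anybody else does. user_data is interpreted as
 * ovpn state only after the tag check has passed.
 */
static int attach_check(struct fake_sock *sk, struct iface *me)
{
	struct ovpn_sock_state *st;

	assert(sk && me);

	if (!sk->user_data)
		return 0;

	/* Someone else's state: do not look inside it. */
	if (sk->encap != ENCAP_OVPN)
		return -EBUSY;

	st = sk->user_data;
	return st->owner == me ? -EALREADY : -EBUSY;
}
```

If plain EBUSY is enough for every caller, the last two lines collapse into a single return and the tag check is no longer needed at attach time.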
> > > > > +int __init ovpn_tcp_init(void)
> > > > > +{
> > > > > + /* We need to substitute the recvmsg and the sock_is_readable
> > > > > + * callbacks in the sk_prot member of the sock object for TCP
> > > > > + * sockets.
> > > > > + *
> > > > > + * However sock->sk_prot is a pointer to a static variable and
> > > > > + * therefore we can't directly modify it, otherwise every socket
> > > > > + * pointing to it will be affected.
> > > > > + *
> > > > > + * For this reason we create our own static copy and modify what
> > > > > + * we need. Then we make sk_prot point to this copy
> > > > > + * (in ovpn_tcp_socket_attach())
> > > > > + */
> > > > > + ovpn_tcp_prot = tcp_prot;
> > > >
> > > > Don't you need a separate variant for IPv6, like TLS does?
> > >
> > > Never did so far.
> > >
> > > My wild wild wild guess: for the time this socket is owned by ovpn, we only
> > > use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
> > > difference.
> > > When this socket is released, we reassign the original prot.
> >
> > That seems a bit suspicious to me. For example, tcpv6_prot has a
> > different backlog_rcv. And you don't control if the socket is detached
> > before being closed, or which callbacks are needed. Your userspace
> > client doesn't use them, but someone else's might.
> >
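A minimal userspace sketch of the "one copy per address family" idea, under the assumption that only recvmsg needs overriding. The struct and function names are made up for illustration and do not match struct proto; base_v4 and base_v6 merely stand in for tcp_prot and tcpv6_prot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct proto. */
struct proto_sketch {
	int (*recvmsg)(void);
	int (*backlog_rcv)(void);
};

static int v4_recvmsg(void) { return 4; }
static int v4_backlog(void) { return 40; }
static int v6_recvmsg(void) { return 6; }
static int v6_backlog(void) { return 60; }
static int ovpn_recvmsg_sketch(void) { return 99; }

/* Stand-ins for the two static base tables. */
static const struct proto_sketch base_v4 = { v4_recvmsg, v4_backlog };
static const struct proto_sketch base_v6 = { v6_recvmsg, v6_backlog };

/* One private copy per family, filled in once at init time. */
static struct proto_sketch ovpn_v4, ovpn_v6;

static void build_protos(void)
{
	/* The reason for two copies: the bases differ in callbacks
	 * that we do not override.
	 */
	assert(base_v4.backlog_rcv != base_v6.backlog_rcv);

	ovpn_v4 = base_v4;
	ovpn_v4.recvmsg = ovpn_recvmsg_sketch;

	ovpn_v6 = base_v6;
	ovpn_v6.recvmsg = ovpn_recvmsg_sketch;
}

/* Chosen at attach time based on the socket's family. */
static struct proto_sketch *pick_proto(int is_ipv6)
{
	return is_ipv6 ? &ovpn_v6 : &ovpn_v4;
}
```

With a single copy derived from the v4 table, an IPv6 socket would silently get the v4 backlog_rcv for as long as it stays attached.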
> > > > > + ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> > > >
> > > > You don't need to replace ->sendmsg as well? The userspace client is
> > > > not expected to send messages?
> > >
> > > It is, but my assumption is that those packets will just go through the
> > > socket as usual. No need to be handled by ovpn (those packets are not
> > > encrypted/decrypted, like data traffic is).
> > > And this is how it has worked so far.
> > >
> > > Makes sense?
> >
> > Two things come to mind:
> >
> > - userspace is expected to prefix the messages it inserts on the
> > stream with the 2-byte length field? otherwise, the peer won't be
> > able to parse them out of the stream
>
> correct. userspace sends those packets as if ovpn is not running, therefore
> this happens naturally.

ok.

> > - I'm not convinced this would be safe wrt kernel writing partial
> > messages. if ovpn_tcp_send_one doesn't send the full message, you
> > could interleave two messages:
> >
> > +------+-------------------+------+--------+----------------+
> > | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
> > +------+-------------------+------+--------+----------------+
> >
> > and the RX side would parse that as:
> >
> > +------+-----------------------------------+------+---------
> > | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
> > +------+-------------------+---------------+------+---------
> >
> > and try to interpret some random bytes out of either msg1 or msg2 as
> > a length prefix, resulting in a broken stream.
>
> hm you are correct. if multiple sendmsg can overlap, then we might be in
> trouble, but are we sure this can truly happen?

What would prevent this? The kernel_sendmsg call in ovpn_tcp_send_one
could send a partial message, and then what would stop userspace from
sending its own message during the cond_resched from ovpn_tcp_tx_work?

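The failure mode from the diagram can be reproduced in a few lines of userspace C. This is only an illustration: put_frame, parse_frame and build_interleaved are hypothetical helpers, and the 2-byte prefix is assumed to be big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2-byte big-endian length prefix followed by the payload. */
static size_t put_frame(uint8_t *out, const uint8_t *msg, uint16_t len)
{
	out[0] = (uint8_t)(len >> 8);
	out[1] = (uint8_t)(len & 0xff);
	memcpy(out + 2, msg, len);
	return 2 + (size_t)len;
}

/* Parse one frame at buf[off]. Returns the payload length, or -1 if the
 * stream ends before the announced frame does. *payload points into buf.
 */
static int parse_frame(const uint8_t *buf, size_t total, size_t off,
		       const uint8_t **payload)
{
	uint16_t len;

	if (off > total || total - off < 2)
		return -1;
	len = (uint16_t)((buf[off] << 8) | buf[off + 1]);
	if (total - off - 2 < len)
		return -1;
	*payload = buf + off + 2;
	return len;
}

/* Build the broken stream from the diagram: only the first `sent1`
 * payload bytes of msg1 go out, then a complete msg2 frame is written,
 * then the rest of msg1 follows.
 */
static size_t build_interleaved(uint8_t *out,
				const uint8_t *msg1, uint16_t len1,
				uint16_t sent1,
				const uint8_t *msg2, uint16_t len2)
{
	size_t n = 0;

	assert(sent1 <= len1);
	out[n++] = (uint8_t)(len1 >> 8);
	out[n++] = (uint8_t)(len1 & 0xff);
	memcpy(out + n, msg1, sent1);
	n += sent1;
	n += put_frame(out + n, msg2, len2);
	memcpy(out + n, msg1 + sent1, (size_t)(len1 - sent1));
	n += (size_t)(len1 - sent1);
	return n;
}
```

Feeding an 8-byte msg1 cut after 3 bytes plus a 2-byte msg2 through parse_frame gives a first "message" that swallows msg2's prefix and payload, and the next length field is read from the middle of msg1.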
> > The stream format looks identical to ESP in TCP [1] (2B length prefix
> > followed by the actual message), so I think the espintcp code (both tx
> > and rx, except for actual protocol parsing) should look very
> > similar. The problems that need to be solved for both protocols are
> > pretty much the same.
>
> ok, will have a look. maybe this will simplify the code even more and we
> will get rid of some of the issues we were discussing above.

I doubt dealing with possible interleaving will make the code simpler,
but I think it has to be done.

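One possible shape for this, sketched in userspace C under the assumption that all writers (the kernel data path and userspace sendmsg) go through the same tx state. The names (tx_state, tx_submit, tx_flush, wire_write) are invented for the example and are not taken from ovpn or espintcp; locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated TCP stream. */
struct wire {
	uint8_t buf[256];
	size_t len;
};

/* Simulated socket write that accepts at most `budget` bytes. */
static size_t wire_write(struct wire *w, const uint8_t *data, size_t len,
			 size_t budget)
{
	size_t n = len < budget ? len : budget;

	assert(w->len + n <= sizeof(w->buf));
	memcpy(w->buf + w->len, data, n);
	w->len += n;
	return n;
}

/* The frame currently in flight, shared by every sender. */
struct tx_state {
	uint8_t frame[64];	/* 2-byte length prefix + payload */
	size_t len;		/* total bytes in frame, 0 when idle */
	size_t off;		/* bytes already written */
};

/* Push out what remains of the in-flight frame. Returns 1 once idle. */
static int tx_flush(struct tx_state *tx, struct wire *w, size_t budget)
{
	tx->off += wire_write(w, tx->frame + tx->off, tx->len - tx->off,
			      budget);
	if (tx->off < tx->len)
		return 0;
	tx->len = 0;
	tx->off = 0;
	return 1;
}

/* Start a new frame only when no partial frame is pending. On -EAGAIN
 * the caller has to retry after tx_flush() reports idle.
 */
static int tx_submit(struct tx_state *tx, struct wire *w,
		     const uint8_t *msg, uint16_t len, size_t budget)
{
	if (tx->len)
		return -EAGAIN;
	if ((size_t)len + 2 > sizeof(tx->frame))
		return -EMSGSIZE;

	tx->frame[0] = (uint8_t)(len >> 8);
	tx->frame[1] = (uint8_t)(len & 0xff);
	memcpy(tx->frame + 2, msg, len);
	tx->len = (size_t)len + 2;
	tx->off = 0;
	tx_flush(tx, w, budget);
	return 0;
}
```

The cost is that sendmsg has to be replaced too, so that userspace writes respect the pending partial frame instead of going straight to the socket.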
--
Sabrina