Message-ID: <ZkMnpy3_T8YO3eHD@hog>
Date: Tue, 14 May 2024 10:58:15 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
	Sergey Ryazanov <ryazanov.s.a@...il.com>,
	Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
	Andrew Lunn <andrew@...n.ch>, Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport

2024-05-14, 00:20:24 +0200, Antonio Quartulli wrote:
> On 13/05/2024 16:50, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:26 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> > > index 9ae9844dd281..a04d6e55a473 100644
> > > --- a/drivers/net/ovpn/main.c
> > > +++ b/drivers/net/ovpn/main.c
> > > @@ -23,6 +23,7 @@
> > >   #include "io.h"
> > >   #include "packet.h"
> > >   #include "peer.h"
> > > +#include "tcp.h"
> > >   /* Driver info */
> > >   #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
> > > @@ -247,8 +248,14 @@ static struct pernet_operations ovpn_pernet_ops = {
> > >   static int __init ovpn_init(void)
> > >   {
> > > -	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
> > > +	int err = ovpn_tcp_init();
> > > +	if (err) {
> > 
> > ovpn_tcp_init cannot fail (and if it could, you'd need to clean up
> > when register_netdevice_notifier fails). I'd make ovpn_tcp_init void
> > and kill this check.
> 
> I like to have all init functions returning int by design, even though they
> may not fail.
> 
> But I can understand this is not necessarily good practice (somebody will
> always ask "when does it fail?" and there will be no answer, which is
> confusing)

Yes, pretty much.


> > > diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> > > index b5ff59a4b40f..ac4907705d98 100644
> > > --- a/drivers/net/ovpn/peer.h
> > > +++ b/drivers/net/ovpn/peer.h
> > > + * @tcp.raw_len: next packet length as read from the stream (TCP only)
> > > + * @tcp.skb: next packet being filled with data from the stream (TCP only)
> > > + * @tcp.offset: position of the next byte to write in the skb (TCP only)
> > > + * @tcp.data_len: next packet length converted to host order (TCP only)
> > 
> > It would be nice to add information about whether they're used for TX or RX.
> 
> they are all about "from the stream" and "to the skb", meaning that we are
> doing RX.
> Will make it more explicit.

Maybe group them in a struct rx?

> > > + * @tcp.sk_cb.sk_data_ready: pointer to original cb
> > > + * @tcp.sk_cb.sk_write_space: pointer to original cb
> > > + * @tcp.sk_cb.prot: pointer to original prot object
> > >    * @crypto: the crypto configuration (ciphers, keys, etc..)
> > >    * @dst_cache: cache for dst_entry used to send to peer
> > >    * @bind: remote peer binding
> > > @@ -59,6 +69,25 @@ struct ovpn_peer {
> > >   	struct ptr_ring netif_rx_ring;
> > >   	struct napi_struct napi;
> > >   	struct ovpn_socket *sock;
> > > +	/* state of the TCP reading. Needed to keep track of how much of a
> > > +	 * single packet has already been read from the stream and how much is
> > > +	 * missing
> > > +	 */
> > > +	struct {
> > > +		struct ptr_ring tx_ring;
> > > +		struct work_struct tx_work;
> > > +		struct work_struct rx_work;
> > > +
> > > +		u8 raw_len[sizeof(u16)];
> > 
> > Why not u16 or __be16 for this one?
> 
> because in this array we are putting the bytes as we get them from the
> stream.
> We may be at the point where one out of two bytes is available on the
> stream. For this reason I use an array to store this u16 byte by byte.
> 
> Once the two bytes are ready, we convert the content into an actual int and
> store it in "data_len" (a few lines below).

Ok, I see. Hopefully you can switch to strparser and make this one go
away.
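
For illustration, the byte-by-byte reassembly of the length prefix
described above could be modelled in userspace roughly as follows. This
is only a sketch: struct len_reader and len_reader_feed() are
hypothetical names, not part of the patch, and the prefix is assumed to
be a 2-byte value in network (big-endian) order, as the @tcp.data_len
comment suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the RX prefix state (not the ovpn struct). */
struct len_reader {
	uint8_t raw_len[2];	/* prefix bytes exactly as they arrive */
	size_t have;		/* number of prefix bytes stored so far */
};

/* Feed up to n bytes from buf into the prefix buffer.
 * Only the bytes still missing from the prefix are consumed.
 * Returns the number of bytes consumed. *done is set to 1 once both
 * bytes are present, and *data_len then holds the host-order length.
 */
static size_t len_reader_feed(struct len_reader *r, const uint8_t *buf,
			      size_t n, int *done, uint16_t *data_len)
{
	size_t used = 0;

	assert(r->have <= sizeof(r->raw_len));

	while (r->have < sizeof(r->raw_len) && used < n)
		r->raw_len[r->have++] = buf[used++];

	*done = (r->have == sizeof(r->raw_len));
	if (*done)
		*data_len = (uint16_t)((r->raw_len[0] << 8) | r->raw_len[1]);

	return used;
}
```

A caller would invoke len_reader_feed() on every chunk handed over by
the stream and only start filling the packet buffer once *done is set.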


> > > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> > > index e099a61b03fa..004db5b13663 100644
> > > --- a/drivers/net/ovpn/socket.c
> > > +++ b/drivers/net/ovpn/socket.c
> > > @@ -16,6 +16,7 @@
> > >   #include "packet.h"
> > >   #include "peer.h"
> > >   #include "socket.h"
> > > +#include "tcp.h"
> > >   #include "udp.h"
> > >   /* Finalize release of socket, called after RCU grace period */
> > > @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
> > >   	if (sock->sk->sk_protocol == IPPROTO_UDP)
> > >   		ovpn_udp_socket_detach(sock);
> > > +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> > > +		ovpn_tcp_socket_detach(sock);
> > >   	sockfd_put(sock);
> > >   }
> > > @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> > >   	if (sock->sk->sk_protocol == IPPROTO_UDP)
> > >   		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
> > > +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
> > > +		ret = ovpn_tcp_socket_attach(sock, peer);
> > >   	return ret;
> > >   }
> > > @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
> > >   	ovpn_sock->sock = sock;
> > 
> > The line above this is:
> > 
> >      ovpn_sock->ovpn = peer->ovpn;
> > 
> > It's technically fine since you then overwrite this with peer in case
> > we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
> > it into a union in this patch.
> 
> Yeah, I did not want to make another branch, but having a UDP specific case
> will make code easier to read.

Either that, or drop the union.


> > > diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> > > new file mode 100644
> > > index 000000000000..84ad7cd4fc4f
> > > --- /dev/null
> > > +++ b/drivers/net/ovpn/tcp.c
> > > @@ -0,0 +1,511 @@
> > > +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
> > > +			      unsigned int in_offset, size_t in_len)
> > > +{
> > > +	struct sock *sk = desc->arg.data;
> > > +	struct ovpn_socket *sock;
> > > +	struct ovpn_skb_cb *cb;
> > > +	struct ovpn_peer *peer;
> > > +	size_t chunk, copied = 0;
> > > +	void *data;
> > > +	u16 len;
> > > +	int st;
> > > +
> > > +	rcu_read_lock();
> > > +	sock = rcu_dereference_sk_user_data(sk);
> > > +	rcu_read_unlock();
> > 
> > You can't just release rcu_read_lock and keep using sock (here and in
> > the rest of this file). Either you keep rcu_read_lock, or you can take
> > a reference on the ovpn_socket.
> 
> I was just staring at this today, after having worked on the
> rcu_read_lock/unlock for the peer get()s..
> 
> I think the assumption was: if we are in this read_sock callback, it's
> impossible that the ovpn_socket was invalidated, because it gets invalidated
> upon detach, which also prevents any further calling of this callback. But
> this sounds racy, and I guess we should somewhat hold a reference..

ovpn_tcp_read_sock starts

detach
kfree_rcu(ovpn_socket)
...
ovpn_socket actually freed
...
ovpn_tcp_read_sock continues with freed ovpn_socket


I don't think anything in the current code prevents this.
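
One way to close that window is to take a reference while the pointer
is still known to be valid, and to back off if the count has already
dropped to zero. Below is a userspace sketch of that get-unless-zero
pattern using C11 atomics. struct obj, obj_get_unless_zero() and
obj_put() are made-up names. In the kernel, the lookup and the get
would both have to sit inside the rcu_read_lock() section
(kref_get_unless_zero() is the usual helper), and whether ovpn_socket
carries a suitable refcount is an assumption here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the socket state. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still alive (refcnt > 0).
 * Returns false when the last reference is already gone, in which case
 * the caller must not touch the object any further.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old > 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference. Returns true when this was the last one, i.e. the
 * caller is now responsible for freeing the object.
 */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

With this shape, a reader that loses the race against detach sees
obj_get_unless_zero() fail and bails out instead of continuing with a
freed object.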


> > > +/* Set TCP encapsulation callbacks */
> > > +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
> > > +{
> > > +	void *old_data;
> > > +	int ret;
> > > +
> > > +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
> > > +
> > > +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
> > > +	if (ret < 0) {
> > > +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
> > > +		return ret;
> > > +	}
> > > +
> > > +	peer->tcp.skb = NULL;
> > > +	peer->tcp.offset = 0;
> > > +	peer->tcp.data_len = 0;
> > > +
> > > +	write_lock_bh(&sock->sk->sk_callback_lock);
> > > +
> > > +	/* make sure no pre-existing encapsulation handler exists */
> > > +	rcu_read_lock();
> > > +	old_data = rcu_dereference_sk_user_data(sock->sk);
> > > +	rcu_read_unlock();
> > > +	if (old_data) {
> > > +		netdev_err(peer->ovpn->dev,
> > > +			   "provided socket already taken by other user\n");
> > > +		ret = -EBUSY;
> > > +		goto err;
> > 
> > The UDP code differentiates "socket already owned by this interface"
> > from "already taken by other user". That doesn't apply to TCP?
> 
> This makes me wonder: how safe is it to interpret the user data as an object
> of type ovpn_socket?
>
> When we find the user data already assigned, we don't know what was really
> stored in there, right?
> Technically this socket could have gone through another module which
> assigned its own state.
> 
> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
> *)user_data)->ovpn ] is probably not safe. Would you agree?

Hmmm, yeah, I think you're right. If you checked encap_type ==
UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
really your data. Basically call ovpn_from_udp_sock during attach if
you want to check something beyond EBUSY.

Once you're in your own callbacks, it should be safe. If some other
code sends a packet with a non-ovpn socket to ovpn's ->encap_rcv,
something is really broken.

> > > +int __init ovpn_tcp_init(void)
> > > +{
> > > +	/* We need to substitute the recvmsg and the sock_is_readable
> > > +	 * callbacks in the sk_prot member of the sock object for TCP
> > > +	 * sockets.
> > > +	 *
> > > +	 * However sock->sk_prot is a pointer to a static variable and
> > > +	 * therefore we can't directly modify it, otherwise every socket
> > > +	 * pointing to it will be affected.
> > > +	 *
> > > +	 * For this reason we create our own static copy and modify what
> > > +	 * we need. Then we make sk_prot point to this copy
> > > +	 * (in ovpn_tcp_socket_attach())
> > > +	 */
> > > +	ovpn_tcp_prot = tcp_prot;
> > 
> > Don't you need a separate variant for IPv6, like TLS does?
> 
> Never did so far.
> 
> My wild wild wild guess: for the time this socket is owned by ovpn, we only
> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
> difference.
> When this socket is released, we reassign the original prot.

That seems a bit suspicious to me. For example, tcpv6_prot has a
different backlog_rcv. And you don't control if the socket is detached
before being closed, or which callbacks are needed. Your userspace
client doesn't use them, but someone else's might.

> > > +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
> > 
> > You don't need to replace ->sendmsg as well? The userspace client is
> > not expected to send messages?
> 
> It is, but my assumption is that those packets will just go through the
> socket as usual. No need to be handled by ovpn (those packets are not
> encrypted/decrypted, like data traffic is).
> And this is how it has worked so far.
> 
> Makes sense?

Two things come to mind:

- userspace is expected to prefix the messages it inserts on the
  stream with the 2-byte length field? otherwise, the peer won't be
  able to parse them out of the stream

- I'm not convinced this would be safe wrt kernel writing partial
  messages. if ovpn_tcp_send_one doesn't send the full message, you
  could interleave two messages:

  +------+-------------------+------+--------+----------------+
  | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
  +------+-------------------+------+--------+----------------+

  and the RX side would parse that as:

  +------+-----------------------------------+------+---------
  | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...     
  +------+-------------------+---------------+------+---------

  and try to interpret some random bytes out of either msg1 or msg2 as
  a length prefix, resulting in a broken stream.
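
The interleaving case can be reproduced with a small userspace model.
This is a sketch under stated assumptions: put_len(),
build_interleaved() and parse_frames() are hypothetical helpers, not
functions from the patch, and the framing is the 2-byte big-endian
length prefix discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2-byte big-endian length prefix; returns bytes written. */
static size_t put_len(uint8_t *out, uint16_t len)
{
	out[0] = (uint8_t)(len >> 8);
	out[1] = (uint8_t)(len & 0xff);
	return 2;
}

/* Build: len1, first `sent` bytes of msg1, len2, msg2, rest of msg1.
 * With sent == l1 this degenerates into a well-formed stream.
 * out must have room for l1 + l2 + 4 bytes.
 */
static size_t build_interleaved(uint8_t *out, const uint8_t *m1, uint16_t l1,
				size_t sent, const uint8_t *m2, uint16_t l2)
{
	size_t off = 0;

	assert(sent <= l1);
	off += put_len(out + off, l1);
	memcpy(out + off, m1, sent);
	off += sent;
	off += put_len(out + off, l2);
	memcpy(out + off, m2, l2);
	off += l2;
	memcpy(out + off, m1 + sent, l1 - sent);
	off += l1 - sent;
	return off;
}

/* Walk a length-prefixed stream and record each claimed frame length.
 * Stops at the first frame that claims more bytes than remain.
 * Returns the number of complete frames found.
 */
static size_t parse_frames(const uint8_t *s, size_t n,
			   uint16_t *lens, size_t max)
{
	size_t off = 0, count = 0;

	while (count < max && n - off >= 2) {
		uint16_t len = (uint16_t)((s[off] << 8) | s[off + 1]);

		if (len > n - off - 2)
			break;
		lens[count++] = len;
		off += 2 + (size_t)len;
	}
	return count;
}
```

Feeding parse_frames() a stream where msg1 was cut short shows the
receiver swallowing len2 and part of msg2 as msg1 payload, then reading
payload bytes as the next length prefix.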


The stream format looks identical to ESP in TCP [1] (2B length prefix
followed by the actual message), so I think the espintcp code (both tx
and rx, except for actual protocol parsing) should look very
similar. The problems that need to be solved for both protocols are
pretty much the same.

[1] https://www.rfc-editor.org/rfc/rfc8229#section-3

-- 
Sabrina

