linux-kernel - Re: [PATCH net-next v18 20/25] ovpn: implement peer add/get/dump/delete via netlink


Message-ID: <Z4qPjuK3_fQUYLJi@hog>
Date: Fri, 17 Jan 2025 18:12:46 +0100
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: ryazanov.s.a@...il.com, netdev@...r.kernel.org,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Donald Hunter <donald.hunter@...il.com>,
	Shuah Khan <shuah@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
	Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org, Xiao Liang <shaw.leon@...il.com>
Subject: Re: [PATCH net-next v18 20/25] ovpn: implement peer
 add/get/dump/delete via netlink

2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote:
> On 17/01/2025 12:48, Sabrina Dubroca wrote:
> > 2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote:
> > >   int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info)
> > >   {
> > > -	return -EOPNOTSUPP;
> > > +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
> > > +	struct ovpn_priv *ovpn = info->user_ptr[0];
> > > +	struct ovpn_socket *ovpn_sock;
> > > +	struct socket *sock = NULL;
> > > +	struct ovpn_peer *peer;
> > > +	u32 sockfd, peer_id;
> > > +	int ret;
> > > +
> > > +	/* peers can only be added when the interface is up and running */
> > > +	if (!netif_running(ovpn->dev))
> > > +		return -ENETDOWN;
> > 
> > Since we're not under rtnl_lock here, the device could go down while
> > we're creating this peer, and we may end up with a down device that
> > has a peer anyway.
> 
> hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in
> an inconsistent state.
> 
> > 
> > I'm not sure what this (and the peer flushing on NETDEV_DOWN) is
> > trying to accomplish. Is it a problem to keep peers when the netdevice
> > is down?
> 
> This is the result of my discussion with Sergey that started in v11 5/23:
> 
> https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@openvpn.net/
> 
> The idea was to match operational state with actual connectivity to peer(s).
> 
> Originally I wanted to simply keep the carrier always on, but after further
> discussion (including the meaning of the openvpn option --persist-tun) we
> agreed on following the logic where an UP device has a peer connected (logic
> is slightly different between MP and P2P).
> 
> I am not extremely happy with the resulting complexity, but it seemed to be
> a blocker for Sergey.

[after re-reading that discussion with Sergey]

I don't understand why "admin does 'ip link set tun0 down'" means "we
should get rid of all peers". For me the carrier situation goes the
other way: no peer, no carrier (as if I unplugged the cable from my
ethernet card), and it's independent of what the user does (ip link
set XXX up/down). You have that with netif_carrier_{on,off}, but
flushing peers when the admin does "ip link set tun0 down" is separate
IMO.
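
To make the separation concrete, here is a minimal userspace sketch of
that model, under stated assumptions: carrier depends only on the number
of attached peers, and the administrative up/down flag touches neither
the peers nor the carrier. All names below (struct model_dev and the
model_* helpers) are hypothetical and are not part of the ovpn patch;
the carrier field merely stands in for netif_carrier_{on,off}.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from the ovpn patch. */
struct model_dev {
	bool admin_up;        /* toggled by "ip link set ... up/down" */
	unsigned int n_peers; /* peers currently attached */
	bool carrier;         /* stands in for netif_carrier_{on,off} */
};

/* Carrier follows peer presence only: no peer, no carrier. */
void model_update_carrier(struct model_dev *dev)
{
	dev->carrier = dev->n_peers > 0;
}

void model_peer_add(struct model_dev *dev)
{
	dev->n_peers++;
	model_update_carrier(dev);
}

void model_peer_del(struct model_dev *dev)
{
	assert(dev->n_peers > 0); /* deleting a peer that does not exist is a bug */
	dev->n_peers--;
	model_update_carrier(dev);
}

/* Admin state change: deliberately leaves n_peers and carrier untouched. */
void model_set_admin(struct model_dev *dev, bool up)
{
	dev->admin_up = up;
}
```

In this sketch, calling model_set_admin(dev, false) on a device with one
peer leaves both the peer and the carrier in place; only model_peer_del()
can drop the carrier.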

[...]
> > >   int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
> > >   {
> > > -	return -EOPNOTSUPP;
> > > +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
> > > +	struct ovpn_priv *ovpn = info->user_ptr[0];
> > > +	struct ovpn_peer *peer;
> > > +	u32 peer_id;
> > > +	int ret;
> > > +
> > > +	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
> > > +		return -EINVAL;
> > > +
> > > +	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
> > > +			       ovpn_peer_nl_policy, info->extack);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
> > > +			      OVPN_A_PEER_ID))
> > > +		return -EINVAL;
> > > +
> > > +	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
> > > +	peer = ovpn_peer_get_by_id(ovpn, peer_id);
> > > +	if (!peer) {
> > > +		NL_SET_ERR_MSG_FMT_MOD(info->extack,
> > > +				       "cannot find peer with id %u", peer_id);
> > > +		return -ENOENT;
> > > +	}
> > > +
> > > +	netdev_dbg(ovpn->dev, "del peer %u\n", peer->id);
> > > +	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
> > 
> > With the delayed socket release (which is similar to what was in v11,
> > but now with refcounting on the netdevice which should make
> > rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to
> > userspace as if the peer was gone, but the socket hasn't been detached
> > yet.
> > 
> > A userspace application that tries to remove the peer and immediately
> > re-create it with the same socket could get EBUSY if the workqueue
> > hasn't done its job yet. That would be quite confusing to the
> > application.
> 
> This may happen only for TCP, because in the UDP case we would increase the
> refcounter and keep the socket attached.

Not if we're re-attaching to a different ovpn instance/netdevice.

> 
> However, re-attaching the same TCP socket is hardly going to happen (in TCP
> we have one socket per peer, therefore if the peer is going away, we're most
> likely killing the socket too).
> 
> This said, the complexity added by the completion seems quite tiny,
> therefore I'll add the code you are suggesting below.

Ok.
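
For reference, the EBUSY window discussed above can be illustrated with
a small single-threaded userspace model. This is only a sketch under
stated assumptions, not the code proposed for the patch: the workqueue
is reduced to a pending flag that is run explicitly, and "waiting for
the completion" is modelled by running that pending work before the
delete returns. All names (struct model_sock and the model_* helpers)
are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; none of these names are ovpn or kernel API. */
struct model_sock {
	bool attached;        /* socket still owned by an ovpn instance */
	bool release_pending; /* detach has been queued but not run yet */
};

/* Attaching fails with -EBUSY while a previous owner is still attached. */
int model_sock_attach(struct model_sock *s)
{
	if (s->attached)
		return -EBUSY;
	s->attached = true;
	return 0;
}

/* Stands in for the workqueue item that eventually detaches the socket. */
void model_run_release_work(struct model_sock *s)
{
	if (s->release_pending) {
		assert(s->attached); /* a release is only queued for an attached socket */
		s->attached = false;
		s->release_pending = false;
	}
}

/* Returns right after queueing the release: the EBUSY window stays open. */
void model_peer_del_nowait(struct model_sock *s)
{
	s->release_pending = true;
}

/* Returns only once the release has run, as a completion wait would ensure. */
void model_peer_del_wait(struct model_sock *s)
{
	s->release_pending = true;
	model_run_release_work(s); /* models blocking until the worker is done */
}
```

With model_peer_del_nowait(), an immediate model_sock_attach() on the
same socket returns -EBUSY until the release work has run; with
model_peer_del_wait(), the same attach succeeds right away.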

-- 
Sabrina