netdev - Re: [PATCH net-next v18 20/25] ovpn: implement peer add/get/dump/delete via netlink

Message-ID: <33710520-5f4f-4d33-a28d-99dc64afc9c3@openvpn.net>
Date: Sun, 19 Jan 2025 14:12:05 +0100
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: ryazanov.s.a@...il.com, netdev@...r.kernel.org,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Donald Hunter <donald.hunter@...il.com>,
 Shuah Khan <shuah@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
 Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org, Xiao Liang <shaw.leon@...il.com>
Subject: Re: [PATCH net-next v18 20/25] ovpn: implement peer
 add/get/dump/delete via netlink

On 17/01/2025 18:12, Sabrina Dubroca wrote:
> 2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote:
>> On 17/01/2025 12:48, Sabrina Dubroca wrote:
>>> 2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote:
>>>>    int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info)
>>>>    {
>>>> -	return -EOPNOTSUPP;
>>>> +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
>>>> +	struct ovpn_priv *ovpn = info->user_ptr[0];
>>>> +	struct ovpn_socket *ovpn_sock;
>>>> +	struct socket *sock = NULL;
>>>> +	struct ovpn_peer *peer;
>>>> +	u32 sockfd, peer_id;
>>>> +	int ret;
>>>> +
>>>> +	/* peers can only be added when the interface is up and running */
>>>> +	if (!netif_running(ovpn->dev))
>>>> +		return -ENETDOWN;
>>>
>>> Since we're not under rtnl_lock here, the device could go down while
>>> we're creating this peer, and we may end up with a down device that
>>> has a peer anyway.
>>
>> hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in
>> an inconsistent state.
>>
>>>
>>> I'm not sure what this (and the peer flushing on NETDEV_DOWN) is
>>> trying to accomplish. Is it a problem to keep peers when the netdevice
>>> is down?
>>
>> This is the result of my discussion with Sergey that started in v11 5/23:
>>
>> https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@openvpn.net/
>>
>> The idea was to match operational state with actual connectivity to peer(s).
>>
>> Originally I wanted to simply keep the carrier always on, but after further
>> discussion (including the meaning of the openvpn option --persist-tun) we
>> agreed on following the logic where an UP device has a peer connected (logic
>> is slightly different between MP and P2P).
>>
>> I am not extremely happy with the resulting complexity, but it seemed to be
>> a blocker for Sergey.
> 
> [after re-reading that discussion with Sergey]
> 
> I don't understand why "admin does 'ip link set tun0 down'" means "we
> should get rid of all peers". For me the carrier situation goes the
> other way: no peer, no carrier (as if I unplugged the cable from my
> ethernet card), and it's independent of what the user does (ip link
> set XXX up/down). You have that with netif_carrier_{on,off}, but
> flushing peers when the admin does "ip link set tun0 down" is separate
> IMO.

The reasoning was "the user is asking the VPN to go down - it should be 
assumed that from that moment on no VPN traffic whatsoever should flow 
in either direction".
Similarly to when you bring an Eth interface down - the phy link goes 
down as well.

Does it make sense?
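
To make the two positions concrete, here is a minimal userspace sketch (not kernel code). The struct and every model_* name are hypothetical and exist only for illustration; the sketch assumes carrier is derived purely from the peer count. In both policies no traffic flows while the device is administratively down; the difference only shows once the device is brought back up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of an ovpn-like device; not kernel code. */
struct vpn_dev_model {
	bool admin_up;   /* what "ip link set ... up/down" toggles */
	size_t n_peers;  /* number of connected peers */
};

/* Carrier follows peers: no peer, no carrier. */
static bool model_carrier_ok(const struct vpn_dev_model *d)
{
	return d->n_peers > 0;
}

/* Traffic may flow only if the admin state is up and there is carrier. */
static bool model_can_xmit(const struct vpn_dev_model *d)
{
	return d->admin_up && model_carrier_ok(d);
}

static void model_set_up(struct vpn_dev_model *d)
{
	d->admin_up = true;
}

/* Policy A (assumed): bringing the device down also flushes all peers. */
static void model_set_down_flush(struct vpn_dev_model *d)
{
	d->admin_up = false;
	d->n_peers = 0;
}

/* Policy B (assumed): bringing the device down leaves the peers alone. */
static void model_set_down_keep(struct vpn_dev_model *d)
{
	d->admin_up = false;
}
```

With policy A, a later model_set_up() leaves the device without carrier until userspace adds peers again; with policy B, the peers are still there and traffic can resume right away.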

> 
> [...]
>>>>    int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
>>>>    {
>>>> -	return -EOPNOTSUPP;
>>>> +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
>>>> +	struct ovpn_priv *ovpn = info->user_ptr[0];
>>>> +	struct ovpn_peer *peer;
>>>> +	u32 peer_id;
>>>> +	int ret;
>>>> +
>>>> +	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
>>>> +		return -EINVAL;
>>>> +
>>>> +	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
>>>> +			       ovpn_peer_nl_policy, info->extack);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
>>>> +			      OVPN_A_PEER_ID))
>>>> +		return -EINVAL;
>>>> +
>>>> +	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
>>>> +	peer = ovpn_peer_get_by_id(ovpn, peer_id);
>>>> +	if (!peer) {
>>>> +		NL_SET_ERR_MSG_FMT_MOD(info->extack,
>>>> +				       "cannot find peer with id %u", peer_id);
>>>> +		return -ENOENT;
>>>> +	}
>>>> +
>>>> +	netdev_dbg(ovpn->dev, "del peer %u\n", peer->id);
>>>> +	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
>>>
>>> With the delayed socket release (which is similar to what was in v11,
>>> but now with refcounting on the netdevice which should make
>>> rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to
>>> userspace as if the peer was gone, but the socket hasn't been detached
>>> yet.
>>>
>>> A userspace application that tries to remove the peer and immediately
>>> re-create it with the same socket could get EBUSY if the workqueue
>>> hasn't done its job yet. That would be quite confusing to the
>>> application.
>>
>> This may happen only for TCP, because in the UDP case we would increase the
>> refcounter and keep the socket attached.
> 
> Not if we're re-attaching to a different ovpn instance/netdevice.

Right.
One more reason to go with the completion logic.
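
For readers following along, a rough userspace sketch of the idea (not the actual patch): the delete path waits on a completion that the deferred detach work signals, so that by the time the call returns the socket can be attached again without EBUSY. The pthread-based completion and all model_* names are stand-ins I made up for illustration; the real code would use the kernel completion API and a workqueue.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for a kernel completion (illustrative only). */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical socket state: attached to at most one instance. */
struct sock_model {
	pthread_mutex_t lock;
	bool attached;
	struct model_completion detached;
};

static void sock_model_init(struct sock_model *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_mutex_init(&s->detached.lock, NULL);
	pthread_cond_init(&s->detached.cond, NULL);
	s->attached = false;
	s->detached.done = false;
}

/* Attach fails with -EBUSY while a previous attachment is still in place. */
static int sock_attach(struct sock_model *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->attached)
		ret = -EBUSY;
	else
		s->attached = true;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Deferred work: detach the socket, then signal the waiter. */
static void *detach_worker(void *arg)
{
	struct sock_model *s = arg;

	pthread_mutex_lock(&s->lock);
	s->attached = false;
	pthread_mutex_unlock(&s->lock);
	model_complete(&s->detached);
	return NULL;
}

/* Delete path: schedule the detach and wait until it has really happened. */
static int peer_del_sync(struct sock_model *s)
{
	pthread_t t;

	if (pthread_create(&t, NULL, detach_worker, s))
		return -EAGAIN;
	pthread_detach(t);
	model_wait_for_completion(&s->detached);
	return 0;
}
```

Without the wait in peer_del_sync(), a caller that deletes and immediately calls sock_attach() again could race with detach_worker() and see -EBUSY, which is the confusing behaviour described above.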

> 
>>
>> However, re-attaching the same TCP socket is hardly going to happen (in TCP
>> we have one socket per peer, therefore if the peer is going away, we're most
>> likely killing the socket too).
>>
>> This said, the complexity added by the completion seems quite tiny,
>> therefore I'll add the code you are suggesting below.
> 
> Ok.

Working on it!

Thanks!
Regards,


-- 
Antonio Quartulli
OpenVPN Inc.