[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <33710520-5f4f-4d33-a28d-99dc64afc9c3@openvpn.net>
Date: Sun, 19 Jan 2025 14:12:05 +0100
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: ryazanov.s.a@...il.com, netdev@...r.kernel.org,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Donald Hunter <donald.hunter@...il.com>,
Shuah Khan <shuah@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
Simon Horman <horms@...nel.org>, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, Xiao Liang <shaw.leon@...il.com>
Subject: Re: [PATCH net-next v18 20/25] ovpn: implement peer
add/get/dump/delete via netlink
On 17/01/2025 18:12, Sabrina Dubroca wrote:
> 2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote:
>> On 17/01/2025 12:48, Sabrina Dubroca wrote:
>>> 2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote:
>>>> int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info)
>>>> {
>>>> - return -EOPNOTSUPP;
>>>> + struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
>>>> + struct ovpn_priv *ovpn = info->user_ptr[0];
>>>> + struct ovpn_socket *ovpn_sock;
>>>> + struct socket *sock = NULL;
>>>> + struct ovpn_peer *peer;
>>>> + u32 sockfd, peer_id;
>>>> + int ret;
>>>> +
>>>> + /* peers can only be added when the interface is up and running */
>>>> + if (!netif_running(ovpn->dev))
>>>> + return -ENETDOWN;
>>>
>>> Since we're not under rtnl_lock here, the device could go down while
>>> we're creating this peer, and we may end up with a down device that
>>> has a peer anyway.
>>
>> hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in
>> an inconsistent state.
>>
>>>
>>> I'm not sure what this (and the peer flushing on NETDEV_DOWN) is
>>> trying to accomplish. Is it a problem to keep peers when the netdevice
>>> is down?
>>
>> This is the result of my discussion with Sergey that started in v23 5/23:
>>
>> https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@openvpn.net/
>>
>> The idea was to match operational state with actual connectivity to peer(s).
>>
>> Originally I wanted to simply kee the carrier always on, but after further
>> discussion (including the meaning of the openvpn option --persist-tun) we
>> agreed on following the logic where an UP device has a peer connected (logic
>> is slightly different between MP and P2P).
>>
>> I am not extremely happy with the resulting complexity, but it seemed to be
>> blocker for Sergey.
>
> [after re-reading that discussion with Sergey]
>
> I don't understand why "admin does 'ip link set tun0 down'" means "we
> should get rid of all peers. For me the carrier situation goes the
> other way: no peer, no carrier (as if I unplugged the cable from my
> ethernet card), and it's independent of what the user does (ip link
> set XXX up/down). You have that with netif_carrier_{on,off}, but
> flushing peers when the admin does "ip link set tun0 down" is separate
> IMO.
The reasoning was "the user is asking the VPN to go down - it should be
assumed that from that moment on no VPN traffic whatsoever should flow
in either direction".
Similarly to when you bring an Eth interface dwn - the phy link goes
down as well.
Does it make sense?
>
> [...]
>>>> int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
>>>> {
>>>> - return -EOPNOTSUPP;
>>>> + struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
>>>> + struct ovpn_priv *ovpn = info->user_ptr[0];
>>>> + struct ovpn_peer *peer;
>>>> + u32 peer_id;
>>>> + int ret;
>>>> +
>>>> + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
>>>> + return -EINVAL;
>>>> +
>>>> + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
>>>> + ovpn_peer_nl_policy, info->extack);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
>>>> + OVPN_A_PEER_ID))
>>>> + return -EINVAL;
>>>> +
>>>> + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
>>>> + peer = ovpn_peer_get_by_id(ovpn, peer_id);
>>>> + if (!peer) {
>>>> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
>>>> + "cannot find peer with id %u", peer_id);
>>>> + return -ENOENT;
>>>> + }
>>>> +
>>>> + netdev_dbg(ovpn->dev, "del peer %u\n", peer->id);
>>>> + ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
>>>
>>> With the delayed socket release (which is similar to what was in v11,
>>> but now with refcounting on the netdevice which should make
>>> rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to
>>> userspace as if the peer was gone, but the socket hasn't been detached
>>> yet.
>>>
>>> A userspace application that tries to remove the peer and immediately
>>> re-create it with the same socket could get EBUSY if the workqueue
>>> hasn't done its job yet. That would be quite confusing to the
>>> application.
>>
>> This may happen only for TCP, because in the UDP case we would increase the
>> refcounter and keep the socket attached.
>
> Not if we're re-attaching to a different ovpn instance/netdevice.
Right.
One more reason to go with the completion logic.
>
>>
>> However, re-attaching the same TCP socket is hardly going to happen (in TCP
>> we have one socket per peer, therefore if the peer is going away, we're most
>> likely killing the socket too).
>>
>> This said, the complexity added by the completion seems quite tiny,
>> therefore I'll add the code you are suggesting below.
>
> Ok.
Working on it!
Thanks!
Regards,
--
Antonio Quartulli
OpenVPN Inc.
Powered by blists - more mailing lists