netdev - Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1d31ca80-055e-4601-91b6-b0dc38b721c7@openvpn.net>
Date: Thu, 9 May 2024 12:35:32 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
 Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 04/24] ovpn: add basic interface
 creation/destruction/management routines

On 09/05/2024 12:09, Sabrina Dubroca wrote:
> 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
>> On 08/05/2024 16:52, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>>>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>>>> index 33c0b004ce16..584cd7286aff 100644
>>>> --- a/drivers/net/ovpn/main.c
>>>> +++ b/drivers/net/ovpn/main.c
>>> [...]
>>>> +static void ovpn_struct_free(struct net_device *net)
>>>> +{
>>>> +	struct ovpn_struct *ovpn = netdev_priv(net);
>>>> +
>>>> +	rtnl_lock();
>>>
>>>    ->priv_destructor can run from register_netdevice (already under
>>> RTNL), this doesn't look right.
>>>
>>>> +	list_del(&ovpn->dev_list);
>>>
>>> And if this gets called from register_netdevice, the list_add from
>>> ovpn_iface_create hasn't run yet, so this will probably do strange
>>> things?
>>
>> Argh, again I haven't considered a failure in register_netdevice and you are
>> indeed right.
>>
>> Maybe it is better to call list_del() in the netdev notifier, upon
>> NETDEV_UNREGISTER event?
> 
> I'd like to avoid splitting the clean up code over so maybe different
> functions and called through different means. Keep it simple.
> 
> AFAICT the only reason you need this list is to delete your devices on
> netns exit, so if we can get rid of that the list can go away.

right.

> 
> 
>>>> +static int ovpn_net_open(struct net_device *dev)
>>>> +{
>>>> +	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
>>>> +
>>>> +	if (dev_v4) {
>>>> +		/* disable redirects as Linux gets confused by ovpn handling
>>>> +		 * same-LAN routing
>>>> +		 */
>>>> +		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
>>>> +		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
>>>
>>> Jakub, are you ok with that? This feels a bit weird to have in the
>>> middle of a driver.
>>
>> Let me share what the problem is (copied from the email I sent to Andrew
>> Lunn as he was also curious about this):
>>
>> The reason for requiring this setting lies in the OpenVPN server acting as
>> relay point (star topology) for hosts in the same subnet.
>>
>> Example: given the a.b.c.0/24 IP network, you have .2 that in order to talk
>> to .3 must have its traffic relayed by .1 (the server).
>>
>> When the kernel (at .1) sees this traffic it will send the ICMP redirects,
>> because it believes that .2 should directly talk to .3 without passing
>> through .1.
> 
> So only the server would need to stop sending them, not the client?

correct

> (or the client would need to ignore them)
> But the kernel has no way of knowing if an ovpn device is on a client
> or a server?

the server knows if the interface is configured in P2P or MP (MultiPeer) 
mode. The latter is what requires redirects to be off, so we could at 
least add a check and switch them off only for MP ifaces.

> 
>> Of course it makes sense in a normal network with a classic broadcast
>> domain, but this is not the case in a VPN implemented as a star topology.
>>
>> Does it make sense?
> 
> It looks like the problem is that ovpn links are point-to-point
> (instead of a broadcast LAN kind of link where redirects would make
> sense), and the kernel doesn't handle it that way.

exactly

> 
>> The only way I see to fix this globally is to have an extra flag in the
>> netdevice signaling this peculiarity and thus disabling ICMP redirects
>> automatically.
>>
>> Note: wireguard has those lines too, as it probably needs to address the
>> same scenario.
> 
> I've noticed a lot of similarities in some bits I've looked at, and I
> hate that this is turning into another pile of duplicate code like
> vxlan/geneve, bond/team, etc :(

For starters, we could at least moves these few lines in some helper 
function and call it from both modules.

On the other hand, we could, like I suggested above, convert this into a 
netdev flag and let core handle the behaviour when the flag is set.

> 
> 
>>> [...]
>>>> +void ovpn_iface_destruct(struct ovpn_struct *ovpn)
>>>> +{
>>>> +	ASSERT_RTNL();
>>>> +
>>>> +	netif_carrier_off(ovpn->dev);
>>>> +
>>>> +	ovpn->registered = false;
>>>> +
>>>> +	unregister_netdevice(ovpn->dev);
>>>> +	synchronize_net();
>>>
>>> If this gets called from the loop in ovpn_netns_pre_exit, one
>>> synchronize_net per ovpn device would seem quite expensive.
>>
>> As per your other comment, maybe I should just remove the synchronize_net()
>> entirely since it'll be the core to take care of inflight packets?
> 
> There's a synchronize_net in unregister_netdevice_many_notify, so I'd
> say you can get rid of it here.

ok! Jakub was indeed suggesting that core should already take care of this.

Will remove it for good.

> 
> 
>>>>    static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>>>    				     unsigned long state, void *ptr)
>>>>    {
>>>>    	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>> +	struct ovpn_struct *ovpn;
>>>>    	if (!ovpn_dev_is_valid(dev))
>>>>    		return NOTIFY_DONE;
>>>> +	ovpn = netdev_priv(dev);
>>>> +
>>>>    	switch (state) {
>>>>    	case NETDEV_REGISTER:
>>>> -		/* add device to internal list for later destruction upon
>>>> -		 * unregistration
>>>> -		 */
>>>> +		ovpn->registered = true;
>>>>    		break;
>>>>    	case NETDEV_UNREGISTER:
>>>> +		/* twiddle thumbs on netns device moves */
>>>> +		if (dev->reg_state != NETREG_UNREGISTERING)
>>>> +			break;
>>>> +
>>>>    		/* can be delivered multiple times, so check registered flag,
>>>>    		 * then destroy the interface
>>>>    		 */
>>>> +		if (!ovpn->registered)
>>>> +			return NOTIFY_DONE;
>>>> +
>>>> +		ovpn_iface_destruct(ovpn);
>>>
>>> Maybe I'm misunderstanding this code. Why do you want to manually
>>> destroy a device that is already going away?
>>
>> We need to perform some internal cleanup (i.e. release all peers).
>> I don't see how this can happen automatically, no?
> 
> That's what ->priv_destructor does, 

Not really.

Every peer object increases the netdev refcounter to the netdev, 
therefore we must first delete all peers in order to have 
netdevice->refcnt reach 0 (and then invoke priv_destructor).

So the idea is: upon UNREGISTER event we drop all resources and 
eventually (via RCU) all references to the netdev are also released, 
which in turn triggers the destructor.

makes sense?


> and it will be called ultimately
> by the unregister_netdevice call you have in ovpn_iface_destruct (in
> netdev_run_todo). Anyway, this UNREGISTER event is probably generated
> by unregister_netdevice_many_notify (basically a previous
> unregister_netdevice() call), so I don't know why you want to call
> unregister_netdevice again on the same device.

I believe I have seen this notification being triggered upon netns exit, 
but in that case the netdevice was not being removed from core.

Hence I decided to fully trigger the unregistration.

Expected?

I can repeat the test to be sure.

> 
> 
>>>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>>>    	.notifier_call = ovpn_netdev_notifier_call,
>>>>    };
>>>> +static void ovpn_netns_pre_exit(struct net *net)
>>>> +{
>>>> +	struct ovpn_struct *ovpn;
>>>> +
>>>> +	rtnl_lock();
>>>> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
>>>> +		if (dev_net(ovpn->dev) != net)
>>>> +			continue;
>>>> +
>>>> +		ovpn_iface_destruct(ovpn);
>>>
>>> Is this needed? On netns destruction all devices within the ns will be
>>> destroyed by the networking core.
>>
>> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
>> the ovpn interface was being moved to the global namespace.
> 
> Crap it's only the devices with ->rtnl_link_ops that get killed by the
> core. 

exactly! this goes hand to hand with my comment above: event delivered 
but interface not destroyed.

> Because you create your devices via genl (which I'm not a fan
> of, even if it's a bit nicer for userspace having a single netlink api
> to deal with),

Originally I had implemented the rtnl_link_ops, but the (meaningful) 
objection was that a user is never supposed to create an ovpn iface by 
himself, but there should always be an openvpn process running in 
userspace. Hence the restriction to genl only.

> default_device_exit_batch/default_device_exit_net think
> ovpn devices are real NICs and move them back to init_net instead of
> destroying them.
> 
> Maybe we can extend the condition in default_device_exit_net with a
> new flag so that ovpn devices get destroyed by the core, even without
> rtnl_link_ops?

Thanks for pointing out the function responsible for this decision.
How would you extend the check though?

Alternatively, what if ovpn simply registers an empty rtnl_link_ops with 
netns_fund set to false? That should make the condition happy, while 
keeping ovpn genl-only


Thanks a lot


-- 
Antonio Quartulli
OpenVPN Inc.