netdev - Re: [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b537eb9d-ddab-4b39-bcea-2b8781507a8c@openvpn.net>
Date: Thu, 9 May 2024 15:25:21 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
 Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 04/24] ovpn: add basic interface
 creation/destruction/management routines

By the way, thank you very much for taking the time to have this 
constructive discussion. I really appreciate it!

On 09/05/2024 14:16, Sabrina Dubroca wrote:
> 2024-05-09, 12:35:32 +0200, Antonio Quartulli wrote:
>> On 09/05/2024 12:09, Sabrina Dubroca wrote:
>>> 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
>>>> On 08/05/2024 16:52, Sabrina Dubroca wrote:
>>>>> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>>>>>>     static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>>>>>     				     unsigned long state, void *ptr)
>>>>>>     {
>>>>>>     	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>>>> +	struct ovpn_struct *ovpn;
>>>>>>     	if (!ovpn_dev_is_valid(dev))
>>>>>>     		return NOTIFY_DONE;
>>>>>> +	ovpn = netdev_priv(dev);
>>>>>> +
>>>>>>     	switch (state) {
>>>>>>     	case NETDEV_REGISTER:
>>>>>> -		/* add device to internal list for later destruction upon
>>>>>> -		 * unregistration
>>>>>> -		 */
>>>>>> +		ovpn->registered = true;
>>>>>>     		break;
>>>>>>     	case NETDEV_UNREGISTER:
>>>>>> +		/* twiddle thumbs on netns device moves */
>>>>>> +		if (dev->reg_state != NETREG_UNREGISTERING)
>>>>>> +			break;
>>>>>> +
>>>>>>     		/* can be delivered multiple times, so check registered flag,
>>>>>>     		 * then destroy the interface
>>>>>>     		 */
>>>>>> +		if (!ovpn->registered)
>>>>>> +			return NOTIFY_DONE;
>>>>>> +
>>>>>> +		ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Maybe I'm misunderstanding this code. Why do you want to manually
>>>>> destroy a device that is already going away?
>>>>
>>>> We need to perform some internal cleanup (i.e. release all peers).
>>>> I don't see how this can happen automatically, no?
>>>
>>> That's what ->priv_destructor does,
>>
>> Not really.
>>
>> Every peer object increases the netdev refcounter to the netdev, therefore
>> we must first delete all peers in order to have netdevice->refcnt reach 0
>> (and then invoke priv_destructor).
> 
> Oh, I see. I'm still trying to wrap my head around all the objects and
> components of your driver.
> 
>> So the idea is: upon UNREGISTER event we drop all resources and eventually
>> (via RCU) all references to the netdev are also released, which in turn
>> triggers the destructor.
>>
>> makes sense?
> 
> That part, yes, thanks for explaining. Do you really need the peers to
> hold a reference on the netdevice? With my limited understanding, it
> seems the peers are sub-objects of the netdevice.
> 
>>> and it will be called ultimately
>>> by the unregister_netdevice call you have in ovpn_iface_destruct (in
>>> netdev_run_todo). Anyway, this UNREGISTER event is probably generated
>>> by unregister_netdevice_many_notify (basically a previous
>>> unregister_netdevice() call), so I don't know why you want to call
>>> unregister_netdevice again on the same device.
>>
>> I believe I have seen this notification being triggered upon netns exit, but
>> in that case the netdevice was not being removed from core.
> 
> Sure, but you have a comment about that and you're filtering that
> event, so I'm ignoring this case.

You're right..now I wonder if my observation was made before I 
introduced that check...

> 
>> Hence I decided to fully trigger the unregistration.
> 
> That's the bit that doesn't make sense to me: the device is going
> away, so you trigger a manual unregister. Cleaning up some additional
> resources (peers etc), that makes sense. But calling
> unregister_netdevice (when you're most likely getting called from
> unregister_netdevice already, because I don't see other spots setting
> dev->reg_state = NETREG_UNREGISTERING) is what I don't get. And I
> wonder why you're not hitting the BUG_ON in
> unregister_netdevice_many_notify:
> 
>      BUG_ON(dev->reg_state != NETREG_REGISTERED);

I think because we have our ovpn->registered check.
It ensures that we don't call ovpn_iface_destruct more than once.

But now, that I implemented the rtnl_link_ops I can confirm I am hitting 
the BUG_ON. And now it makes sense.

I presume that now I can I simply remove the call to 
unregister_netdevice() from ovpn_iface_destruct() and move it to 
ovpn_nl_del_iface_doit().

This way, upon netns exit, the real UNREGISTER handler (triggered thanks 
to rtnl_link_ops) will still perform the destruct, but won't try to 
schedule an UNREGISTER event again.

> 
> 
>>>>>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>>>>>     	.notifier_call = ovpn_netdev_notifier_call,
>>>>>>     };
>>>>>> +static void ovpn_netns_pre_exit(struct net *net)
> 
> BTW, in case you end up keeping this function, it should have
> __net_exit annotation (see for example ipv4_frags_exit_net).

ACK, but thanks to the rtnl_link_ops trick we are definitely ditching it.
> 
>>>>>> +{
>>>>>> +	struct ovpn_struct *ovpn;
>>>>>> +
>>>>>> +	rtnl_lock();
>>>>>> +	list_for_each_entry(ovpn, &dev_list, dev_list) {
>>>>>> +		if (dev_net(ovpn->dev) != net)
>>>>>> +			continue;
>>>>>> +
>>>>>> +		ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Is this needed? On netns destruction all devices within the ns will be
>>>>> destroyed by the networking core.
>>>>
>>>> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
>>>> the ovpn interface was being moved to the global namespace.
>>>
>>> Crap it's only the devices with ->rtnl_link_ops that get killed by the
>>> core.
>>
>> exactly! this goes hand to hand with my comment above: event delivered but
>> interface not destroyed.
> 
> There's no event sent to ovpn_netdev_notifier_call in that case (well,
> only the fake "unregister" out of the current netns that you're
> ignoring). Otherwise, you wouldn't need ovpn_netns_pre_exit.

Yeah you're right. I think I wanted to conclude the same thing but my 
brain was unable to produce a meaningful sentence.

> 
>>> Because you create your devices via genl (which I'm not a fan
>>> of, even if it's a bit nicer for userspace having a single netlink api
>>> to deal with),
>>
>> Originally I had implemented the rtnl_link_ops, but the (meaningful)
>> objection was that a user is never supposed to create an ovpn iface by
>> himself, but there should always be an openvpn process running in userspace.
>> Hence the restriction to genl only.
> 
> Sorry, but how does genl prevent a user from creating the ovpn
> interface manually? Whatever API you define, anyone who manages to
> come up with the right netlink message will be able to create an
> interface. You can't stop people from using your API without your
> official client.

I don't want to prevent people from creating ovpn ifaces the way they like.
I just don't see how the rtnl_link API can be useful, other than 
allowing users to execute 'ip link add/del..'.
And by design that is not a usecase we want to support, because once the 
iface is created, nothing will happen if there is no userspace software 
driving it (no matter if it is openvpn or anything else).

When explaining this decision, I like to make a comparison to virtual 
802.11/wifi ifaces.
They also lack rtnl_link (AFAIR) as they also require some userspace 
software to handle them in order to be useful.

All this said, having everything in one place looks cleaner too :)

> 
>>> default_device_exit_batch/default_device_exit_net think
>>> ovpn devices are real NICs and move them back to init_net instead of
>>> destroying them.
>>>
>>> Maybe we can extend the condition in default_device_exit_net with a
>>> new flag so that ovpn devices get destroyed by the core, even without
>>> rtnl_link_ops?
>>
>> Thanks for pointing out the function responsible for this decision.
>> How would you extend the check though?
>>
>> Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
>> netns_fund set to false? That should make the condition happy, while keeping
>> ovpn genl-only
> 
> Yes. I was thinking about adding a flag to the device, because I
> wasn't sure an almost empty rtnl_link_ops could be handled safely, but
> it seems ok. ovs does it, see commit 5b9e7e160795 ("openvswitch:
> introduce rtnl ops stub"). And, as that commit message says, "ip -d
> link show" would also show that the device is of type openvpn (or
> ovpn, whatever you put in ops->kind), which would be nice.

I just coded something along those lines.

It seems pretty clean and we don't need to touch core (+ the bonus of 
having the name in "ip -d link")....and the iface does get destroyed 
upon netns exit! :-)

I am grasping much better how all these APIs work together now.

Thanks!

-- 
Antonio Quartulli
OpenVPN Inc.