[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b537eb9d-ddab-4b39-bcea-2b8781507a8c@openvpn.net>
Date: Thu, 9 May 2024 15:25:21 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 04/24] ovpn: add basic interface
creation/destruction/management routines
By the way, thank you very much for taking the time to have this
constructive discussion. I really appreciate it!
On 09/05/2024 14:16, Sabrina Dubroca wrote:
> 2024-05-09, 12:35:32 +0200, Antonio Quartulli wrote:
>> On 09/05/2024 12:09, Sabrina Dubroca wrote:
>>> 2024-05-09, 10:25:44 +0200, Antonio Quartulli wrote:
>>>> On 08/05/2024 16:52, Sabrina Dubroca wrote:
>>>>> 2024-05-06, 03:16:17 +0200, Antonio Quartulli wrote:
>>>>>> static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>>>>>> unsigned long state, void *ptr)
>>>>>> {
>>>>>> struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>>>>> + struct ovpn_struct *ovpn;
>>>>>> if (!ovpn_dev_is_valid(dev))
>>>>>> return NOTIFY_DONE;
>>>>>> + ovpn = netdev_priv(dev);
>>>>>> +
>>>>>> switch (state) {
>>>>>> case NETDEV_REGISTER:
>>>>>> - /* add device to internal list for later destruction upon
>>>>>> - * unregistration
>>>>>> - */
>>>>>> + ovpn->registered = true;
>>>>>> break;
>>>>>> case NETDEV_UNREGISTER:
>>>>>> + /* twiddle thumbs on netns device moves */
>>>>>> + if (dev->reg_state != NETREG_UNREGISTERING)
>>>>>> + break;
>>>>>> +
>>>>>> /* can be delivered multiple times, so check registered flag,
>>>>>> * then destroy the interface
>>>>>> */
>>>>>> + if (!ovpn->registered)
>>>>>> + return NOTIFY_DONE;
>>>>>> +
>>>>>> + ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Maybe I'm misunderstanding this code. Why do you want to manually
>>>>> destroy a device that is already going away?
>>>>
>>>> We need to perform some internal cleanup (i.e. release all peers).
>>>> I don't see how this can happen automatically, no?
>>>
>>> That's what ->priv_destructor does,
>>
>> Not really.
>>
>> Every peer object increases the netdev refcounter to the netdev, therefore
>> we must first delete all peers in order to have netdevice->refcnt reach 0
>> (and then invoke priv_destructor).
>
> Oh, I see. I'm still trying to wrap my head around all the objects and
> components of your driver.
>
>> So the idea is: upon UNREGISTER event we drop all resources and eventually
>> (via RCU) all references to the netdev are also released, which in turn
>> triggers the destructor.
>>
>> makes sense?
>
> That part, yes, thanks for explaining. Do you really need the peers to
> hold a reference on the netdevice? With my limited understanding, it
> seems the peers are sub-objects of the netdevice.
>
>>> and it will be called ultimately
>>> by the unregister_netdevice call you have in ovpn_iface_destruct (in
>>> netdev_run_todo). Anyway, this UNREGISTER event is probably generated
>>> by unregister_netdevice_many_notify (basically a previous
>>> unregister_netdevice() call), so I don't know why you want to call
>>> unregister_netdevice again on the same device.
>>
>> I believe I have seen this notification being triggered upon netns exit, but
>> in that case the netdevice was not being removed from core.
>
> Sure, but you have a comment about that and you're filtering that
> event, so I'm ignoring this case.
You're right..now I wonder if my observation was made before I
introduced that check...
>
>> Hence I decided to fully trigger the unregistration.
>
> That's the bit that doesn't make sense to me: the device is going
> away, so you trigger a manual unregister. Cleaning up some additional
> resources (peers etc), that makes sense. But calling
> unregister_netdevice (when you're most likely getting called from
> unregister_netdevice already, because I don't see other spots setting
> dev->reg_state = NETREG_UNREGISTERING) is what I don't get. And I
> wonder why you're not hitting the BUG_ON in
> unregister_netdevice_many_notify:
>
> BUG_ON(dev->reg_state != NETREG_REGISTERED);
I think because we have our ovpn->registered check.
It ensures that we don't call ovpn_iface_destruct more than once.
But now, that I implemented the rtnl_link_ops I can confirm I am hitting
the BUG_ON. And now it makes sense.
I presume that now I can I simply remove the call to
unregister_netdevice() from ovpn_iface_destruct() and move it to
ovpn_nl_del_iface_doit().
This way, upon netns exit, the real UNREGISTER handler (triggered thanks
to rtnl_link_ops) will still perform the destruct, but won't try to
schedule an UNREGISTER event again.
>
>
>>>>>> @@ -62,6 +210,24 @@ static struct notifier_block ovpn_netdev_notifier = {
>>>>>> .notifier_call = ovpn_netdev_notifier_call,
>>>>>> };
>>>>>> +static void ovpn_netns_pre_exit(struct net *net)
>
> BTW, in case you end up keeping this function, it should have
> __net_exit annotation (see for example ipv4_frags_exit_net).
ACK, but thanks to the rtnl_link_ops trick we are definitely ditching it.
>
>>>>>> +{
>>>>>> + struct ovpn_struct *ovpn;
>>>>>> +
>>>>>> + rtnl_lock();
>>>>>> + list_for_each_entry(ovpn, &dev_list, dev_list) {
>>>>>> + if (dev_net(ovpn->dev) != net)
>>>>>> + continue;
>>>>>> +
>>>>>> + ovpn_iface_destruct(ovpn);
>>>>>
>>>>> Is this needed? On netns destruction all devices within the ns will be
>>>>> destroyed by the networking core.
>>>>
>>>> Before implementing ovpn_netns_pre_exit() this way, upon namespace deletion
>>>> the ovpn interface was being moved to the global namespace.
>>>
>>> Crap it's only the devices with ->rtnl_link_ops that get killed by the
>>> core.
>>
>> exactly! this goes hand to hand with my comment above: event delivered but
>> interface not destroyed.
>
> There's no event sent to ovpn_netdev_notifier_call in that case (well,
> only the fake "unregister" out of the current netns that you're
> ignoring). Otherwise, you wouldn't need ovpn_netns_pre_exit.
Yeah you're right. I think I wanted to conclude the same thing but my
brain was unable to produce a meaningful sentence.
>
>>> Because you create your devices via genl (which I'm not a fan
>>> of, even if it's a bit nicer for userspace having a single netlink api
>>> to deal with),
>>
>> Originally I had implemented the rtnl_link_ops, but the (meaningful)
>> objection was that a user is never supposed to create an ovpn iface by
>> himself, but there should always be an openvpn process running in userspace.
>> Hence the restriction to genl only.
>
> Sorry, but how does genl prevent a user from creating the ovpn
> interface manually? Whatever API you define, anyone who manages to
> come up with the right netlink message will be able to create an
> interface. You can't stop people from using your API without your
> official client.
I don't want to prevent people from creating ovpn ifaces the way they like.
I just don't see how the rtnl_link API can be useful, other than
allowing users to execute 'ip link add/del..'.
And by design that is not a usecase we want to support, because once the
iface is created, nothing will happen if there is no userspace software
driving it (no matter if it is openvpn or anything else).
When explaining this decision, I like to make a comparison to virtual
802.11/wifi ifaces.
They also lack rtnl_link (AFAIR) as they also require some userspace
software to handle them in order to be useful.
All this said, having everything in one place looks cleaner too :)
>
>>> default_device_exit_batch/default_device_exit_net think
>>> ovpn devices are real NICs and move them back to init_net instead of
>>> destroying them.
>>>
>>> Maybe we can extend the condition in default_device_exit_net with a
>>> new flag so that ovpn devices get destroyed by the core, even without
>>> rtnl_link_ops?
>>
>> Thanks for pointing out the function responsible for this decision.
>> How would you extend the check though?
>>
>> Alternatively, what if ovpn simply registers an empty rtnl_link_ops with
>> netns_fund set to false? That should make the condition happy, while keeping
>> ovpn genl-only
>
> Yes. I was thinking about adding a flag to the device, because I
> wasn't sure an almost empty rtnl_link_ops could be handled safely, but
> it seems ok. ovs does it, see commit 5b9e7e160795 ("openvswitch:
> introduce rtnl ops stub"). And, as that commit message says, "ip -d
> link show" would also show that the device is of type openvpn (or
> ovpn, whatever you put in ops->kind), which would be nice.
I just coded something along those lines.
It seems pretty clean and we don't need to touch core (+ the bonus of
having the name in "ip -d link")....and the iface does get destroyed
upon netns exit! :-)
I am grasping much better how all these APIs work together now.
Thanks!
--
Antonio Quartulli
OpenVPN Inc.
Powered by blists - more mailing lists