[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2e2533f-701b-8faf-5d9d-c107f1e70387@virtuozzo.com>
Date: Fri, 16 Mar 2018 17:41:27 +0300
From: Kirill Tkhai <ktkhai@...tuozzo.com>
To: Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc: santosh.shilimkar@...cle.com, davem@...emloft.net,
netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
rds-devel@....oracle.com, edumazet@...gle.com
Subject: Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event()
(then kill NETDEV_UNREGISTER_FINAL)
On 16.03.2018 17:36, Kirill Tkhai wrote:
> On 16.03.2018 16:53, Sowmini Varadhan wrote:
>>
>> Found my previous question:
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg72330.html
>>
>> (see section about "Comments are specifically ivinted.."
>
> I see, thanks.
>
>>> This is not a problem, and rds-tcp is not the only pernet_subsys registering
>>> a socket. It's OK to close it from .exit method. There are many examples,
>>> let me point you to icmp_sk_ops as one of them. But it's not the only.
>>
>> I'm not averse to changing this to NETDEV_UNREGISTER
>> as long as it works for the 2 test cases below- you
>> can test it by using rds-ping from rds-tools rpm, to
>> be used from/to init_net, from/to the netns against
>> some external machine (i.e something not on the same
>> physical host)
>>
>>>> For rds-tcp, we need to be able to do the right thing in both of these
>>>> cases
>>>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in
>>>> every namespace, including init_net)
>>>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces)
>>>
>>> The same as above, every pernet_subsys does this. It's not a problem.
>>> exit and exit_batch methods are called in both of the cases.
>>>
>>> Please, see __unregister_pernet_operations()->ops_exit_list for the details.
>>
>> I am familiar with ops_exit_list, but this is the sequence:
>> - when the module is loaded (or netns is started) it starts a
>> kernel listen socket on *.16385
>> - when you start the rds-pings above, it will create kernel
>> tcp connections from/to the 16385 in the netns. And it will
>> start socket keepalives for those connections. Each tcp
>> connection is associated with a rds_connection
>>
>> As I recall, when I wrote the initial patchset, my problem
>> was that in order to let the module unload make progress,
>> all these sockets had to be cleaned up. But to clean up these
>> sockets, net_device cleanup had to complete (should not have
>> any new incoming connections to the listen endpoint on a
>> non-loopback socket) so I ended up with a circular dependancy.
>
> Ah, I see the reasons. Please, see my proposition at the end of this letter.
>
>>> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change
>>> which happens is we call rds_tcp_kill_sock() earlier. So, it may be a reason
>>> of problems only if someone changes the list during the time between
>>> NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL are called for loopback.
>>> But since this time noone related to this net can extend the list,
>>> there is no a problem to do that.
>>
>> Please share your patch, I can review it and maybe help to test
>> it..
>>
>> As I was trying to say in my RFC, I am quite open to ways to make
>> this cleanup more obvious
>
> How about something like this? Compile tested only.
>
> [PATCH]rds: Use pernet device to kill RDS sockets
>
> We register a new pernet device and use the fact,
> that loopback device is last unregistered device.
> So, on exit path, the new exit method will be called
> before loopback_dev destruction.
$git diff --patience gives better diff
[PATCH]rds: Use pernet device to kill RDS sockets
We register a new pernet device and use the fact,
that loopback device is last unregistered device.
So, on exit path, the new exit method will be called
before loopback_dev destruction.
Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
---
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb04e7fa2467..ec37868bf2dd 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net)
if (net != &init_net && rtn->ctl_table)
kfree(rtn->ctl_table);
-
- /* If rds_tcp_exit_net() is called as a result of netns deletion,
- * the rds_tcp_kill_sock() device notifier would already have cleaned
- * up the listen socket, thus there is no work to do in this function.
- *
- * If rds_tcp_exit_net() is called as a result of module unload,
- * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
- * we do need to clean up the listen socket here.
- */
- if (rtn->rds_tcp_listen_sock) {
- struct socket *lsock = rtn->rds_tcp_listen_sock;
-
- rtn->rds_tcp_listen_sock = NULL;
- rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
- }
}
static struct pernet_operations rds_tcp_net_ops = {
.init = rds_tcp_init_net,
.exit = rds_tcp_exit_net,
- .id = &rds_tcp_netid,
- .size = sizeof(struct rds_tcp_net),
.async = true,
};
@@ -545,6 +528,27 @@ static void rds_tcp_kill_sock(struct net *net)
rds_conn_destroy(tc->t_cpath->cp_conn);
}
+static __net_init int rds_tcp_init_dev(struct net *net)
+{
+ struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+
+ rtn->rds_tcp_listen_sock = NULL;
+ return 0;
+}
+
+static void __net_exit rds_tcp_exit_dev(struct net *net)
+{
+ rds_tcp_kill_sock(net);
+}
+
+static struct pernet_operations rds_tcp_dev_ops = {
+ .init = rds_tcp_init_dev,
+ .exit = rds_tcp_exit_dev,
+ .id = &rds_tcp_netid,
+ .size = sizeof(struct rds_tcp_net),
+ .async = true,
+};
+
void *rds_tcp_listen_sock_def_readable(struct net *net)
{
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
@@ -556,29 +560,6 @@ void *rds_tcp_listen_sock_def_readable(struct net *net)
return lsock->sk->sk_user_data;
}
-static int rds_tcp_dev_event(struct notifier_block *this,
- unsigned long event, void *ptr)
-{
- struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-
- /* rds-tcp registers as a pernet subys, so the ->exit will only
- * get invoked after network acitivity has quiesced. We need to
- * clean up all sockets to quiesce network activity, and use
- * the unregistration of the per-net loopback device as a trigger
- * to start that cleanup.
- */
- if (event == NETDEV_UNREGISTER_FINAL &&
- dev->ifindex == LOOPBACK_IFINDEX)
- rds_tcp_kill_sock(dev_net(dev));
-
- return NOTIFY_DONE;
-}
-
-static struct notifier_block rds_tcp_dev_notifier = {
- .notifier_call = rds_tcp_dev_event,
- .priority = -10, /* must be called after other network notifiers */
-};
-
/* when sysctl is used to modify some kernel socket parameters,this
* function resets the RDS connections in that netns so that we can
* restart with new parameters. The assumption is that such reset
@@ -624,9 +605,8 @@ static void rds_tcp_exit(void)
rds_tcp_set_unloading();
synchronize_rcu();
rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
+ unregister_pernet_device(&rds_tcp_dev_ops);
unregister_pernet_subsys(&rds_tcp_net_ops);
- if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
- pr_warn("could not unregister rds_tcp_dev_notifier\n");
rds_tcp_destroy_conns();
rds_trans_unregister(&rds_tcp_transport);
rds_tcp_recv_exit();
@@ -650,15 +630,13 @@ static int rds_tcp_init(void)
if (ret)
goto out_slab;
- ret = register_pernet_subsys(&rds_tcp_net_ops);
+ ret = register_pernet_device(&rds_tcp_dev_ops);
if (ret)
goto out_recv;
- ret = register_netdevice_notifier(&rds_tcp_dev_notifier);
- if (ret) {
- pr_warn("could not register rds_tcp_dev_notifier\n");
+ ret = register_pernet_subsys(&rds_tcp_net_ops);
+ if (ret)
goto out_pernet;
- }
rds_trans_register(&rds_tcp_transport);
@@ -667,7 +645,7 @@ static int rds_tcp_init(void)
goto out;
out_pernet:
- unregister_pernet_subsys(&rds_tcp_net_ops);
+ unregister_pernet_device(&rds_tcp_dev_ops);
out_recv:
rds_tcp_recv_exit();
out_slab:
Powered by blists - more mailing lists