[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6cdfda8-73f2-47a7-9287-4869dc1b5726@virtuozzo.com>
Date: Fri, 16 Mar 2018 17:36:06 +0300
From: Kirill Tkhai <ktkhai@...tuozzo.com>
To: Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc: santosh.shilimkar@...cle.com, davem@...emloft.net,
netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
rds-devel@....oracle.com, edumazet@...gle.com
Subject: Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event()
(then kill NETDEV_UNREGISTER_FINAL)
On 16.03.2018 16:53, Sowmini Varadhan wrote:
>
> Found my previous question:
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg72330.html
>
> (see section about "Comments are specifically ivinted.."
I see, thanks.
>> This is not a problem, and rds-tcp is not the only pernet_subsys registering
>> a socket. It's OK to close it from .exit method. There are many examples,
>> let me point you to icmp_sk_ops as one of them. But it's not the only.
>
> I'm not averse to changing this to NETDEV_UNREGISTER
> as long as it works for the 2 test cases below- you
> can test it by using rds-ping from rds-tools rpm, to
> be used from/to init_net, from/to the netns against
> some external machine (i.e something not on the same
> physical host)
>
>>> For rds-tcp, we need to be able to do the right thing in both of these
>>> cases
>>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in
>>> every namespace, including init_net)
>>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces)
>>
>> The same as above, every pernet_subsys does this. It's not a problem.
>> exit and exit_batch methods are called in both of the cases.
>>
>> Please, see __unregister_pernet_operations()->ops_exit_list for the details.
>
> I am familiar with ops_exit_list, but this is the sequence:
> - when the module is loaded (or netns is started) it starts a
> kernel listen socket on *.16385
> - when you start the rds-pings above, it will create kernel
> tcp connections from/to the 16385 in the netns. And it will
> start socket keepalives for those connections. Each tcp
> connection is associated with a rds_connection
>
> As I recall, when I wrote the initial patchset, my problem
> was that in order to let the module unload make progress,
> all these sockets had to be cleaned up. But to clean up these
> sockets, net_device cleanup had to complete (should not have
> any new incoming connections to the listen endpoint on a
> non-loopback socket) so I ended up with a circular dependancy.
Ah, I see the reasons. Please, see my proposition at the end of this letter.
>> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change
>> which happens is we call rds_tcp_kill_sock() earlier. So, it may be a reason
>> of problems only if someone changes the list during the time between
>> NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL are called for loopback.
>> But since this time noone related to this net can extend the list,
>> there is no a problem to do that.
>
> Please share your patch, I can review it and maybe help to test
> it..
>
> As I was trying to say in my RFC, I am quite open to ways to make
> this cleanup more obvious
How about something like this? Compile tested only.
[PATCH]rds: Use pernet device to kill RDS sockets
We register a new pernet device and use the fact,
that loopback device is last unregistered device.
So, on exit path, the new exit method will be called
before loopback_dev destruction.
---
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb04e7fa2467..ec37868bf2dd 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net)
if (net != &init_net && rtn->ctl_table)
kfree(rtn->ctl_table);
-
- /* If rds_tcp_exit_net() is called as a result of netns deletion,
- * the rds_tcp_kill_sock() device notifier would already have cleaned
- * up the listen socket, thus there is no work to do in this function.
- *
- * If rds_tcp_exit_net() is called as a result of module unload,
- * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
- * we do need to clean up the listen socket here.
- */
- if (rtn->rds_tcp_listen_sock) {
- struct socket *lsock = rtn->rds_tcp_listen_sock;
-
- rtn->rds_tcp_listen_sock = NULL;
- rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
- }
}
static struct pernet_operations rds_tcp_net_ops = {
.init = rds_tcp_init_net,
.exit = rds_tcp_exit_net,
- .id = &rds_tcp_netid,
- .size = sizeof(struct rds_tcp_net),
.async = true,
};
@@ -545,40 +528,38 @@ static void rds_tcp_kill_sock(struct net *net)
rds_conn_destroy(tc->t_cpath->cp_conn);
}
-void *rds_tcp_listen_sock_def_readable(struct net *net)
+static __net_init int rds_tcp_init_dev(struct net *net)
{
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
- struct socket *lsock = rtn->rds_tcp_listen_sock;
- if (!lsock)
- return NULL;
+ rtn->rds_tcp_listen_sock = NULL;
+ return 0;
+}
- return lsock->sk->sk_user_data;
+static void __net_exit rds_tcp_exit_dev(struct net *net)
+{
+ rds_tcp_kill_sock(net);
}
-static int rds_tcp_dev_event(struct notifier_block *this,
- unsigned long event, void *ptr)
+static struct pernet_operations rds_tcp_dev_ops = {
+ .init = rds_tcp_init_dev,
+ .exit = rds_tcp_exit_dev,
+ .id = &rds_tcp_netid,
+ .size = sizeof(struct rds_tcp_net),
+ .async = true,
+};
+
+void *rds_tcp_listen_sock_def_readable(struct net *net)
{
- struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+ struct socket *lsock = rtn->rds_tcp_listen_sock;
- /* rds-tcp registers as a pernet subys, so the ->exit will only
- * get invoked after network acitivity has quiesced. We need to
- * clean up all sockets to quiesce network activity, and use
- * the unregistration of the per-net loopback device as a trigger
- * to start that cleanup.
- */
- if (event == NETDEV_UNREGISTER_FINAL &&
- dev->ifindex == LOOPBACK_IFINDEX)
- rds_tcp_kill_sock(dev_net(dev));
+ if (!lsock)
+ return NULL;
- return NOTIFY_DONE;
+ return lsock->sk->sk_user_data;
}
-static struct notifier_block rds_tcp_dev_notifier = {
- .notifier_call = rds_tcp_dev_event,
- .priority = -10, /* must be called after other network notifiers */
-};
-
/* when sysctl is used to modify some kernel socket parameters,this
* function resets the RDS connections in that netns so that we can
* restart with new parameters. The assumption is that such reset
@@ -624,9 +605,8 @@ static void rds_tcp_exit(void)
rds_tcp_set_unloading();
synchronize_rcu();
rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
+ unregister_pernet_device(&rds_tcp_dev_ops);
unregister_pernet_subsys(&rds_tcp_net_ops);
- if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
- pr_warn("could not unregister rds_tcp_dev_notifier\n");
rds_tcp_destroy_conns();
rds_trans_unregister(&rds_tcp_transport);
rds_tcp_recv_exit();
@@ -650,15 +630,13 @@ static int rds_tcp_init(void)
if (ret)
goto out_slab;
- ret = register_pernet_subsys(&rds_tcp_net_ops);
+ ret = register_pernet_device(&rds_tcp_dev_ops);
if (ret)
goto out_recv;
- ret = register_netdevice_notifier(&rds_tcp_dev_notifier);
- if (ret) {
- pr_warn("could not register rds_tcp_dev_notifier\n");
+ ret = register_pernet_subsys(&rds_tcp_net_ops);
+ if (ret)
goto out_pernet;
- }
rds_trans_register(&rds_tcp_transport);
@@ -667,7 +645,7 @@ static int rds_tcp_init(void)
goto out;
out_pernet:
- unregister_pernet_subsys(&rds_tcp_net_ops);
+ unregister_pernet_device(&rds_tcp_dev_ops);
out_recv:
rds_tcp_recv_exit();
out_slab:
Powered by blists - more mailing lists