Message-ID: <d6cdfda8-73f2-47a7-9287-4869dc1b5726@virtuozzo.com>
Date:   Fri, 16 Mar 2018 17:36:06 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:     santosh.shilimkar@...cle.com, davem@...emloft.net,
        netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
        rds-devel@....oracle.com, edumazet@...gle.com
Subject: Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event()
 (then kill NETDEV_UNREGISTER_FINAL)

On 16.03.2018 16:53, Sowmini Varadhan wrote:
> 
> Found my previous question:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg72330.html
> 
> (see section about "Comments are specifically invited..")

I see, thanks.

>> This is not a problem, and rds-tcp is not the only pernet_subsys registering
>> a socket. It's OK to close it from the .exit method. There are many examples;
>> let me point you to icmp_sk_ops as one of them, but it's not the only one.
> 
> I'm not averse to changing this to NETDEV_UNREGISTER
> as long as it works for the 2 test cases below- you 
> can test it by using rds-ping from rds-tools rpm, to
> be used from/to init_net, from/to the netns  against
> some external machine (i.e something not on the same
> physical host)
> 
>>> For rds-tcp, we need to be able to do the right thing in both of these
>>> cases
>>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in
>>>    every namespace, including init_net)
>>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces)
>>
>> The same as above, every pernet_subsys does this. It's not a problem.
>> exit and exit_batch methods are called in both of the cases.
>>
>> Please, see __unregister_pernet_operations()->ops_exit_list for the details.
> 
> I am familiar with ops_exit_list, but this is the sequence:
> - when the module is loaded (or netns is started) it starts a 
>   kernel listen socket on *.16385
> - when you start the rds-pings above, it will create kernel
>   tcp connections from/to the 16385 in the netns. And it will
>   start socket keepalives for those connections. Each tcp 
>   connection is associated with a rds_connection
> 
> As I recall, when I wrote the initial patchset, my problem
> was that in order to let the module unload make progress,
> all these sockets had to be cleaned up. But to clean up these
> sockets, net_device cleanup had to complete (should not have
> any new incoming connections to the listen endpoint on a 
> non-loopback socket) so I ended up with a circular dependency.

Ah, I see the reason. Please see my proposal at the end of this message.
 
>> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change
>> is that we call rds_tcp_kill_sock() earlier. So, it can cause problems only
>> if someone changes the list between the times NETDEV_UNREGISTER and
>> NETDEV_UNREGISTER_FINAL are called for loopback. But since no one related
>> to this net can extend the list during that window, there is no problem
>> in doing that.
> 
> Please share your patch, I can review it and maybe help to test
> it..
> 
> As I was trying to say in my RFC, I am quite open to ways to make
> this cleanup more obvious

How about something like this? Compile tested only.

[PATCH] rds: Use pernet device to kill RDS sockets

We register a new pernet device and rely on the fact
that the loopback device is the last device to be unregistered.
So, on the exit path, the new exit method is called
before loopback_dev is destroyed.
    
---
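To illustrate the ordering the changelog relies on, here is a small
userspace model. It is not kernel code and is only a sketch: the model_*
names are hypothetical, and it assumes that pernet subsystems are kept
ahead of pernet devices in one list, that devices are appended at the
tail, and that exit methods run in reverse list order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_OPS 8

/* Hypothetical stand-in for struct pernet_operations: a name only. */
struct model_ops {
	const char *name;
};

static const struct model_ops *model_list[MODEL_MAX_OPS];
static size_t model_count;
static size_t model_first_device;	/* index where device entries start */

/* Subsystems are inserted just before the first device entry. */
static void model_register_subsys(const struct model_ops *ops)
{
	assert(model_count < MODEL_MAX_OPS);
	memmove(&model_list[model_first_device + 1],
		&model_list[model_first_device],
		(model_count - model_first_device) * sizeof(model_list[0]));
	model_list[model_first_device++] = ops;
	model_count++;
}

/* Devices are appended at the tail of the list. */
static void model_register_device(const struct model_ops *ops)
{
	assert(model_count < MODEL_MAX_OPS);
	model_list[model_count++] = ops;
}

/* Position of @name in exit order (0 = exits first), or -1 if absent.
 * Exit walks the list in reverse, so the tail entry exits first.
 */
static int model_exit_pos(const char *name)
{
	size_t i;

	for (i = 0; i < model_count; i++)
		if (strcmp(model_list[model_count - 1 - i]->name, name) == 0)
			return (int)i;
	return -1;
}

/* Registration order loosely mirroring the patch: loopback is assumed to
 * be registered first (at boot), then rds_tcp_dev_ops, then rds_tcp_net_ops.
 */
static void model_setup(void)
{
	static const struct model_ops loopback = { "loopback" };
	static const struct model_ops rds_dev = { "rds_tcp_dev_ops" };
	static const struct model_ops rds_net = { "rds_tcp_net_ops" };

	model_count = 0;
	model_first_device = 0;
	model_register_device(&loopback);
	model_register_device(&rds_dev);
	model_register_subsys(&rds_net);
}
```

After model_setup(), model_exit_pos() places rds_tcp_dev_ops ahead of
loopback, and rds_tcp_net_ops after both, which is the order the new
.exit method depends on under the assumptions above.
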
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb04e7fa2467..ec37868bf2dd 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net)
 
 	if (net != &init_net && rtn->ctl_table)
 		kfree(rtn->ctl_table);
-
-	/* If rds_tcp_exit_net() is called as a result of netns deletion,
-	 * the rds_tcp_kill_sock() device notifier would already have cleaned
-	 * up the listen socket, thus there is no work to do in this function.
-	 *
-	 * If rds_tcp_exit_net() is called as a result of module unload,
-	 * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
-	 * we do need to clean up the listen socket here.
-	 */
-	if (rtn->rds_tcp_listen_sock) {
-		struct socket *lsock = rtn->rds_tcp_listen_sock;
-
-		rtn->rds_tcp_listen_sock = NULL;
-		rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
-	}
 }
 
 static struct pernet_operations rds_tcp_net_ops = {
 	.init = rds_tcp_init_net,
 	.exit = rds_tcp_exit_net,
-	.id = &rds_tcp_netid,
-	.size = sizeof(struct rds_tcp_net),
 	.async = true,
 };
 
@@ -545,40 +528,38 @@ static void rds_tcp_kill_sock(struct net *net)
 		rds_conn_destroy(tc->t_cpath->cp_conn);
 }
 
-void *rds_tcp_listen_sock_def_readable(struct net *net)
+static __net_init int rds_tcp_init_dev(struct net *net)
 {
 	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
-	struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-	if (!lsock)
-		return NULL;
+	rtn->rds_tcp_listen_sock = NULL;
+	return 0;
+}
 
-	return lsock->sk->sk_user_data;
+static void __net_exit rds_tcp_exit_dev(struct net *net)
+{
+	rds_tcp_kill_sock(net);
 }
 
-static int rds_tcp_dev_event(struct notifier_block *this,
-			     unsigned long event, void *ptr)
+static struct pernet_operations rds_tcp_dev_ops = {
+	.init = rds_tcp_init_dev,
+	.exit = rds_tcp_exit_dev,
+	.id = &rds_tcp_netid,
+	.size = sizeof(struct rds_tcp_net),
+	.async = true,
+};
+
+void *rds_tcp_listen_sock_def_readable(struct net *net)
 {
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+	struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-	/* rds-tcp registers as a pernet subys, so the ->exit will only
-	 * get invoked after network acitivity has quiesced. We need to
-	 * clean up all sockets  to quiesce network activity, and use
-	 * the unregistration of the per-net loopback device as a trigger
-	 * to start that cleanup.
-	 */
-	if (event == NETDEV_UNREGISTER_FINAL &&
-	    dev->ifindex == LOOPBACK_IFINDEX)
-		rds_tcp_kill_sock(dev_net(dev));
+	if (!lsock)
+		return NULL;
 
-	return NOTIFY_DONE;
+	return lsock->sk->sk_user_data;
 }
 
-static struct notifier_block rds_tcp_dev_notifier = {
-	.notifier_call        = rds_tcp_dev_event,
-	.priority = -10, /* must be called after other network notifiers */
-};
-
 /* when sysctl is used to modify some kernel socket parameters,this
  * function  resets the RDS connections in that netns  so that we can
  * restart with new parameters.  The assumption is that such reset
@@ -624,9 +605,8 @@ static void rds_tcp_exit(void)
 	rds_tcp_set_unloading();
 	synchronize_rcu();
 	rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
+	unregister_pernet_device(&rds_tcp_dev_ops);
 	unregister_pernet_subsys(&rds_tcp_net_ops);
-	if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
-		pr_warn("could not unregister rds_tcp_dev_notifier\n");
 	rds_tcp_destroy_conns();
 	rds_trans_unregister(&rds_tcp_transport);
 	rds_tcp_recv_exit();
@@ -650,15 +630,13 @@ static int rds_tcp_init(void)
 	if (ret)
 		goto out_slab;
 
-	ret = register_pernet_subsys(&rds_tcp_net_ops);
+	ret = register_pernet_device(&rds_tcp_dev_ops);
 	if (ret)
 		goto out_recv;
 
-	ret = register_netdevice_notifier(&rds_tcp_dev_notifier);
-	if (ret) {
-		pr_warn("could not register rds_tcp_dev_notifier\n");
+	ret = register_pernet_subsys(&rds_tcp_net_ops);
+	if (ret)
 		goto out_pernet;
-	}
 
 	rds_trans_register(&rds_tcp_transport);
 
@@ -667,7 +645,7 @@ static int rds_tcp_init(void)
 	goto out;
 
 out_pernet:
-	unregister_pernet_subsys(&rds_tcp_net_ops);
+	unregister_pernet_device(&rds_tcp_dev_ops);
 out_recv:
 	rds_tcp_recv_exit();
 out_slab:
