netdev - Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <82d6e954-8043-078c-266c-2f1ac992f864@virtuozzo.com>
Date:   Fri, 16 Mar 2018 21:14:06 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc:     santosh.shilimkar@...cle.com, davem@...emloft.net,
        netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
        rds-devel@....oracle.com, edumazet@...gle.com
Subject: Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event()
 (then kill NETDEV_UNREGISTER_FINAL)

On 16.03.2018 20:29, Sowmini Varadhan wrote:
> 
> I had taken some of this offline, but it occurs to me
> that some of these notes should be saved to the netdev archives, 
> in case this question pops up again in a few years.
> 
> When I run your patch, I get a repeatable panic by doing
>   modprobe rds-tcp
>   ip netns create blue
> the panic is because we are finding a null trn in rds_tcp_init_net.

I did the second version and sent you. Have you tried it?

> I think there's something very disturbed about calling
> register_pernet_operations() twice, once via 
> register_pernet_device() and again via register_pernet_subsys().

Calling netdevice handler for every event is more disturbing,
as number of events is several times bigger, than one more
pernet exit method.

> I suspect this has everything to do with the panic but I have
> not had time to debug every little detail here.

You speak in the way I forced you to spend a lot of time debugging
my patches. But I sent you only one with a warn it's not tested.
(but twice -- since the second time I used --patience diff option
to make idea more visible for you). It's strange to hear this from
you.

> In general, rds_tcp is not a network device, it is a kernel
> module.  That is the fundamental problem here. 
> 
> To repeat the comments form net_namespace.h:
>  * Network interfaces need to be removed from a dying netns _before_
>  * subsys notifiers can be called, as most of the network code cleanup
>  * (which is done from subsys notifiers) runs with the assumption that
>  * dev_remove_pack has been called so no new packets will arrive during
>  * and after the cleanup functions have been called.  dev_remove_pack
>  * is not per namespace so instead the guarantee of no more packets
>  * arriving in a network namespace is provided by ensuring that all
>  * network devices and all sockets have left the network namespace
>  * before the cleanup methods are called.
> 
> when the "blue" netns starts up, it creates at least one kernel listen
> socket on *.16385. This socket, and any other child/client sockets 
> created must be cleaned up before the cleanup_net can happen.
> 
> This is why I chose to call regster_pernet_subsys. Again, as per
> comments in net_namespace.h:
> 
>  * Use these carefully.  If you implement a network device and it
>  * needs per network namespace operations use device pernet operations,
>  * otherwise use pernet subsys operations.

This is not a strict rule, and currently I'm working on pernet_operations
synchronization. This commentary is for generic code, while your rds
is differ from the rest of ordinary pernet_operations.

> On (03/16/18 18:51), Kirill Tkhai wrote:
>>> Let's find another approach. Could you tell what problem we have in 
>>> case of rds_tcp_dev_ops is declared as pernet_device?
> 
> As above, rds-tcp is not a network device.

It's not an agrument. It's not a rule, what you read in the comment.
We need to use rules of reasonableness, and the generic rules do not
fit to your driver. So, it's need to search a solution. Be the only
user NETDEV_UNREGISTER_FINAL in the kernel does not look a good one.

>> One more question. Which time we take a reference of loopback device?
>> Is it possible before we created a net completely?
> 
> We dont take a reference on the loopback device. 
> We make sure none of the kernel sockets does a get_net() so
> that we dont block the cleanup_net, and then, when all
> the network interfaces have been taken down (loopback is
> the last one) we know there are no more packets coming in
> and out, so it is safe to dismantle all kernel sockets 
> created by rds-tcp.

So that, if RDS packets does not act on netdev refcount,
which circular dependence did you mean in your second
message (quote below)?

>As I recall, when I wrote the initial patchset, my problem
>was that in order to let the module unload make progress,
>all these sockets had to be cleaned up. But to clean up these
>sockets, net_device cleanup had to complete (should not have
>any new incoming connections to the listen endpoint on a 
>non-loopback socket) so I ended up with a circular dependancy.

Kirill