Message-Id: <1286113441.3812.229.camel@bigi>
Date:	Sun, 03 Oct 2010 09:44:01 -0400
From:	jamal <hadi@...erus.ca>
To:	Daniel Lezcano <daniel.lezcano@...e.fr>
Cc:	Pavel Emelyanov <xemul@...allels.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	linux-kernel@...r.kernel.org,
	Linux Containers <containers@...ts.osdl.org>,
	netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Ulrich Drepper <drepper@...il.com>,
	Al Viro <viro@...IV.linux.org.uk>,
	David Miller <davem@...emloft.net>,
	"Serge E. Hallyn" <serge@...lyn.com>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Ben Greear <greearb@...delatech.com>,
	Matt Helsley <matthltc@...ibm.com>,
	Jonathan Corbet <corbet@....net>,
	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>,
	Jan Engelhardt <jengelh@...ozas.de>,
	Patrick McHardy <kaber@...sh.net>
Subject: Re: [PATCH 8/8] net: Implement socketat.

Hi Daniel,

Thanks for clarifying this.

On Sat, 2010-10-02 at 23:13 +0200, Daniel Lezcano wrote:
> Just to clarify this point. You enter the namespace, create the socket
> and go back to the initial namespace (or create a new one). Further 
> operations can be made against this fd because it is the network 
> namespace stored in the sock struct which is used, not the current 
> process network namespace which is used at the socket creation only.
> 
> We can actually already do that by unsharing and then creating a
> socket. This socket will pin the namespace and can be used as a
> control socket for the namespace (assuming the socket domain will be
> ok for all the operations).
>
> Jamal, I don't know what kind of application you want to use but if I 
> assume you want to create a process controlling 1024 netns, 

At the moment I am looking at 8K on a Nehalem with lots of RAM. They
will mostly be created at startup but some could be created afterwards.
Each will have its own netdevs etc. also created at startup (and some
other config that may happen later).
Because startup time may accumulate, it is clearly important to me
to pick whichever scheme reduces the number of calls.

> let's try to identify what happens with setns and with socketat:
> 
> With setns:
> 
>      * open /proc/self/ns/net (1)
>      * unshare the netns
>      * open /proc/self/ns/net (2)
>      * setns (1)
>      * create a virtual network device
>      * move the virtual device to (2) (using the set netns by fd)
>      * unshare the netns
>      ...
> 
> With socketat:
> 
>      * open a socket (1)
>      * unshare the netns
>      * open a netlink with socketat(1) => (2)
>      * create a virtual device using (2) (at this point it is
>        init_net_ns)
>      * move the virtual device to the current netns (using the set
>        netns by pid)
>      * open a socket (3)
>      * unshare the netns
>      ...
> 
> We have the same number of file descriptors kept open. Except that
> with setns we can bind mount the directory somewhere, which will pin
> the namespace, and then we can close the /proc/self/ns/net file
> descriptors and reopen them later.
> 

Ok, so a wrapper such as create_socket_on(namespaceid)
will generally need fewer system calls with socketat().
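
To make the comparison concrete, here is a rough sketch of what such a
wrapper might look like on the setns side. It assumes a setns(fd, nstype)
call with the signature documented in setns(2) and takes the namespace
as an already-open fd on a /proc/<pid>/ns/net file; the argument list
and the error handling are illustrative assumptions, not anything taken
from the patch set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket inside the network namespace
 * referred to by nsfd (an open fd on a /proc/<pid>/ns/net file), then
 * return to the caller's original namespace.
 * Returns the socket fd, or -1 with errno set. */
int create_socket_on(int nsfd, int domain, int type, int protocol)
{
    int saved_errno, sock = -1;

    /* Remember the current namespace so we can switch back. */
    int orig = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (orig < 0)
        return -1;

    /* Enter the target namespace (normally needs CAP_SYS_ADMIN). */
    if (setns(nsfd, CLONE_NEWNET) == 0) {
        /* The socket records this namespace at creation time. */
        sock = socket(domain, type, protocol);
        saved_errno = errno;

        /* Switch back. If that fails we are stranded in the target
         * namespace, so do not hand the socket out. */
        if (setns(orig, CLONE_NEWNET) != 0) {
            saved_errno = errno;
            if (sock >= 0)
                close(sock);
            sock = -1;
        }
    } else {
        saved_errno = errno;
    }

    close(orig);
    errno = saved_errno;
    return sock;
}
```

That is open, setns, socket, setns and close for every socket, where a
socketat()-style call would presumably be a single syscall per socket.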

> If your application has to do a lot of specific network processing,
> during its life cycle, in different namespaces, the socketat syscall
> will be better because it will reduce the number of syscalls but at
> the cost of keeping the file descriptors open (potentially a big
> number). Otherwise, setns should fit your needs.

Makes sense. 

One thing still confuses me...
The app control point is in namespace0. I still want to be able to
"boot" namespaces first and maybe a few seconds later do a socketat()...
and create devices, tcp sockets etc. I suspect create_ns(namespace-name)
would involve:
     * open /proc/self/ns/net (namespace-name)
     * unshare the netns
Is this correct?
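
For concreteness, one possible reading of those steps is sketched below.
It follows the setns sequence quoted earlier (open the current namespace,
unshare, open the new one) and then switches back so the control point
stays in namespace0. The helper name create_ns_fd() and its return
convention are made up for illustration, and it assumes the setns(2) and
unshare(2) calls as documented in their man pages; whether this is the
intended sequence is exactly the open question.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: create a fresh network namespace and return an
 * fd that refers to it, leaving the caller in its original namespace.
 * Returns -1 with errno set on failure. */
int create_ns_fd(void)
{
    int saved_errno, newfd = -1;

    /* fd on the namespace we are in now, so we can come back. */
    int orig = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (orig < 0)
        return -1;

    /* Move this process into a new, empty network namespace. */
    if (unshare(CLONE_NEWNET) != 0) {
        saved_errno = errno;
        close(orig);
        errno = saved_errno;
        return -1;
    }

    /* The same path now names the new namespace; holding this fd
     * keeps the namespace alive after we leave it. */
    newfd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    saved_errno = errno;

    /* Return to the original namespace. */
    if (setns(orig, CLONE_NEWNET) != 0) {
        saved_errno = errno;
        if (newfd >= 0)
            close(newfd);
        newfd = -1;
    }

    close(orig);
    errno = saved_errno;
    return newfd;
}
```

The returned fd could then be handed to setns() later, or to a
socketat()-style call if one exists, when it is time to create devices
and sockets in that namespace.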

cheers,
jamal
