Message-ID: <1445605340.22974.140.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Fri, 23 Oct 2015 06:02:20 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Casper.Dik@...cle.com
Cc: Al Viro <viro@...IV.linux.org.uk>,
Alan Burlison <Alan.Burlison@...cle.com>,
David Miller <davem@...emloft.net>, stephen@...workplumber.org,
netdev@...r.kernel.org, dholland-tech@...bsd.org
Subject: Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect
for sockets in accept(3)
On Fri, 2015-10-23 at 11:52 +0200, Casper.Dik@...cle.com wrote:
>
> >Ho-hum... It could even be made lockless in fast path; the problems I see
> >are
> > * descriptor-to-file lookup becomes unsafe in a lot of locking
> >conditions. Sure, most of that happens on the entry to some syscall, with
> >very light locking environment, but... auditing every sodding ioctl that
> >might be doing such lookups is an interesting exercise, and then there are
> >->mount() instances doing the same thing. And procfs accesses. Probably
> >nothing impossible to deal with, but nothing pleasant either.
>
> In the Solaris kernel code, the ioctl code is generally not handed a file
> descriptor but instead a file pointer (i.e., the lookup is done early in
> the system call).
>
> In those specific cases where a system call needs to convert a file
> descriptor to a file pointer, there is only one routine which can be used.
>
> > * memory footprint. In case of Linux on amd64 or sparc64,
> >
> >#include <unistd.h>
> >
> >int main(void)
> >{
> >	int i;
> >
> >	for (i = 0; i < 1<<24; dup2(0, i++))	// 16M descriptors
> >		;
> >	return 0;
> >}
> >
> >will chew 132MB of kernel data (16M pointers + 32M bits, assuming sufficient
> >ulimit -n, of course). How much will Solaris eat on the same?
>
> Yeah, that is a large amount of memory. Of course, the table is only
> resized when it is extended, and there is a reason why there is a limit on
> file descriptors. But we are using more data per file descriptor entry.
>
>
> > * related to the above - how much cacheline sharing will that involve?
> >These per-descriptor use counts are a bitch to pack, and giving each a cacheline
> >of its own... <shudder>
>
> As I said, we do actually use a lock and yes that means that you really
> want to have a single cache line for each and every entry. It does make
> it easy to have non-racy file description updates. You certainly do not
> want false sharing when there is a lot of contention.
>
> Other data is used to make sure that it only takes O(log(n)) to find the
> lowest available file descriptor entry. (Where n, I think, is the returned
> descriptor.)
Yet another POSIX deficiency.
When a server deals with 10,000,000+ sockets, we absolutely do not care
about this requirement.
O(log(n)) is still crazy if it involves O(log(n)) cache misses.
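
(As an illustration, here is roughly what a lowest-free-descriptor search
over a two-level bitmap looks like; this is a user-space sketch with
made-up names (struct fd_bitmap, lowest_free_fd), not the actual kernel
allocator. The point is that even when a summary level lets you skip 64
descriptors per bit, every allocation still has to walk shared bitmaps,
and those walks are the cache misses.)

/*
 * Sketch only: two-level bitmap search for the lowest free descriptor.
 * Assumes the caller keeps the summary bitmap in sync with the per-fd
 * bitmap.  __builtin_ctzll() is a GCC/Clang builtin.
 */
#include <stdint.h>

#define BITS_PER_WORD	64

struct fd_bitmap {
	uint64_t *open;		/* one bit per descriptor: 1 = in use */
	uint64_t *full;		/* one bit per word of 'open': 1 = word full */
	unsigned long nwords;	/* number of words in 'open' */
};

/* Return the lowest free descriptor, or -1 if the table is full. */
static long lowest_free_fd(const struct fd_bitmap *map)
{
	unsigned long i;

	for (i = 0; i < map->nwords; i++) {
		/* Summary bit set: all 64 descriptors of word i are taken. */
		if (map->full[i / BITS_PER_WORD] & (1ULL << (i % BITS_PER_WORD)))
			continue;
		/* At least one zero bit in this word; pick the lowest one. */
		return (long)(i * BITS_PER_WORD + __builtin_ctzll(~map->open[i]));
	}
	return -1;
}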
>
> Uncontended locks aren't expensive. And it is all done on a single cache
> line.
>
> One question about the Linux implementation: what happens when a socket in
> select is closed? I'm assuming that the kernel waits until "shutdown" is
> given or until a connection comes in?
>
> Is it a problem that you can "hide" your listening socket with a thread in
> accept()? I would think so (it would be visible in netstat, but you can't
> easily find out who has it).
Again, netstat -p on a server with 10,000,000 sockets never completes.
Never try this unless you are desperate and perhaps trying to avoid a
reboot.
If you absolutely want to nuke a listener because of untrusted
applications, we had better implement a proper syscall.
Android has such a facility.
An alternative would be to extend netlink (the ss command from the
iproute2 package) to carry one pid per socket.
ss -atnp state listening
-> would not have to readlink(/proc/*/fd/*)
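
(To make the cost concrete, here is a sketch of the /proc walk that
netstat -p / ss -p effectively have to do today to map a socket back to
a pid; illustration only, not the iproute2 sources, and
pid_owning_socket() is a made-up helper. One readlink() per descriptor
of every process is why it never finishes with 10,000,000 sockets; a
netlink attribute carrying the pid would replace all of this with a
single socket dump.)

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the first pid owning the socket with this inode, or -1. */
static long pid_owning_socket(unsigned long long inode)
{
	char needle[64], path[256], link[64];
	DIR *proc = opendir("/proc"), *fds;
	struct dirent *p, *f;
	long pid = -1;

	if (!proc)
		return -1;
	snprintf(needle, sizeof(needle), "socket:[%llu]", inode);

	while (pid < 0 && (p = readdir(proc))) {
		if (p->d_name[0] < '0' || p->d_name[0] > '9')
			continue;	/* not a pid directory */
		snprintf(path, sizeof(path), "/proc/%s/fd", p->d_name);
		fds = opendir(path);
		if (!fds)
			continue;	/* process exited, or no permission */
		while ((f = readdir(fds))) {
			ssize_t n;

			snprintf(path, sizeof(path), "/proc/%s/fd/%s",
				 p->d_name, f->d_name);
			n = readlink(path, link, sizeof(link) - 1);
			if (n <= 0)
				continue;	/* "." / ".." or a race */
			link[n] = '\0';
			if (!strcmp(link, needle)) {
				pid = atol(p->d_name);
				break;
			}
		}
		closedir(fds);
	}
	closedir(proc);
	return pid;
}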