netdev - Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3)

Message-Id: <201510230952.t9N9qYZJ021998@room101.nl.oracle.com>
Date:	Fri, 23 Oct 2015 11:52:34 +0200
From:	Casper.Dik@...cle.com
To:	Al Viro <viro@...IV.linux.org.uk>
cc:	Alan Burlison <Alan.Burlison@...cle.com>,
	David Miller <davem@...emloft.net>, eric.dumazet@...il.com,
	stephen@...workplumber.org, netdev@...r.kernel.org,
	dholland-tech@...bsd.org
Subject: Re: [Bug 106241] New: shutdown(3)/close(3) behaviour is incorrect for sockets in accept(3) 

>Ho-hum...  It could even be made lockless in fast path; the problems I see
>are
>	* descriptor-to-file lookup becomes unsafe in a lot of locking
>conditions.  Sure, most of that happens on the entry to some syscall, with
>very light locking environment, but... auditing every sodding ioctl that
>might be doing such lookups is an interesting exercise, and then there are
>->mount() instances doing the same thing.  And procfs accesses.  Probably
>nothing impossible to deal with, but nothing pleasant either.

In the Solaris kernel code, the ioctl code is generally not handed a file 
descriptor but instead a file pointer (i.e., the lookup is done early in 
the system call).

In those specific cases where a system call needs to convert a file 
descriptor to a file pointer, there is only one routine that can be used.

>	* memory footprint.  In case of Linux on amd64 or sparc64,
>main()
>{
>	int i;
>	for (i = 0; i < 1<<24; dup2(0, i++))	// 16M descriptors
>		;
>}
>will chew 132Mb of kernel data (16Mpointer + 32Mbit, assuming sufficient ulimit -n,
>of course).  How much will Solaris eat on the same?

Yeah, that is a large amount of memory.  Of course, the table is only 
sized when it is extended and there is a reason why there is a limit on 
file descriptors.  But we're using more data per file descriptor entry.

>	* related to the above - how much cacheline sharing will that involve?
>These per-descriptor use counts are bitch to pack, and giving each a cacheline
>of its own...  <shudder>

As I said, we do actually use a lock and yes that means that you really  
want to have a single cache line for each and every entry.  It does make 
it easy to have non-racy file description updates.  You certainly do not 
want false sharing when there is a lot of contention.
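
A minimal sketch of the "one cache line per entry, guarded by its own lock" 
layout described above. This is not the Solaris code: the 64-byte line 
size, the names fd_entry, fd_table and fd_entry_swap, and the use of a C11 
atomic_flag as a spin lock are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size */
#define NENTRIES   16   /* tiny table, for illustration only */

/* One descriptor slot, padded out to a full cache line so that
 * contention on one slot does not cause false sharing with its
 * neighbours. */
struct fd_entry {
    alignas(CACHE_LINE) atomic_flag lock;  /* hypothetical per-entry lock */
    void *file;                            /* stands in for the file pointer */
    unsigned refs;                         /* per-descriptor use count */
};

static_assert(sizeof(struct fd_entry) % CACHE_LINE == 0,
              "each entry must occupy whole cache lines");

static struct fd_entry fd_table[NENTRIES];

static void fd_table_init(void)
{
    for (size_t i = 0; i < NENTRIES; i++) {
        atomic_flag_clear(&fd_table[i].lock);
        fd_table[i].file = NULL;
        fd_table[i].refs = 0;
    }
}

/* Replace the file pointer in one slot under that slot's lock and
 * return the previous value. Only this entry's cache line is touched. */
static void *fd_entry_swap(struct fd_entry *e, void *newfile)
{
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;  /* spin; cheap when the lock is not contended */
    void *old = e->file;
    e->file = newfile;
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
    return old;
}
```

The cost is the one raised in the quoted text: with 64-byte entries, a 
table of 16M descriptors would need 1GB rather than 132MB.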

Other data is used to make sure that it only takes O(log(n)) to find the 
lowest available file descriptor entry.  (Where n, I think, is the returned
descriptor)
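
One way to get a logarithmic "lowest free descriptor" search is a binary 
tree of free counts laid over the table. The sketch below is an 
illustration of that general technique, not the Solaris data structure; 
the names and the fixed power-of-two table size are assumptions. Note that 
it is O(log N) in the table size N, not in the value of the returned 
descriptor.

```c
#include <assert.h>

#define NFD 1024   /* table size; assumed to be a power of two */

/* Implicit binary tree: tree[1] is the root, the leaves are
 * tree[NFD] .. tree[2*NFD - 1]. Each node holds the number of free
 * descriptors in its subtree. */
static unsigned tree[2 * NFD];

static void fd_tree_init(void)
{
    for (int i = NFD; i < 2 * NFD; i++)
        tree[i] = 1;                          /* every slot starts free */
    for (int i = NFD - 1; i >= 1; i--)
        tree[i] = tree[2 * i] + tree[2 * i + 1];
}

/* Return the lowest free descriptor and mark it used, or -1 if the
 * table is full. Walks one root-to-leaf path: O(log NFD). */
static int fd_alloc_lowest(void)
{
    if (tree[1] == 0)
        return -1;
    int n = 1;
    while (n < NFD)                           /* prefer the left (lower) half */
        n = tree[2 * n] ? 2 * n : 2 * n + 1;
    for (int i = n; i >= 1; i /= 2)           /* update counts back to the root */
        tree[i]--;
    return n - NFD;
}

/* Mark a descriptor free again; freeing an already free slot is a no-op. */
static void fd_free(int fd)
{
    assert(fd >= 0 && fd < NFD);
    int n = fd + NFD;
    if (tree[n])
        return;
    for (int i = n; i >= 1; i /= 2)
        tree[i]++;
}
```

After fd_tree_init(), successive calls to fd_alloc_lowest() hand out 0, 1, 
2, ...; once descriptor 1 is released with fd_free(1), the next allocation 
returns 1 again, matching the "lowest available" rule.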

Uncontended locks aren't expensive.  And all is done on a single cache 
line.

One question about the Linux implementation: what happens when a socket in 
select is closed?  I'm assuming that the kernel waits until "shutdown" is 
given or when a connection comes in?

Is it a problem that you can "hide" your listening socket with a thread in 
accept()?  I would think so.  (It would be visible in netstat, but you can't 
easily find out who has it.)

Casper
