netdev - Re: listen(2) backlog changes in or around Linux 3.1?

Message-ID: <CAJgzZooJM7aj=K-idvdbwUOOmtDM6u-=gsVOs2OFZjiZwQP0zw@mail.gmail.com>
Date:	Thu, 18 Oct 2012 10:20:17 -0700
From:	enh <enh@...gle.com>
To:	Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
Cc:	netdev@...r.kernel.org
Subject: Re: listen(2) backlog changes in or around Linux 3.1?

On Thu, Oct 18, 2012 at 9:53 AM, Venkat Venkatsubra
<venkat.x.venkatsubra@...cle.com> wrote:
> Correction. I don't see the client side receiving any abort/termination
> notification.
> They all remain in the ESTABLISHED state on the client side.

yeah, that's what i see with netstat -t too.

in the meantime i'm working around this by connecting to one of
RFC5737's test networks
(https://android-review.googlesource.com/#/c/44563/), but i'd love to
at least understand what's going on here, even if it's just that i
have a fundamental misunderstanding of what the listen backlog is
supposed to mean.

> In tcpdump I don't see a FIN or RST coming from the server for the aborted
> connections.
>
> Venkat
>
>
> On 10/18/2012 11:00 AM, Venkat Venkatsubra wrote:
>>
>> Hi Elliott,
>>
>> I see the same behavior with your test program.
>> The connect() keeps succeeding even though accept() is not performed.
>> It pauses after 4 connections for a while and then periodically keeps
>> adding a few (2, I think).
>>
>> But the server side end points are terminated too. You will see only the
>> first 2 sessions on the server side.
>> If you modify your test program to, say, read or poll the sockets, you
>> should get a termination notification on them, I think.
>>
>> The behavior overall looks fine in my opinion.  But it could be a change
>> of behavior for your test program.
>>
>> Venkat
>>
>> On 10/16/2012 6:31 PM, enh wrote:
>>>
>>> boiling things down to a short C++ program, i see that i can reproduce
>>> the behavior even on 2.6 kernels. if i run this, i see 4 connections
>>> immediately (3 + 1, as i'd expect)... but then about 10s later i see
>>> another 2. and every few seconds after that, i see another 2. i've let
>>> this run until i have hundreds of connect(2) calls that have returned,
>>> despite my small listen(2) backlog and the fact that i'm not
>>> accept(2)ing.
>>>
>>> so i guess the only thing that's changed with newer kernels is timing
>>> (hell, since i only see newer kernels on newer hardware, it might just
>>> be a hardware thing).
>>>
>>> and clearly i don't understand what the listen(2) backlog means any more.
>>>
>>> #include <netinet/ip.h>
>>> #include <netinet/tcp.h>
>>> #include <sys/types.h>
>>> #include <sys/socket.h>
>>> #include <iostream>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <errno.h>
>>>
>>> // Print two TCP_INFO counters for fd.
>>> void dump_ti(int fd) {
>>>   tcp_info ti;
>>>   socklen_t tcp_info_length = sizeof(tcp_info);
>>>   // TCP_INFO is a TCP-level option, so the level is IPPROTO_TCP, not SOL_IP.
>>>   int rc = getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &tcp_info_length);
>>>   if (rc == -1) {
>>>     std::cout << "getsockopt rc " << rc << ": " << strerror(errno) << "\n";
>>>     return;
>>>   }
>>>
>>>   std::cout << "ti.tcpi_unacked=" << ti.tcpi_unacked << "\n";
>>>   std::cout << "ti.tcpi_sacked=" << ti.tcpi_sacked << "\n";
>>> }
>>>
>>> // Open one more connection to sa. The fd is deliberately never closed,
>>> // so every connection stays open and is never accept(2)ed.
>>> void connect_to(sockaddr_in& sa) {
>>>   int s = socket(AF_INET, SOCK_STREAM, 0);
>>>   if (s == -1) {
>>>     abort();
>>>   }
>>>
>>>   int rc = connect(s, (sockaddr*)&sa, sizeof(sockaddr_in));
>>>   std::cout << "connect = " << rc << "\n";
>>> }
>>>
>>> int main() {
>>>   int ss = socket(AF_INET, SOCK_STREAM, 0);
>>>   std::cout << "socket fd " << ss << "\n";
>>>
>>>   sockaddr_in sa;
>>>   memset(&sa, 0, sizeof(sa));
>>>   sa.sin_family = AF_INET;
>>>   sa.sin_addr.s_addr = htonl(INADDR_ANY);
>>>   sa.sin_port = htons(9877);
>>>   int rc = bind(ss, (sockaddr*)&sa, sizeof(sa));
>>>   std::cout << "bind rc " << rc << ": " << strerror(errno) << "\n";
>>>   // sin_port is in network byte order; convert it back for printing.
>>>   std::cout << "bind port " << ntohs(sa.sin_port) << "\n";
>>>
>>>   rc = listen(ss, 1);
>>>   std::cout << "listen rc " << rc << ": " << strerror(errno) << "\n";
>>>   dump_ti(ss);
>>>
>>>   // Keep connecting without ever calling accept(2).
>>>   while (true) {
>>>     connect_to(sa);
>>>     dump_ti(ss);
>>>   }
>>>
>>>   return 0;
>>> }
>>>
>>>
>>> On Mon, Oct 15, 2012 at 10:26 AM, enh <enh@...gle.com> wrote:
>>>>
>>>> On Mon, Oct 15, 2012 at 10:12 AM, Venkat Venkatsubra
>>>> <venkat.x.venkatsubra@...cle.com> wrote:
>>>>>
>>>>> On 10/12/2012 6:40 PM, enh wrote:
>>>>>>
>>>>>> i used to use the following hack to unit test connect timeouts: i'd
>>>>>> call listen(2) on a socket and then deliberately connect (backlog + 3)
>>>>>> sockets without accept(2)ing any of the connections. (why 3? because
>>>>>> Stevens told me so, and experiment backed him up. see figure 4.10 in
>>>>>> his UNIX Network Programming.)
>>>>>>
>>>>>> with "old" kernels, 2.6.35-ish to 3.0-ish, this worked great. my next
>>>>>> connect(2) to the same loopback port would hang indefinitely. i could
>>>>>> even unblock the connect by calling accept(2) in another thread. this
>>>>>> was awesome for testing.
>>>>>>
>>>>>> in 3.1 on ARM, 3.2 on x86 (Ubuntu desktop), and 3.4 on ARM, this no
>>>>>> longer works. it doesn't seem to be as simple as "the constant is no
>>>>>> longer 3". my tests are now flaky. sometimes they work like they used
>>>>>> to, and sometimes an extra connect(2) will succeed. (or, if i'm in
>>>>>> non-blocking mode, my poll(2) will return with the non-blocking socket
>>>>>> that's trying to connect now ready.)
>>>>>>
>>>>>> i'm guessing if this changed in 3.1 and is still changed in 3.4,
>>>>>> whatever's changed wasn't an accident. but i haven't been able to find
>>>>>> the right search terms to RTFM. i also finally got around to grepping
>>>>>> the kernel for the "+ 3", but wasn't able to find that. (so i'd be
>>>>>> interested to know where the old behavior came from too.)
>>>>>>
>>>>>> my least worst workaround at the moment is to use one of RFC5737's
>>>>>> test networks, but that requires that the device have a network
>>>>>> connection, otherwise my connect(2)s fail immediately with
>>>>>> ENETUNREACH, which is no use to me. also, unlike my old trick, i've
>>>>>> got no way to suddenly "unblock" a slow connect(2) (this is useful for
>>>>>> unit testing the code that does the poll(2) part of the usual
>>>>>> connect-with-timeout implementation).
>>>>>> https://android-review.googlesource.com/#/c/44563/
>>>>>>
>>>>>> hopefully someone here can shed some light on this? ideally someone
>>>>>> will have a workaround as good as my old trick. i realize i was
>>>>>> relying on undocumented behavior, and i'm happy to have to check
>>>>>> /proc/version and behave appropriately, but i'd really like a way to
>>>>>> keep my unit tests!
>>>>>>
>>>>>> thanks,
>>>>>>    elliott
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>> the body of a message to majordomo@...r.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> Hi Elliott,
>>>>>
>>>>> In BSD I think the backlog used to be reset to 3/2 times that passed
>>>>> by the user. So, 2 becomes 3.
>>>>> Probably the 1/2 times increase was to accommodate the ones in the
>>>>> partial/incomplete queue.
>>>>> In Linux, is it possible you were getting the same behavior before
>>>>> the commit below?
>>>>> Since the check used to be "backlog+1", a 2 would behave as 3?
>>>>
>>>> i don't think so, because with <= 3.0 kernels i used to have a backlog
>>>> of 1 and be able to make _4_ connections before my next connect would
>>>> hang. but this > to >= change is at least something for me to
>>>> investigate...
>>>>
>>>>> commit 8488df894d05d6fa41c2bd298c335f944bb0e401
>>>>> Author: Wei Dong <weid@...css.fujitsu.com>
>>>>> Date:   Fri Mar 2 12:37:26 2007 -0800
>>>>>
>>>>>     [NET]: Fix bugs in "Whether sock accept queue is full" checking
>>>>>
>>>>>     when I use linux TCP socket, and find there is a bug in function
>>>>>     sk_acceptq_is_full().
>>>>>
>>>>>     When a new SYN comes, TCP module first checks its validation. If
>>>>>     valid, send SYN,ACK to the client and add the sock to the syn hash
>>>>>     table. Next time if received the valid ACK for SYN,ACK from the
>>>>>     client. server will accept this connection and increase the
>>>>>     sk->sk_ack_backlog -- which is done in function tcp_check_req().
>>>>>     We check whether acceptq is full in function
>>>>>     tcp_v4_syn_recv_sock().
>>>>>
>>>>>     Consider an example:
>>>>>
>>>>>     After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set
>>>>>     to 1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming
>>>>>     accept() system call is not invoked now.
>>>>>
>>>>>     1. 1st connection comes. invoke sk_acceptq_is_full().
>>>>>        sk->sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0
>>>>>        accept this connection. Increase the sk->sk_ack_backlog
>>>>>     2. 2nd connection comes. invoke sk_acceptq_is_full().
>>>>>        sk->sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0
>>>>>        accept this connection. Increase the sk->sk_ack_backlog
>>>>>     3. 3rd connection comes. invoke sk_acceptq_is_full().
>>>>>        sk->sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1.
>>>>>        Refuse this connection.
>>>>>
>>>>>     I think it has bugs. after listen system call.
>>>>>     sk->sk_max_ack_backlog=1 but now it can accept 2 connections.
>>>>>
>>>>>     Signed-off-by: Wei Dong <weid@...css.fujitsu.com>
>>>>>     Signed-off-by: David S. Miller <davem@...emloft.net>
>>>>>
>>>>> Venkat
>>>
>>
>>
>
>



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
NIO, JNI, or bionic questions? Mail me/drop by/add me as a reviewer.