linux-kernel - Re: [PREEMPT-RT] [patch 4 14/22] timer: Switch to a non cascading wheel

Message-ID: <20160816094600.GC23077@localhost.localdomain>
Date:	Tue, 16 Aug 2016 11:46:00 +0200
From:	Richard Cochran <richardcochran@...il.com>
To:	Jouni Malinen <jkmalinen@...il.com>
Cc:	rcochran@...utronix.de, Thomas Gleixner <tglx@...utronix.de>,
	Rik van Riel <riel@...hat.com>, Len Brown <lenb@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	George Spelvin <linux@...encehorizons.net>,
	Josh Triplett <josh@...htriplett.org>,
	Chris Mason <clm@...com>, Eric Dumazet <edumazet@...gle.com>,
	rt@...utronix.de, Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>,
	Arjan van de Ven <arjan@...radead.org>, j <j@...fi>
Subject: Re: [PREEMPT-RT] [patch 4 14/22] timer: Switch to a non cascading
 wheel

Jouni,

If I understand the test correctly, then the slightly different kernel
timer behavior is ok, but the test isn't quite right.  Let me explain
what I mean.

First off, reading test_ap_wps.py, the point of the test is to see if
ten simultaneous connections are possible.  I guess the server
implements a hard-coded limit on the number of clients.  (BTW, where is
the server loop?)

You said that the server also sets 'backlog' to ten.  The backlog
controls the size of the queue holding incoming connections that are
in the SYN_RCVD or ESTABLISHED state but have not yet been
accept(2)-ed by the server.  This is *not* the same as the number of
possible simultaneous connections.
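
To illustrate the difference, here is a minimal sketch (not taken from
the test suite; the function name and the loopback setup are my own
assumptions).  It listens with a backlog of 1 and still ends up holding
five connections open at once, because each connection leaves the
queue as soon as it is accepted.

```python
import socket


def count_simultaneous(backlog=1, clients=5):
    """Open `clients` connections to a listener with a small backlog.

    Each connection is accepted before the next one is started, so the
    accept queue never holds more than one pending connection.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    srv.listen(backlog)
    addr = srv.getsockname()
    held = []                       # (client, server-side) socket pairs kept open
    try:
        for _ in range(clients):
            c = socket.create_connection(addr, timeout=5)
            s, _peer = srv.accept() # removes the connection from the queue
            held.append((c, s))
        return len(held)            # all of these are open at the same time
    finally:
        for c, s in held:
            c.close()
            s.close()
        srv.close()


if __name__ == "__main__":
    print(count_simultaneous(backlog=1, clients=5))
```

The backlog only matters for connections that arrive faster than the
server calls accept().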

On Sat, Aug 13, 2016 at 12:12:26PM +0300, Jouni Malinen wrote:
> Yes, it looks like a TCP connect() timeout. I use a significantly
> reduced timeout in the test scripts since they are run unattended and
> are supposed to terminate in reasonable amount of time.. That said,

I did not find where the client sets the one second timeout.  Where
does this happen?

> If I increase that 20 to 50, I get more of such about 1.03 second
> results at i=17, i=34, i=48..

Can you provide the timings when the test runs on the older kernel?

> Looking more at what exactly is happening at the TCP layer, this is
> likely related to the server behavior since listen() backlog is set to
> 10 and if there are 10 parallel connections, the last one if
> immediately closed before reading anything.

To clarify, when the backlog is exceeded, the new connection is not
closed.  Instead, the SYN is simply ignored, and the client is expected
to re-transmit the SYN in the normal TCP fashion.

> Looking at a sniffer capture (*), the three-way TCP connection goes
> through fine for the first 15 connect() calls, but the 15th one does
> not get a response to SYN. This SYN is the frame 47 in the capture
> file with srcport == 60802. There is no SYN,ACK for it. The about one
> second unexpected time for connect() comes from this, i.e., the
> connection is completed only after the client side does TCP
> retransmission of the SYN (frame #77) a second later and the server
> side replies with RST,ACK (frame #78).

This is the expected behavior.

> So it looks like the issue is in one of the SYN,ACK frames getting
> completely lost..

No, the frame is not missing.  It was never sent because the backlog
was exceeded.

Here is what I suspect is happening.  Sending 20 SYN frames to a port
with a backlog of 10 saturates the queue.  One SYN is ignored by the
kernel, and a race begins between the connect() timeout and the SYN
re-transmission.  If the client re-transmits the SYN and the server's
SYN,ACK returns before the connect timeout expires, then the call to
connect() succeeds.  With the new timer wheel, the result of the race
is different.

There are a couple of ways to deal with this.  One is to increase the
backlog on the server side.  Another is to increase the connect()
timeout to a multiple of the re-transmission interval.
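
For the second option, a rough sketch of how one might pick the
timeout.  The one second initial interval is taken from the capture
described above; the doubling of the interval on each retry and the
0.5 second margin are assumptions on my part, and the helper names are
made up for illustration.

```python
def syn_send_times(initial_rto=1.0, retries=3):
    """Offsets in seconds at which the first SYN and its retries go out."""
    times = [0.0]
    rto = initial_rto
    for _ in range(retries):
        times.append(times[-1] + rto)
        rto *= 2                    # assumed exponential backoff
    return times


def connect_timeout_for(dropped_syns, initial_rto=1.0, margin=0.5):
    """Timeout that outlasts `dropped_syns` ignored SYNs, plus a margin."""
    return syn_send_times(initial_rto, dropped_syns)[-1] + margin


if __name__ == "__main__":
    print(syn_send_times())         # send offsets for the SYN and three retries
    print(connect_timeout_for(1))   # timeout that tolerates one ignored SYN
```

With a timeout equal to the re-transmission interval itself, the
re-transmitted SYN and the timeout expire at nearly the same moment,
which is exactly the race described above.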

Thoughts?

Thanks,
Richard