lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 23 Oct 2017 21:11:38 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Steve Ibanez <sibanez@...nford.edu>
Cc:     Netdev <netdev@...r.kernel.org>, Florian Westphal <fw@...len.de>,
        Mohammad Alizadeh <alizadeh@...il.mit.edu>
Subject: Re: Linux ECN Handling

On Mon, Oct 23, 2017 at 6:15 PM, Steve Ibanez <sibanez@...nford.edu> wrote:
> Hi All,
>
> I upgraded the kernel on all of our machines to Linux
> 4.13.8-041308-lowlatency. However, I'm still observing the same
> behavior where the source enters a timeout when the CWND=1MSS and it
> receives ECN marks.
>
> Here are the measured flow rates:
> <https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing>
>
> Here are snapshots of the packet traces at the sources when they both
> enter a timeout at t=1.6sec:
>
> 10.0.0.1 timeout event:
> <https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing>
>
> 10.0.0.3 timeout event:
> <https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing>
>
> Both still essentially follow the same sequence of events that I
> mentioned earlier:
> (1) receives an ACK for byte XYZ with the ECN flag set
> (2) stops sending for RTO_min=300ms
> (3) sends a retransmission for byte XYZ
>
> The cwnd samples reported by tcp_probe still indicate that the sources
> are reacting to the ECN marks more than once per window. Here are the
> cwnd samples at the same timeout event mentioned above:
> <https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing>
>
> Let me know if there is anything else you think I should try.

Sounds like perhaps cwnd is being set to 0 somewhere in this DCTCP
scenario. Would you be able to add printk statements in
tcp_init_cwnd_reduction(), tcp_cwnd_reduction(), and
tcp_end_cwnd_reduction(), printing the IP:port, tp->snd_cwnd, and
tp->snd_ssthresh?

Based on the output you may be able to figure out where cwnd is being
set to zero. If not, could you please post the printk output and
tcpdump traces (.pcap, headers-only is fine) from your tests?

thanks,
neal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ