Message-ID: <781BA871-5D3D-4C89-9629-81345CC41C5C@amazon.com>
Date: Mon, 7 Dec 2020 16:09:44 +0000
From: "Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
"ycheng@...gle.com" <ycheng@...gle.com>,
"ncardwell@...gle.com" <ncardwell@...gle.com>,
"weiwan@...gle.com" <weiwan@...gle.com>,
"Strohman, Andy" <astroh@...zon.com>,
"Herrenschmidt, Benjamin" <benh@...zon.com>
Subject: Re: [PATCH net-next] tcp: optimise receiver buffer autotuning initialisation
for high latency connections
>Since I can not reproduce this problem with another NIC on x86, I
>really wonder if this is not an issue with ENA driver on PowerPC
>perhaps ?
I am able to reproduce it on x86-based EC2 instances using the ENA, Xen netfront, or Intel ixgbevf driver on the receiver, so it is not specific to ENA. We were also able to reproduce it easily between two VMs running in VirtualBox on the same physical host, given the environment requirements I mentioned in my first e-mail.
What's the RTT between the sender & receiver in your reproduction? Are you using bbr on the sender side?
Thank you.
Hazem
On 07/12/2020, 15:26, "Eric Dumazet" <edumazet@...gle.com> wrote:
On Sat, Dec 5, 2020 at 1:03 PM Mohamed Abuelfotoh, Hazem
<abuehaze@...zon.com> wrote:
>
> Unfortunately few things are missing in this report.
>
> What is the RTT between hosts in your test ?
> >>>>>RTT in my test is 162 msec, but I am able to reproduce it with lower RTTs. For example, I could see the issue when downloading from a Google endpoint with an RTT of 16.7 msec. As mentioned in my previous e-mail, the issue is reproducible whenever the RTT exceeds 12 msec, given that the sender is using bbr.
>
> RTT between hosts where I run the iperf test.
> # ping 54.199.163.187
> PING 54.199.163.187 (54.199.163.187) 56(84) bytes of data.
> 64 bytes from 54.199.163.187: icmp_seq=1 ttl=33 time=162 ms
> 64 bytes from 54.199.163.187: icmp_seq=2 ttl=33 time=162 ms
> 64 bytes from 54.199.163.187: icmp_seq=3 ttl=33 time=162 ms
> 64 bytes from 54.199.163.187: icmp_seq=4 ttl=33 time=162 ms
>
> RTT between my EC2 instances and google endpoint.
> # ping 172.217.4.240
> PING 172.217.4.240 (172.217.4.240) 56(84) bytes of data.
> 64 bytes from 172.217.4.240: icmp_seq=1 ttl=101 time=16.7 ms
> 64 bytes from 172.217.4.240: icmp_seq=2 ttl=101 time=16.7 ms
> 64 bytes from 172.217.4.240: icmp_seq=3 ttl=101 time=16.7 ms
> 64 bytes from 172.217.4.240: icmp_seq=4 ttl=101 time=16.7 ms
>
> What driver is used at the receiving side ?
> >>>>>>I am using ENA driver version 2.2.10g on the receiver with scatter-gather enabled.
>
> # ethtool -k eth0 | grep scatter-gather
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
This ethtool output refers to TX scatter gather, which is not relevant
for this bug.
I see ENA driver might use 16 KB per incoming packet (if ENA_PAGE_SIZE is 16 KB)
Since I can not reproduce this problem with another NIC on x86, I
really wonder if this is not an issue with ENA driver on PowerPC
perhaps ?
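To illustrate why a 16 KB per-packet buffer could matter at the RTTs quoted above, here is a rough sketch. It assumes a deliberately simplified model in which each queued packet is charged its full driver buffer against the socket receive buffer, and throughput is capped at one buffer's worth of payload per RTT. The 131072-byte receive buffer, the 1448-byte payload, and the 2 KB comparison case are assumptions chosen for illustration, not values from this thread; real kernel accounting and autotuning are more involved.

```python
# Simplified model: a packet is charged more memory than the payload it carries,
# so a large per-packet buffer leaves room for fewer packets in the receive buffer.

def packets_that_fit(rcvbuf_bytes: int, charged_per_packet: int) -> int:
    """Number of packets that can be queued before the receive buffer is full."""
    return rcvbuf_bytes // charged_per_packet


def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bytes/s when at most one window is delivered per RTT."""
    return window_bytes / rtt_seconds


# Assumed values for illustration only (not taken from the thread).
RCVBUF = 131072            # assumed initial receive buffer, in bytes
PAYLOAD = 1448             # assumed TCP payload per packet, in bytes
SMALL_BUFFER = 2048        # assumed per-packet charge for a 2 KB driver buffer
LARGE_BUFFER = 16 * 1024   # the 16 KB per-packet case raised in the thread

for label, charged in (("2 KB buffers", SMALL_BUFFER), ("16 KB buffers", LARGE_BUFFER)):
    count = packets_that_fit(RCVBUF, charged)
    queued_payload = count * PAYLOAD
    # RTTs are the two values measured with ping earlier in the thread.
    for rtt_ms in (16.7, 162):
        rate = window_limited_throughput(queued_payload, rtt_ms / 1000)
        print(f"{label}: {count} packets, {queued_payload} payload bytes, "
              f"RTT {rtt_ms} ms -> at most {rate / 1e6:.2f} MB/s")
```

Under these assumptions the same receive buffer holds eight times fewer packets with 16 KB buffers than with 2 KB buffers, and the per-RTT ceiling falls by the same factor. That is one way a driver's buffer size could make a slow-growing receive window more visible on long paths.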