Date:   Fri, 25 Mar 2022 09:45:14 +0100 (CET)
From:   Vincent Ray <vray@...rayinc.com>
To:     linyunsheng <linyunsheng@...wei.com>
Cc:     vladimir oltean <vladimir.oltean@....com>, kuba <kuba@...nel.org>,
        davem <davem@...emloft.net>, Samuel Jones <sjones@...rayinc.com>,
        netdev <netdev@...r.kernel.org>,
        方国炬 <guoju.fgj@...baba-inc.com>
Subject: Re: packet stuck in qdisc

OK I'll try that, thank you LinYun.

(Sorry for the delay in my answers. I haven't been able to try your debug patch yet because other problems with my setup have prevented me from reproducing the issue in the first place, but it should be OK soon.)

----- Original Message -----
From: "linyunsheng" <linyunsheng@...wei.com>
To: "Vincent Ray" <vray@...rayinc.com>, "vladimir oltean" <vladimir.oltean@....com>, "kuba" <kuba@...nel.org>, "davem" <davem@...emloft.net>
Cc: "Samuel Jones" <sjones@...rayinc.com>, "netdev" <netdev@...r.kernel.org>, "方国炬" <guoju.fgj@...baba-inc.com>
Sent: Friday, March 25, 2022 7:16:02 AM
Subject: Re: packet stuck in qdisc

On 2022/1/28 10:36, Yunsheng Lin wrote:
> On 2022/1/25 20:55, Vincent Ray wrote:
>> Dear kernel maintainers / developers,
>>
>> I work at Kalray, where we are developing an NVMe-over-TCP target controller board.
>> My setup is as follows:
>> - a development workstation running Linux 5.x.y (the host)
>> - sending NVMe-TCP traffic to our board, to which it is connected through a Mellanox NIC (ConnectX-5) and a 100G Ethernet cable
>>
>> While doing performance tests, using simple fio scenarios running over the regular kernel nvme-tcp driver on the host, we noticed significant performance variations.
>> After some digging (using tcpdump on the host), we found that there were big "holes" in the tcp traffic sent by the host.
>> The scenario we observed is the following :
>> 1) a TCP segment gets lost (not sent by the host) on a particular TCP connection, leading to a gap in the seq numbers received by the board
>> 2) the board sends dup-acks and/or sacks (if configured) to signal this loss
>> 3) then, sometimes, the host stops emitting on that TCP connection for several seconds (as much as 14s observed)
>> 4) finally the host resumes emission, sending the missing packet
>> 5) then the TCP connection continues correctly with the appropriate throughput
>>
>> Such a scenario can be observed in the attached tcpdump (+ comments).
> 
> Hi,
>     Thanks for reporting the problem.

Hi,
   It seems guoju from Alibaba had a problem similar to the one above,
and they fixed it by adding an smp_mb() barrier between spin_unlock()
and test_bit() in qdisc_run_end(). Please see if it fixes your problem.
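
To illustrate why a full barrier between the unlock and the bit test can matter, here is a minimal userspace sketch in C11 atomics. It is not the kernel code and not the actual patch: struct sketch_qdisc, sketch_run_begin() and sketch_run_end() are hypothetical names, the lock and the "missed" bit are modeled as two atomic booleans, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). The assumption is that one thread releases the lock and then checks whether anyone missed it, while another thread sets the "missed" flag and then retries the lock. That is a store-then-load pattern on each side, which a release store alone does not order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a lockless qdisc; not kernel code. */
struct sketch_qdisc {
    atomic_bool locked;  /* models the per-qdisc lock */
    atomic_bool missed;  /* models the "missed" state bit */
};

/*
 * Try to become the thread that runs the queue. On failure, record that
 * work was missed, then retry once in case the holder has just released.
 */
static inline bool sketch_run_begin(struct sketch_qdisc *q)
{
    bool expected = false;

    if (atomic_compare_exchange_strong(&q->locked, &expected, true))
        return true;

    /* Sequentially consistent store, followed by a seq_cst retry. */
    atomic_store(&q->missed, true);

    expected = false;
    return atomic_compare_exchange_strong(&q->locked, &expected, true);
}

/*
 * Release the lock and report whether the caller must reschedule the
 * queue because another thread failed to take the lock in the meantime.
 */
static inline bool sketch_run_end(struct sketch_qdisc *q)
{
    /* Release store: orders earlier accesses, not the load below. */
    atomic_store_explicit(&q->locked, false, memory_order_release);

    /*
     * Full fence, standing in for smp_mb(): keeps the load of "missed"
     * from being satisfied before the unlock store is visible.
     */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&q->missed, memory_order_relaxed);
}
```

Without the fence in sketch_run_end(), both sides could in principle observe stale values: the releasing thread sees "missed" still clear, and the other thread sees the lock still held, so nobody reschedules and the queued packet waits for unrelated traffic to kick the queue. The sketch leaves out clearing the flag and the actual dequeue work.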

> 
>>


