netdev - Re: packet stuck in qdisc


Message-ID: <969086798.7658413.1648197914959.JavaMail.zimbra@kalray.eu>
Date:   Fri, 25 Mar 2022 09:45:14 +0100 (CET)
From:   Vincent Ray <vray@...rayinc.com>
To:     linyunsheng <linyunsheng@...wei.com>
Cc:     vladimir oltean <vladimir.oltean@....com>, kuba <kuba@...nel.org>,
        davem <davem@...emloft.net>, Samuel Jones <sjones@...rayinc.com>,
        netdev <netdev@...r.kernel.org>,
        方国炬 <guoju.fgj@...baba-inc.com>
Subject: Re: packet stuck in qdisc

OK I'll try that, thank you LinYun.

(I'm sorry for the delay in my answers, I haven't been able to try your debug patch yet because I've had other problems with my setup, preventing me from reproducing the issue in the first place, but it should be ok soon)

----- Original Message -----
From: "linyunsheng" <linyunsheng@...wei.com>
To: "Vincent Ray" <vray@...rayinc.com>, "vladimir oltean" <vladimir.oltean@....com>, "kuba" <kuba@...nel.org>, "davem" <davem@...emloft.net>
Cc: "Samuel Jones" <sjones@...rayinc.com>, "netdev" <netdev@...r.kernel.org>, "方国炬" <guoju.fgj@...baba-inc.com>
Sent: Friday, March 25, 2022 7:16:02 AM
Subject: Re: packet stuck in qdisc

On 2022/1/28 10:36, Yunsheng Lin wrote:
> On 2022/1/25 20:55, Vincent Ray wrote:
>> Dear kernel maintainers / developers,
>>
>> I work at Kalray where we are developing an NVMe-over-TCP target controller board.
>> My setup is as follows:
>> - a development workstation running Linux 5.x.y (the host)
>> - sending NVME-TCP traffic to our board, to which it is connected through a Mellanox NIC (Connect-X-5) and a 100G ETH cable
>>
>> While doing performance tests, using simple fio scenarios running over the regular kernel nvme-tcp driver on the host, we noticed significant performance variations.
>> After some digging (using tcpdump on the host), we found that there were big "holes" in the tcp traffic sent by the host.
>> The scenario we observed is the following :
>> 1) a TCP segment gets lost (not sent by the host) on a particular TCP connection, leading to a gap in the seq numbers received by the board
>> 2) the board sends dup-acks and/or sacks (if configured) to signal this loss
>> 3) then, sometimes, the host stops emitting on that TCP connection for several seconds (as much as 14s observed)
>> 4) finally the host resumes emission, sending the missing packet
>> 5) then the TCP connection continues correctly with the appropriate throughput
>>
>> Such a scenario can be observed in the attached tcpdump (+ comments).
> 
> Hi,
>     Thanks for reporting the problem.

Hi,
   It seems Guoju from Alibaba has hit a problem similar to the one above.
   They fixed it by adding an smp_mb() barrier between spin_unlock()
and test_bit() in qdisc_run_end(); please check whether it fixes your problem too.
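
For readers following the thread, the store-load ordering behind that suggestion can be sketched in user-space C11 atomics. This is only an illustration under stated assumptions, not the kernel code or the actual patch: `struct fake_qdisc`, `fake_run_begin()` and `fake_run_end()` are hypothetical names, `locked` stands in for the qdisc seqlock, `missed` stands in for the MISSED state bit, and the sequentially consistent fence plays the role of smp_mb(). The clearing of the missed flag on dequeue is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the lockless qdisc state. */
struct fake_qdisc {
    atomic_bool locked;  /* models the qdisc seqlock */
    atomic_bool missed;  /* models the MISSED state bit */
};

/*
 * Sender side (simplified): try to become the running CPU. If the lock
 * is already held, record that a packet may have been missed, then retry
 * once in case the holder released the lock in the meantime.
 */
static bool fake_run_begin(struct fake_qdisc *q)
{
    bool expected = false;

    if (atomic_compare_exchange_strong(&q->locked, &expected, true))
        return true;

    atomic_store(&q->missed, true);  /* seq_cst store, acts as a full barrier */

    expected = false;
    return atomic_compare_exchange_strong(&q->locked, &expected, true);
}

/*
 * Running side: release the lock, then check the missed flag.
 * Returns true when the caller should reschedule the queue.
 */
static bool fake_run_end(struct fake_qdisc *q)
{
    assert(atomic_load(&q->locked));  /* caller must hold the lock */

    /* Release-only store, like spin_unlock(): it orders earlier accesses
     * before the unlock, but not the unlock before later loads. */
    atomic_store_explicit(&q->locked, false, memory_order_release);

    /* Full fence, playing the role of smp_mb(): without it the load
     * below could be satisfied before the unlock becomes visible, so a
     * concurrent sender could fail its retry while this side still sees
     * missed == false, leaving a packet queued with nobody to send it. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&q->missed, memory_order_relaxed);
}
```

With the fence in place, at least one of two things holds for a sender racing with the unlock: either its retry takes the lock, or the running side observes the missed flag and reschedules.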

> 
>>


