Date:   Mon, 13 Apr 2020 08:03:57 +0300
From:   Leon Romanovsky <leon@...nel.org>
To:     David Miller <davem@...emloft.net>
Cc:     kuba@...nel.org, arjan@...ux.intel.com, xiyou.wangcong@...il.com,
        jhs@...atatu.com, jiri@...nulli.us, netdev@...r.kernel.org
Subject: Re: [PATCH net v1] net/sched: Don't print dump stack in event of
 transmission timeout

On Sun, Apr 12, 2020 at 09:19:25PM -0700, David Miller wrote:
>
> This is caused by a device "overwhelmed with traffic"?  Sounds like
> normal operation to me.
>
> That's a bug, and the driver handling the device with this problem
> should adjust how it implements TX timeouts to accommodate this.

From the internal bug description, I hope that it makes sense.

-----
A timeout may occur if the amount of reported bytes is higher than the queue limit.
In this case, the kernel closes the queue and will reopen it only after getting a
completion.

While debugging, we saw that in some situations the driver gets a **delayed completion**:
completions arrive after **1 min**, and therefore the amount of queued bytes exceeds the
DQL max size.

As a result, once watchdog_timeo has elapsed the kernel calls the driver's timeout
function, which prints a timeout message to dmesg.

After debugging the issue with FW to find the root cause of the delayed completions,
we understood that since the IB and the TCP traffic are running at the same service level (SL),
the same scheduling queue schedules between all the QPs. In this case, if one of the IB QPs gets
stuck because of congestion, all other QPs (including the TCP QPs) will be stuck until the
stuck QP is released.
-----
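
To make the sequence above easier to follow, here is a rough toy model of it in
Python. It is only a sketch of my reading of the description, not the kernel or
driver code: the TxQueue class, the timed_out() helper, the 3000-byte limit, the
5 second watchdog and the 1500-byte packet size are all made-up values for
illustration.

```python
from dataclasses import dataclass


@dataclass
class TxQueue:
    """Toy model of a TX queue with a byte limit (hypothetical, not kernel code)."""
    limit_bytes: int        # stand-in for the DQL limit
    watchdog_timeo: float   # seconds, stand-in for watchdog_timeo
    inflight_bytes: int = 0
    stopped: bool = False
    trans_start: float = 0.0

    def send(self, nbytes: int, now: float) -> bool:
        # A stopped queue accepts nothing until a completion reopens it.
        if self.stopped:
            return False
        self.inflight_bytes += nbytes
        self.trans_start = now
        # Close the queue once the queued bytes exceed the limit.
        if self.inflight_bytes > self.limit_bytes:
            self.stopped = True
        return True

    def complete(self, nbytes: int) -> None:
        # A completion returns bytes and may reopen the queue.
        self.inflight_bytes -= nbytes
        if self.inflight_bytes <= self.limit_bytes:
            self.stopped = False

    def watchdog_fires(self, now: float) -> bool:
        # Timeout: the queue is stopped and nothing was sent for too long.
        return self.stopped and (now - self.trans_start) > self.watchdog_timeo


def timed_out(completion_delay: float, limit_bytes: int = 3000,
              watchdog_timeo: float = 5.0, pkt: int = 1500) -> bool:
    """Return True if the watchdog fires before the completion arrives."""
    q = TxQueue(limit_bytes, watchdog_timeo)
    sent = 0
    while q.send(pkt, now=0.0):   # fill the queue at t=0 until it closes
        sent += 1
    t = 0.0
    while t < completion_delay:   # poll the watchdog once per second
        if q.watchdog_fires(t):
            return True
        t += 1.0
    q.complete(sent * pkt)        # the completion finally arrives
    return False
```

In this model timed_out(60.0) reports a timeout, while a prompt completion such
as timed_out(0.01) does not.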

The user separates the traffic into different SLs.
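
Why separating SLs matters can be shown with a very small model, assuming (as
the bug description says) that one scheduling queue serves every QP of a given
SL, so a single stuck QP stalls the whole SL. The blocked_qps() helper and the
QP names are hypothetical and only illustrate the idea; this is not how the
firmware is implemented.

```python
def blocked_qps(sl_of_qp: dict, stuck: set) -> set:
    """Return every QP that stalls when the QPs in `stuck` are stuck.

    sl_of_qp maps a QP name to its service level (SL). Assumption: a stuck
    QP blocks all QPs that share its SL.
    """
    stalled_sls = {sl_of_qp[qp] for qp in stuck}
    return {qp for qp, sl in sl_of_qp.items() if sl in stalled_sls}


# IB and TCP on the same SL: the congested IB QP takes the TCP QP down with it.
shared = {"ib_qp": 0, "tcp_qp": 0}

# IB and TCP on different SLs: only the IB QP is affected.
separated = {"ib_qp": 0, "tcp_qp": 1}
```

With these inputs, blocked_qps(shared, {"ib_qp"}) contains both QPs, while
blocked_qps(separated, {"ib_qp"}) contains only ib_qp.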

Thanks
