Date:   Thu, 28 Oct 2021 07:59:01 -0700
From:   Jörn Engel <joern@...estorage.com>
To:     Ariel Elior <aelior@...vell.com>
Cc:     Caleb Sander <csander@...estorage.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "GR-everest-linux-l2@...vell.com" <GR-everest-linux-l2@...vell.com>
Subject: Re: [EXT] Re: [PATCH] qed: avoid spin loops in
 _qed_mcp_cmd_and_union()

On Thu, Oct 28, 2021 at 05:47:10AM +0000, Ariel Elior wrote:
>
> Indeed this function sends messages to the management FW, and may
> be invoked both from atomic contexts and from non atomic ones.
> CAN_SLEEP indicates whether it is permissible to sleep in the context
> from which it was invoked.

That is a rather unfortunate pattern.  I understand the desire for code
reuse, but the result is often udelay() loops that can take seconds.
With unresponsive firmware you always hit the timeouts and incur the
maximum latency.
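To make the cost concrete, here is a minimal userspace sketch of a udelay-style polling loop. Several things in it are assumptions for illustration only:

- `fw_responded()` is a hypothetical stand-in for the firmware mailbox check, hard-wired to model an unresponsive device.
- `spin_delay_us()` only approximates the kernel's udelay() by busy-waiting on CLOCK_MONOTONIC.
- The retry count and delay are made-up parameters, not values taken from the qed driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Busy-wait for 'us' microseconds; a rough userspace analogue of udelay(). */
static void spin_delay_us(uint64_t us)
{
    uint64_t end = now_us() + us;
    while (now_us() < end)
        ; /* burn CPU: nothing else gets this CPU while we loop */
}

/* Hypothetical firmware check; always "unresponsive" for this sketch. */
static bool fw_responded(void)
{
    return false;
}

/* Poll up to max_retries times, spinning delay_us between checks.
 * Sets *ok if the firmware answered; returns elapsed wall time in us. */
static uint64_t poll_fw_spinning(unsigned max_retries, uint64_t delay_us,
                                 bool *ok)
{
    uint64_t start = now_us();

    *ok = false;
    for (unsigned i = 0; i < max_retries; i++) {
        if (fw_responded()) {
            *ok = true;
            break;
        }
        spin_delay_us(delay_us);
    }
    return now_us() - start;
}
```

Calling `poll_fw_spinning(100, 1000, &ok)` always runs to the full timeout of at least 100 * 1000 us = 100 ms, because the check never succeeds. That is the worst case described above, and it becomes the typical case when the firmware is unresponsive.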

The scheduler is blocked on the local CPU for the duration of the spin
loop.  It won't even bother migrating high-priority threads away,
because the assumption is that the current thread will not loop for
long.  The result can be pretty bad for latency-sensitive code.  In
effect, you cannot guarantee any latency below the timeout of those
loops.

Having a flag or some other means to switch between sleeping and
spinning would help to reduce the odds.  Avoiding calls from atomic
contexts would help even more.  Ideally I would like to remove all
such calls.  The only legitimate exceptions should be those that deal
with high-volume packet RX/TX, and those should never involve
long-running loops.  Anything else can be handled from a kworker or
similar.  If a 1s loop is acceptable, waiting a few ms for the
scheduler must also be acceptable.
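Below is a minimal userspace sketch of the "flag to switch between sleeping and spinning" idea, under the following assumptions:

- `wait_us()`, `poll_fw()` and `fw_responded()` are hypothetical names; `fw_responded()` again models unresponsive firmware.
- nanosleep() stands in for a sleeping primitive such as the kernel's msleep() or usleep_range().
- A CLOCK_MONOTONIC busy-wait stands in for udelay().

The sketch measures the CPU time the process consumes while polling. Sleeping lets the CPU run other work, while spinning keeps it busy for the whole wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock in microseconds. */
static uint64_t clock_us(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Wait 'us' microseconds: sleep if the caller allows it, else busy-wait. */
static void wait_us(uint64_t us, bool can_sleep)
{
    if (can_sleep) {
        struct timespec req = {
            .tv_sec = (time_t)(us / 1000000u),
            .tv_nsec = (long)(us % 1000000u) * 1000L,
        };
        /* Resume the remaining sleep if a signal interrupts it. */
        while (nanosleep(&req, &req) == -1 && errno == EINTR)
            ;
    } else {
        uint64_t end = clock_us(CLOCK_MONOTONIC) + us;
        while (clock_us(CLOCK_MONOTONIC) < end)
            ; /* spin: the CPU stays busy */
    }
}

/* Hypothetical firmware check; always "unresponsive" for this sketch. */
static bool fw_responded(void)
{
    return false;
}

/* Poll the firmware; returns the CPU time (us) this process used. */
static uint64_t poll_fw(unsigned max_retries, uint64_t delay_us,
                        bool can_sleep)
{
    uint64_t cpu_start = clock_us(CLOCK_PROCESS_CPUTIME_ID);

    for (unsigned i = 0; i < max_retries; i++) {
        if (fw_responded())
            break;
        wait_us(delay_us, can_sleep);
    }
    return clock_us(CLOCK_PROCESS_CPUTIME_ID) - cpu_start;
}
```

With `poll_fw(50, 1000, false)` the process burns roughly the full 50 ms of CPU time. With `poll_fw(50, 1000, true)` it uses only a small fraction of that, even though both calls take about the same wall time.

Moving the call out of atomic context entirely, for example into a kworker, is what makes the sleeping branch usable in the first place.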

Jörn

--
If a problem has a hardware solution, and a software solution,
do it in software.
-- Arnd Bergmann
