Message-ID: <20200805163425.6c13ef11@hermes.lan>
Date:   Wed, 5 Aug 2020 16:34:25 -0700
From:   Stephen Hemminger <stephen@...workplumber.org>
To:     Rasmus Villemoes <rasmus.villemoes@...vas.dk>
Cc:     Network Development <netdev@...r.kernel.org>
Subject: Re: rtnl_trylock() versus SCHED_FIFO lockup

On Wed, 5 Aug 2020 16:25:23 +0200
Rasmus Villemoes <rasmus.villemoes@...vas.dk> wrote:

> Hi,
> 
> We're seeing occasional lockups on an embedded board (running an -rt
> kernel), which I believe I've tracked down to the
> 
>             if (!rtnl_trylock())
>                     return restart_syscall();
> 
> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
> loop, and the lower-priority task never gets runtime and thus cannot
> release the lock.
> 
> I've written a script that rather quickly reproduces this both on our
> target and my desktop machine (pinning everything on one CPU to emulate
> the uni-processor board), see below. Also, with this hacky patch

There is a reason for the trylock: it works around a priority inversion.

The real problem is expecting a SCHED_FIFO task to be safe with this
kind of network operation.
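
For readers following the thread, the failure mode in the quoted report can be sketched as a toy model. This is not kernel code. It assumes a single CPU with strict-priority (SCHED_FIFO-like) scheduling, and every name in it (`simulate`, `enum policy`, `struct sim_result`) is hypothetical. The blocking variant is there only for contrast and is not a proposed fix, since the trylock exists for a reason.

```c
#include <stdbool.h>

/* Toy model only: one CPU, strict-priority scheduling, two tasks.
 * "writer" is the high-priority task writing to the sysfs file;
 * "holder" is the low-priority task that already holds the lock. */

enum policy {
    TRYLOCK_RESTART, /* models: if (!trylock()) return restart_syscall(); */
    BLOCKING_LOCK    /* models: sleep until the lock is released */
};

struct sim_result {
    bool writer_done;    /* did the high-priority writer ever finish? */
    int  holder_ticks;   /* ticks of CPU time the lock holder received */
    int  writer_retries; /* failed trylock attempts by the writer */
};

/* holder_work: ticks the holder needs before it can release the lock.
 * max_ticks:   how long to run the simulation before giving up. */
struct sim_result simulate(enum policy policy, int holder_work, int max_ticks)
{
    struct sim_result r = { false, 0, 0 };
    bool lock_held = true;       /* holder owns the lock at the start */
    bool writer_blocked = false; /* writer is asleep waiting for the lock */

    for (int tick = 0; tick < max_ticks && !r.writer_done; tick++) {
        if (!writer_blocked) {
            /* A runnable high-priority task always gets the CPU. */
            if (!lock_held) {
                r.writer_done = true;  /* lock acquired, work completes */
            } else if (policy == TRYLOCK_RESTART) {
                r.writer_retries++;    /* retry at once, stay runnable */
            } else {
                writer_blocked = true; /* go to sleep on the lock */
            }
        } else {
            /* Writer is asleep, so the low-priority holder can run. */
            r.holder_ticks++;
            if (r.holder_ticks >= holder_work) {
                lock_held = false;      /* holder releases the lock */
                writer_blocked = false; /* and the writer is woken */
            }
        }
    }
    return r;
}
```

In this model, `simulate(TRYLOCK_RESTART, 3, 1000)` never gives the holder a single tick, so the writer spins for all 1000 ticks, whereas `simulate(BLOCKING_LOCK, 3, 1000)` lets the holder finish and the writer completes shortly afterwards.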
