netdev - Re: rtnl_trylock() versus SCHED

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <191e0da8-178f-5f91-3d37-9b7cefb61352@prevas.dk>
Date:   Thu, 6 Aug 2020 11:17:30 +0200
From:   Rasmus Villemoes <rasmus.villemoes@...vas.dk>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     Network Development <netdev@...r.kernel.org>
Subject: Re: rtnl_trylock() versus SCHED_FIFO lockup

On 06/08/2020 01.34, Stephen Hemminger wrote:
> On Wed, 5 Aug 2020 16:25:23 +0200
> Rasmus Villemoes <rasmus.villemoes@...vas.dk> wrote:
> 
>> Hi,
>>
>> We're seeing occasional lockups on an embedded board (running an -rt
>> kernel), which I believe I've tracked down to the
>>
>>             if (!rtnl_trylock())
>>                     return restart_syscall();
>>
>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
>> loop, and the lower-priority task never gets runtime and thus cannot
>> release the lock.
>>
>> I've written a script that rather quickly reproduces this both on our
>> target and my desktop machine (pinning everything on one CPU to emulate
>> the uni-processor board), see below. Also, with this hacky patch
> 
> There is a reason for the trylock, it works around a priority inversion.

Can you elaborate? It seems to me that it _causes_ a priority inversion
since priority inheritance doesn't have a chance to kick in.

> The real problem is expecting a SCHED_FIFO task to be safe with this
> kind of network operation.

Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit
odd to do what is essentially a busy-loop - yes, the restart_syscall()
allows signals to be delivered (including allowing the process to get
killed), but in the absence of any signals, the pattern essentially
boils down to

  while (!rtnl_trylock())
    ;

So even for regular tasks, this seems to needlessly hog the cpu.

I tried this

diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0318a69888d4..e40e264f9b16 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d,
        if (endp == buf)
                return -EINVAL;

-       if (!rtnl_trylock())
-               return restart_syscall();
+       if (rtnl_lock_interruptible())
+               return -ERESTARTNOINTR;

        err = (*set)(br, val);
        if (!err)

with the obvious definition of rtnl_lock_interruptible(), and it makes
the problem go away. Isn't it better to sleep waiting for the lock (and
with -rt, giving proper priority boost) or a signal to arrive rather
than busy-looping back and forth between syscall entry point and the
trylock()?

I see quite a lot of

    if (mutex_lock_interruptible(...))
            return -ERESTARTSYS;

but for the rtnl_mutex, I see the trylock...restart_syscall pattern
being used in a couple of places. So there must be something special
about the rtnl_mutex?

Thanks,
Rasmus