netdev - rtnl_trylock() versus SCHED

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <b6eca125-351c-27c5-c34b-08c611ac2511@prevas.dk>
Date:   Wed, 5 Aug 2020 16:25:23 +0200
From:   Rasmus Villemoes <rasmus.villemoes@...vas.dk>
To:     Network Development <netdev@...r.kernel.org>
Subject: rtnl_trylock() versus SCHED_FIFO lockup

Hi,

We're seeing occasional lockups on an embedded board (running an -rt
kernel), which I believe I've tracked down to the

            if (!rtnl_trylock())
                    return restart_syscall();

in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
loop, and the lower-priority task never gets runtime and thus cannot
release the lock.

I've written a script that rather quickly reproduces this both on our
target and my desktop machine (pinning everything on one CPU to emulate
the uni-processor board), see below. Also, with this hacky patch

diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0318a69888d4..df8078c023d2 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,6 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
        char *endp;
        unsigned long val;
        int err;
+       static unsigned int restarts;

        if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
                return -EPERM;
@@ -44,8 +45,14 @@ static ssize_t store_bridge_parm(struct device *d,
        if (endp == buf)
                return -EINVAL;

-       if (!rtnl_trylock())
-               return restart_syscall();
+       if (!rtnl_trylock()) {
+               restarts++;
+               if (restarts < 100)
+                       return restart_syscall();
+               pr_err("too many restarts, doing unconditional
rtnl_lock()\n");
+               rtnl_lock();
+       }
+       restarts = 0;

        err = (*set)(br, val);
        if (!err)

priority inheritance kicks in and boosts the lower-prio thread so the
lockup doesn't happen. But I'm failing to come up with a proper solution.

Thoughts?

Thanks,
Rasmus

Reproducer:

#!/bin/bash

dev=br-test

flusher() {
    # $$ doesn't work as expected in subshells
    read -r pid _ < /proc/self/stat
    echo "flusher: PID $pid"
    chrt -f -p 20 $pid
    while true ; do
        echo 1 > /sys/class/net/${dev}/bridge/flush
        sleep .15
    done
    exit 0
}

worker() {
    read -r pid _ < /proc/self/stat
    echo "worker: PID $pid"
    chrt -f -p 10 $pid
    while true ; do
        read -n 1 -u 12
        ip addr add 200.201.202.203/24 dev ${dev}
        ip addr del 200.201.202.203/24 dev ${dev}
        echo -n . >&21
    done
    exit 0
}

taskset -p 1 $$
chrt -f -p 30 $$

tmpdir=$(mktemp -d)
mkfifo ${tmpdir}/a
mkfifo ${tmpdir}/b

exec 12<> ${tmpdir}/a
exec 21<> ${tmpdir}/b

ip link add name $dev type bridge

( flusher ) &
flusher_pid=$!
( worker ) &
worker_pid=$!

sleep .1
printf '\n'

count=0
while ! [ -e /tmp/stop ] && [ $count -lt 1000 ]; do
    echo -n . >&12
    read -n 1 -u 21 -t 10
    if [ $? -gt 0 ] ; then
        printf '\nlockup?!\n'
        sleep 20
        break
    fi
    count=$((count+1))
    printf '\r%4d' $count
    sleep .02
done

kill $flusher_pid
kill $worker_pid

wait

rm -rf $tmpdir
ip link del $dev type bridge