linux-kernel - RT_GROUP_SCHED throttling blocks unthrottled RT tasks?

Message-ID: <CAD=FV=UKyJLhDEKxhqrit16kvLfi+g0DyYKL6bLJ35fO7NCTsg@mail.gmail.com>
Date:   Fri, 5 Nov 2021 10:44:22 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>
Cc:     Joel Fernandes <joelaf@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?

Hi,

I'm seeing a strange behavior that I _think_ is a bug. I'm hoping that
some of the scheduling experts can tell me if I'm just
misunderstanding or if this is truly a bug. To see it, I do this:

--

# Allow 1000 us more of RT at system and top cgroup
old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
echo $((old_rt + 1000)) > /proc/sys/kernel/sched_rt_runtime_us
old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
echo $((old_rt + 1000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us

# Give the 1000 us to my own group
mkdir /sys/fs/cgroup/cpu/doug
echo 1000 > /sys/fs/cgroup/cpu/doug/cpu.rt_runtime_us

# Fork off a bunch of spinny things
for i in $(seq 13); do
  python -c "while True: pass"&
done

# Make my spinny things RT and put in my group
# (assumes no other python is running!)
for pid in $(ps aux | grep python | grep -v grep | awk '{print $2}'); do
  echo $pid >> /sys/fs/cgroup/cpu/doug/tasks
  chrt -p -f 99 $pid
done

--

As expected, the spinny python tasks are pretty much throttled down to
0 in the above (they get 1 ms out of 1 second).
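
For reference, the "1 ms out of 1 second" figure is just the group's
runtime divided by its period. A minimal sketch of that arithmetic,
assuming the group's cpu.rt_period_us is at its default of 1000000 us
(the commands above never change it); rt_share_ppm is a hypothetical
helper name, not an existing tool:

```shell
# Hypothetical helper: print an RT budget as parts per million of each period.
# $1 = runtime in us, $2 = period in us (integer math only).
rt_share_ppm() {
  echo $(( $1 * 1000000 / $2 ))
}

# Budget from the reproduction: 1000 us of runtime per (assumed) 1000000 us period.
rt_share_ppm 1000 1000000

# If the cgroup from the reproduction exists, compute the same from its files.
cg=/sys/fs/cgroup/cpu/doug
if [ -r "$cg/cpu.rt_runtime_us" ] && [ -r "$cg/cpu.rt_period_us" ]; then
  rt_share_ppm "$(cat "$cg/cpu.rt_runtime_us")" "$(cat "$cg/cpu.rt_period_us")"
fi
```

A result of 1000 ppm corresponds to 0.1% of each period, which matches
the spinny tasks being throttled to nearly nothing.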

However, _the bug_ is that the above basically causes all _other_ RT
things in my system to stop functioning. I'm on an ARM Chromebook
(sc7180-trogdor) and we communicate to our EC on a "realtime" thread
due to SPI timing requirements. The above commands appear to starve
the EC's communication task and (as far as I can tell) other RT tasks
in the system.

Notably:

a) My EC comms slow to a crawl (eventually one gets through).

b) "top" shows stuff like this:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  179 root     -51   0       0      0      0 R 100.0   0.0   0:31.79 cros_ec_spi_hig
  180 root     -51   0       0      0      0 R  95.2   0.0   0:50.19 irq/169-chromeo
  184 root     -51   0       0      0      0 R  95.2   0.0   0:13.24 spi10
  221 root      -2   0       0      0      0 R  95.2   0.0   0:50.57 ring0

c) If I give my spinny tasks just a little bit more time than 1 ms
then I get a hung task.

When I'm testing the above, the non-RT stuff in the system continues
to work OK though. I can even go in and kill all my python tasks and
the system returns to normal.

I tried gathering some tracing. One bit that might (?) be relevant:

 cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking: comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
 cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup: comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
          <idle>-0       [001] d.H5  1495.309113: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H6  1495.309119: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H5  1495.312137: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H6  1495.312142: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006

My best guess is that there's some bug in the scheduler where it just
loops constantly picking an unthrottled RT task but then incorrectly
decides that it's throttled and thus doesn't run it.
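
One way to check that guess would be to look at which RT runqueues the
scheduler itself reports as throttled. The sketch below assumes a
kernel built with CONFIG_SCHED_DEBUG, that the debug file is at
/proc/sched_debug (older kernels such as 5.4) or
/sys/kernel/debug/sched/debug (newer kernels such as 5.15), and that
each rt_rq section contains a ".rt_throttled" line. throttled_rt_rqs
is a hypothetical helper name:

```shell
# Hypothetical helper: read scheduler debug text on stdin and print the
# header of every rt_rq section whose .rt_throttled value is non-zero.
throttled_rt_rqs() {
  awk '
    /^rt_rq\[/ { rq = $0 }                               # remember the current section header
    $1 == ".rt_throttled" && $NF != 0 { print rq }       # report sections marked throttled
  '
}

# Use whichever debug file is readable on this kernel (paths are assumptions).
for f in /sys/kernel/debug/sched/debug /proc/sched_debug; do
  if [ -r "$f" ]; then
    throttled_rt_rqs < "$f"
    break
  fi
done
```

If only the rt_rq entries for the spinny group show up as throttled
while the other RT tasks still make no progress, that would support
the idea that unthrottled tasks are being picked and then not run.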

Most of my testing has been on the chromeos-5.4 kernel, but just in
case I tried a vanilla v5.15 kernel and I could reproduce the same
problems.

Anyway, if I'm just doing something stupid then I apologize for the
noise. If the above should work and you need me to gather more logging
/ try any experiments, I'm happy to do so.

Thanks!

-Doug