linux-kernel - Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAD=FV=X35OW+YRvspk=M-W+qcLObCxzGc7dJsvAZSXSyCZsv1Q@mail.gmail.com>
Date:   Mon, 15 Nov 2021 17:13:02 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>
Cc:     Joel Fernandes <joelaf@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?

Hi,

On Fri, Nov 5, 2021 at 10:44 AM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> I'm seeing a strange behavior that I _think_ is a bug. I'm hoping that
> some of the scheduling experts can tell me if I'm just
> misunderstanding or if this is truly a bug. To see it, I do this:
>
> --
>
> # Allow 1000 us more of RT at system and top cgroup
> old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
> echo $((old_rt + 1000)) > /proc/sys/kernel/sched_rt_runtime_us
> old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
> echo $((old_rt + 1000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us
>
> # Give the 1000 us to my own group
> mkdir /sys/fs/cgroup/cpu/doug
> echo 1000 > /sys/fs/cgroup/cpu/doug/cpu.rt_runtime_us
>
> # Fork off a bunch of spinny things
> for i in $(seq 13); do
>   python -c "while True: pass"&
> done
>
> # Make my spinny things RT and put in my group
> # (assumes no other python is running!)
> for pid in $(ps aux | grep python | grep -v grep | awk '{print $2}'); do
>   echo $pid >> /sys/fs/cgroup/cpu/doug/tasks
>   chrt -p -f 99 $pid
> done
>
> --
>
> As expected, the spinny python tasks are pretty much throttled down to
> 0 in the above (they get 1 ms out of 1 second).
>
> However, _the bug_ is that the above basically causes all _other_ RT
> things in my system to stop functioning. I'm on an ARM Chromebook
> (sc7180-trogdor) and we communicate to our EC on a "realtime" thread
> due to SPI timing requirements. The above commands appear to starve
> the EC's communication task and (as far as I can tell) other RT tasks
> in the system.
>
> Notably:
>
> a) My EC comms slow to a crawl (eventually one gets through).
>
> b) "top" shows stuff like this:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
> COMMAND
>   179 root     -51   0       0      0      0 R 100.0   0.0   0:31.79
> cros_ec_spi_hig
>   180 root     -51   0       0      0      0 R  95.2   0.0   0:50.19
> irq/169-chromeo
>   184 root     -51   0       0      0      0 R  95.2   0.0   0:13.24
> spi10
>   221 root      -2   0       0      0      0 R  95.2   0.0   0:50.57
> ring0
>
> c) If I give my spinny tasks just a little bit more time than 1 ms
> then I get a hung task.
>
>
> When I'm testing the above, the non-RT stuff in the system continues
> to work OK though. I can even go in and kill all my python tasks and
> the system returns to normal.
>
> I tried gathering some tracing. One bit that might (?) be relevant:
>
>  cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>  cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>           <idle>-0       [001] d.H5  1495.309113: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.309119: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H5  1495.312137: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.312142: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>
> My best guess is that there's some bug in the scheduler where it just
> loops constantly picking an unthrottled RT task but then incorrectly
> decides that it's throttled and thus doesn't run it.
>
> Most of my testing has been on the chromeos-5.4 kernel, but just in
> case I tried a vanilla v5.15 kernel and I could reproduce the same
> problems.
>
>
> Anyway, if I'm just doing something stupid then I appologize of the
> noise. If the above should work and you need me to gather more logging
> / try any experiments, I'm happy to do so.

I spent a little more time here and:

a) Managed to come up with a test case that makes it easy for anyone
to reproduce the problem themselves.

b) Managed to come up with a short patch that "fixes" this problem for me.

Since I had a potential fix, I posted this as a patch. Maybe we can
move the discussion to my post of that patch, which will hopefully
show up at:

https://lore.kernel.org/r/20211115170241.1.I94825a614577505bd1a8be9aeff208a49cb39b3d@changeid

...oddly that patch isn't showing up (?). It should be "[PATCH]
sched/rt: Don't reschedule a throttled task even if it's higher
priority". If it's not there tomorrow, I can try to repost again.

-Doug