lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=V8aB7_6tEq9SbW+vNC5Y7G+8JMsWXo6+zdtF3tmJ97EQ@mail.gmail.com>
Date:   Thu, 2 Dec 2021 10:05:39 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Qais Yousef <qais.yousef@....com>
Cc:     Joel Fernandes <joelaf@...gle.com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/rt: Don't reschedule a throttled task even if it's
 higher priority

Hi,

On Thu, Dec 2, 2021 at 3:11 AM Qais Yousef <qais.yousef@....com> wrote:
>
> Hi Joel, Doug
>
> On 11/30/21 18:36, Joel Fernandes wrote:
> > On Tue, Nov 30, 2021 at 11:30 AM Doug Anderson <dianders@...omium.org> wrote:
> > > I know this isn't crazy urgent and I'm happy to sit and twiddle my
> > > thumbs a bit longer if everyone is still sleepy from tryptophan, but
> > > I'm curious if anyone had a chance to look at this. Can anyone confirm
> > > that my script reproduces for them on something other than my system?
> >
> > Maybe +Qais Yousef can also chime in. Just to add, I think Doug's
>
> I could try :-)
>
> /me tries to find the original posting

Oops! I thought I had duplicated enough of the content in this patch.
Should have included a reference. It's:

https://lore.kernel.org/r/CAD=FV=UKyJLhDEKxhqrit16kvLfi+g0DyYKL6bLJ35fO7NCTsg@mail.gmail.com/


> > issue is that under RT group scheduling, he expects *other* well
> > behaved RT tasks to not be throttled while the tasks that are
> > misbehaving to be *gracefully* throttled. Gracefully being- it is
> > probably WAI that they are exceeding their runtime and so *should
> > naturally* be bandwidth-limited similar to deadline. I am not sure if
> > the design of RT throttling considers throttling as an anomaly or
> > something that's expected to happen. If it is expected to happen, then
> > I see Doug's point that the mechanism shouldn't affect other RT tasks.
>
> I don't know much about RT group throttling, but I am under the impression it
> is to give other sched classes a breather, not other lower priority RT tasks.
> RT still relies on admins knowing what they're doing when creating RT tasks and
> manage their priorities.

So I guess this is where my fundamental disagreement is. I would say
that one point of the throttling is to prevent an errant RT task (one
that's going crazy) from taking the system down with it. It seems a
bit strange to me that we'd want to protect the normal tasks (which
should, in general, be less urgent than _all_ the RT tasks) but not
lower priority RT tasks. In my case the lower priority RT tasks are
actually important enough that blocking them _does_ take the system
down.


> What I think is happening in your case is that you're spawning 13 busyloops at
> priority 99, assuming you're on a chromebook which probably has at best 8 CPUs,
> you're basically killing the machine and there's nothing the scheduler can do
> about it. You get what you asked for :-)

Well, I asked it to only let those 13 busyloops get 10 ms to run per 1
second at most (and then put them on ice). ...so in my mind I _didn't_
get what I asked for. ;-)


> If you spawn nr_cpus_id - 1 busyloops, the lower priority tasks should find
> 1 cpu to get their work done, and I don' expect a hang then. And the RT
> throttling should allow CFS to make some progress every now and then on the
> other CPUs. So you might end up in a slow system, but not hanging one
> hopefully.
>
> Note that Peter fixed the kernel so that it produces known RT priorities (as
> opposed to developers setting random values in the past):
>
>   * sched_set_fifo_low() ==> not really RT but needs to be above cfs. Runs at
>     priority 1.
>   * sched_set_fifo() ==> sets priority MAX_RT_PRIO/2 ==> 50
>
> So most system RT tasks fall into the 2nd category, which is reasonably high
> priority. And RT scheduling assumes if you set something higher than that,
> then it's your responsibility to make sure they don't starve :-)
>
> Yep, it's easy to shoot yourself in the foot with RT, that's why it's
> privileged op ;-)

For sure getting RT scheduling wrong will shoot you in the foot, but I
don't think what I did was wrong.

>From my understanding of the throttling, one big goal (maybe not the
only goal, but a big one) is to make sure that if a process goes
haywire (presumably trying to take 100% of the CPU) that the system
doesn't die and corrective action can be taken on the errant process.
I think I have shown that this goal is really only achieved if the
errant process is lower priority than all the other RT tasks in the
system. That doesn't seem like a reasonable limitation...

I would also say that with Peter's fix above the problem is perhaps
_more_ urgent? You just said that there's a whole bunch of kernel
tasks that are now created with the lowest RT priority. From your
description above this means "not really RT but needs to be above
cfs". If a sysadmin uses RT_GROUP_SCHED and a priority other than the
lowest priority then it's pretty much guaranteed that if the process
goes haywire that it will take down all of these important kernel
tasks. That can't be right, can it?

Maybe the answer is simply that RT_GROUP_SCHED and RT priority are
incompatible? I could submit a patch that forces all RT tasks to have
the same effective priority if RT_GROUP_SCHED is defined. :-/


> > > Does my patch seem sane?
> >
> > To be honest, I don't understand that bit enough :(
>
> Nope. The patch breaks RT priority scheduling. You basically allow a lower
> priority task to run before a higher priority one. Another code path will
> probably detect that and tries to correct it later, but still this is not
> right.

OK, I'll take your word for it. I'm still hoping someone can tell me a
better way to fix this?


-Doug

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ