linux-kernel - Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Message-ID: <20080828141513.GC31444@goodmis.org>
Date:	Thu, 28 Aug 2008 10:15:13 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org,
	Stefani Seibold <stefani@...bold.net>,
	Dario Faggioli <raistlin@...ux.it>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Max Krasnyansky <maxk@...lcomm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

On Tue, Aug 19, 2008 at 01:05:57PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> 
> > Disable bandwidth control by default.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > ---
> >  kernel/sched.c |   17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> > 
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -824,9 +824,9 @@ static __read_mostly int scheduler_runni
> >  
> >  /*
> >   * part of the period that we allow rt tasks to run in us.
> > - * default: 0.95s
> > + * default: inf
> >   */
> > -int sysctl_sched_rt_runtime = 950000;
> > +int sysctl_sched_rt_runtime = -1;
> 
> The fixes look good to me, but this enabling of infinite RT task lockups 
> is not an improvement.
> 
> The thing is, i got far more bugreports about locked up RT tasks where 
> the lockup was unintentional, than real bugreports about anyone 
> _intending_ for the whole box to come to a grinding halt because a 
> high-prio RT task is monopolizing the CPU.
> 
> In fact there's only been this artificial test so far.
> 
> So could you please just increase the chunking to 10 seconds or so, from 
> the current 1 second? Anyone locking up the system for more than 10 
> seconds via an RT task has to deal with many other issues already.
> 
> I.e. keep the system borderline debuggable (up to 10 seconds delays are 
> _not_ nice so people will notice) - but it's still a marked improvement 
> from completely locked up desktops.
> 
> And those who really need longer than 10 second periods can set it 
> higher, or even (if they want to live dangerously or run POSIX 
> conformance tests) make it infinite (set it to -1) - and will have to 
> deal with other things like the softlockup watchdog as well.
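
A rough model of the trade-off being discussed. It assumes RT tasks get rt_runtime microseconds out of every rt_period, as with sysctl_sched_rt_runtime in the patch (950000 us against the current 1 s period), and that -1 means unlimited. The helper names and RT_RUNTIME_INF are hypothetical, and the model considers only a single RT hog on one CPU.

```c
#include <assert.h>

/* Hypothetical constant: a runtime of -1 means "no RT limit". */
#define RT_RUNTIME_INF (-1L)

/* CPU time per period left to non-RT tasks while one RT task spins.
 * Both arguments are in microseconds. */
long nonrt_time_per_period_us(long rt_runtime_us, long rt_period_us)
{
	assert(rt_period_us > 0);
	if (rt_runtime_us == RT_RUNTIME_INF || rt_runtime_us >= rt_period_us)
		return 0;	/* the RT hog may monopolize the CPU */
	return rt_period_us - rt_runtime_us;
}

/* Longest stretch a non-RT task may wait in this model: the RT hog
 * consumes its whole budget before it is throttled.
 * Returns -1 when the wait is unbounded. */
long worst_nonrt_stall_us(long rt_runtime_us, long rt_period_us)
{
	assert(rt_period_us > 0);
	if (rt_runtime_us == RT_RUNTIME_INF || rt_runtime_us >= rt_period_us)
		return -1;
	return rt_runtime_us;
}
```

With the current values, worst_nonrt_stall_us(950000, 1000000) is 950000, leaving non-RT tasks 50 ms per second. A 10 s period at the same 95% ratio (an assumption; only the period length is proposed) stretches the stall to 9.5 s, in line with the "up to 10 seconds delays" above, while -1 makes it unbounded.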

My biggest concern about adding a limit to FIFO is that an RT developer
could spend weeks trying to debug their system, wondering why their
planned CPU-hogging RT task is being preempted by a non-RT task.

So if this time limit does kick in, we should at the very least print
something to let the user know it happened. After all, this is more of
a safety net anyway, and if we are hitting the limit, the user should
be notified. Perhaps even tell the user that, if this behaviour is
expected, they should raise the sysctl <var>.
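
A minimal sketch of the "tell the user once" idea, written as user-space C. The struct rt_budget type and the rt_account()/rt_new_period() helpers are hypothetical stand-ins, not the scheduler's actual data structures, and the message text is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU RT budget, not the kernel's real structures. */
struct rt_budget {
	long runtime_us;	/* allowed RT time per period, -1 = unlimited */
	long used_us;		/* RT time consumed in the current period */
	bool throttled;		/* RT tasks must yield until the next period */
	bool warned;		/* set after the first warning is printed */
};

/* Charge delta_us of RT execution; returns true if RT tasks must now
 * yield to non-RT tasks for the rest of the period. */
bool rt_account(struct rt_budget *b, long delta_us)
{
	assert(delta_us >= 0);
	if (b->runtime_us < 0)
		return false;		/* unlimited: never throttle */
	b->used_us += delta_us;
	if (b->used_us > b->runtime_us && !b->throttled) {
		b->throttled = true;
		if (!b->warned) {	/* warn only the first time */
			b->warned = true;
			fprintf(stderr, "RT throttling activated: RT tasks exceeded "
				"%ld us this period; raise the limit if this is "
				"expected\n", b->runtime_us);
		}
	}
	return b->throttled;
}

/* Called at each period boundary: refill the budget. */
void rt_new_period(struct rt_budget *b)
{
	b->used_us = 0;
	b->throttled = false;
}
```

Charging 900000 us and then another 100000 us against a 950000 us budget trips the throttle and prints the warning. rt_new_period() clears the throttle but not the warned flag, so later throttles stay quiet instead of flooding the log.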

Peter, another question: is this limit for a single running RT task, or
for all RT tasks? I'm assuming here that it is per task. If you have
20 RT tasks all running, would this let non-RT tasks in? If the limit
is per task, this could be an even bigger issue.
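
To make the 20-task question concrete, here is a simplified model contrasting the two accounting policies. It assumes n RT tasks spinning on one CPU and sharing each period equally; both functions are hypothetical illustrations and say nothing about which policy the patch actually implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-task policy: throttle only if a single task exceeds the runtime.
 * Times are in microseconds; a negative runtime means unlimited. */
bool throttles_per_task(int n_tasks, long runtime_us, long period_us)
{
	assert(n_tasks > 0 && period_us > 0);
	if (runtime_us < 0)
		return false;
	long per_task_us = period_us / n_tasks;	/* equal share of the CPU */
	return per_task_us > runtime_us;
}

/* Aggregate policy: throttle once all RT tasks together exceed it. */
bool throttles_aggregate(int n_tasks, long runtime_us, long period_us)
{
	assert(n_tasks > 0 && period_us > 0);
	if (runtime_us < 0)
		return false;
	long per_task_us = period_us / n_tasks;
	long total_us = per_task_us * n_tasks;	/* spinners fill the period */
	return total_us > runtime_us;
}
```

With 20 spinning tasks and a 950000 us runtime, each task only gets 50000 us per period, so a per-task check never fires and non-RT tasks stay locked out, while an aggregate check throttles as intended.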

Thanks,

-- Steve