linux-kernel - Re: [PATCH 6/6] sched: disabled rt-bandwidth by default

Message-ID: <20080826102937.GA25732@elte.hu>
Date:	Tue, 26 Aug 2008 12:29:37 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org,
	Stefani Seibold <stefani@...bold.net>,
	Dario Faggioli <raistlin@...ux.it>,
	Max Krasnyansky <maxk@...lcomm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 6/6] sched: disabled rt-bandwidth by default


* Nick Piggin <nickpiggin@...oo.com.au> wrote:

> On Tuesday 26 August 2008 19:30, Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@...oo.com.au> wrote:
> > > So... no reply to this? I'm really wondering how it's OK to break
> > > documented standards and previous Linux behaviour by default for
> > > something that it is trivial to solve in userspace? [...]
> >
> > I disagree 
> 
> Disagree with what? That it's a problem to basically break the 
> guarantee realtime SCHED_ policies have previously provided?

I think you are sticking to the rigid letter of some standard without 
seeing the bigger picture.

Firstly, please realize that to do a "successful" POSIX or other 
conformance run a default Linux distribution has to be tweaked and often 
crippled literally dozens and often hundreds of ways. In this case you 
also have to add one more entry to /etc/sysctl.conf, to allow RT tasks 
to monopolize CPU time. So you can still get the POSIX sticker if you 
want to - nothing changed about that.
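
As an illustration of the sysctl entry mentioned above, here is a minimal sketch for inspecting the current setting. It assumes the mainline knob names kernel.sched_rt_runtime_us and kernel.sched_rt_period_us (this mail does not spell them out) and that a runtime of -1 means "no throttling". The helper names rt_share and read_rt_sysctls are made up for this example.

```python
from pathlib import Path

# Assumed location of the knobs (mainline names, not quoted in this mail).
PROC_DIR = Path("/proc/sys/kernel")


def rt_share(runtime_us, period_us):
    """Fraction of each period RT tasks may consume; None if unthrottled."""
    if runtime_us < 0:              # -1 switches RT throttling off
        return None
    if period_us <= 0:
        raise ValueError("period must be positive")
    return min(runtime_us / period_us, 1.0)


def read_rt_sysctls(proc_dir=PROC_DIR):
    """Return (runtime_us, period_us), or None if the files are missing."""
    try:
        runtime = int((proc_dir / "sched_rt_runtime_us").read_text())
        period = int((proc_dir / "sched_rt_period_us").read_text())
    except (OSError, ValueError):
        return None
    return runtime, period


if __name__ == "__main__":
    values = read_rt_sysctls()
    if values is None:
        print("RT bandwidth sysctls not available here")
    else:
        share = rt_share(*values)
        print("RT tasks unthrottled" if share is None
              else f"RT tasks limited to {share:.0%} of each period")
```

Under the same assumption, a conformance setup that wants the old behaviour would carry the line "kernel.sched_rt_runtime_us = -1" in /etc/sysctl.conf.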

Secondly, my big picture point is that our task is to make Linux more 
useful and more usable by default. You seem to be arguing that RT tasks 
should be allowed by default to monopolize all CPU time forever, and i 
disagree with that proposition.

But do _you_ actually use such runaway CPU-monopolizing RT tasks? Try it 
one day and you'll quickly meet various practical problems. Let a 
SCHED_FIFO:99 RT task run long enough and on all the main distributions 
you will get:

  BUG: soft lockup - CPU#1 stuck for 61s! [bash:3659]

But a system where any resource can be monopolized 100% (which you are 
arguing for) is just not a generic Linux system. For years (seeing all 
the practical problems with it) we tried various methods to contain 
SCHED_FIFO tasks in the scheduler, but none was really acceptable for 
mainline.

Peter's changes were clean and useful at last. There's lots of apps that 
use SCHED_FIFO for a short burst of activity, and 100% of the ones i 
know do not want to run for longer than 10 seconds.
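
The "short burst" pattern described above could look roughly like the sketch below: switch to SCHED_FIFO, do a bounded piece of work, and always drop back to the previous policy. This is only an illustration, not taken from any particular application; the name fifo_burst and the priority value 10 are hypothetical. The switch needs suitable privileges (CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO), so the sketch reports failure instead of raising.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def fifo_burst(priority=10):
    """Run the enclosed block under SCHED_FIFO, then restore the old policy.

    Yields True if the switch succeeded, False if it was refused or the
    platform has no sched_setscheduler().
    """
    switched = False
    if hasattr(os, "sched_setscheduler"):
        old_policy = os.sched_getscheduler(0)
        old_param = os.sched_getparam(0)
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
            switched = True
        except OSError:             # includes PermissionError
            switched = False
    try:
        yield switched
    finally:
        if switched:
            # Always drop back, even if the burst raised an exception.
            os.sched_setscheduler(0, old_policy, old_param)


if __name__ == "__main__":
    with fifo_burst() as realtime:
        start = time.monotonic()
        sum(range(100_000))         # stand-in for the latency-critical work
        elapsed = time.monotonic() - start
    print(f"realtime={realtime} burst took {elapsed * 1e3:.2f} ms")
```

Restoring the policy in the finally clause is the point of the sketch: a bug inside the burst cannot leave the task stuck at RT priority.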

Thirdly, your argument can only be consistent if you also argue for the 
softlockup watchdog to be disabled. Do you make that point?

> > and what do you mean by "trivial to solve in user-space"?
> 
> I mean that if some distro has turned on the RT scheduling ulimit by 
> default and now finds themselves with a local DoS for unprivileged 
> users as a result, then either that distro should just make their init 
> scripts set the throttle and break the API themselves, or they should 
> start a watchdog at a higher priority than an unprivileged user can set.
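
For the watchdog option in the quoted text, a minimal sketch of choosing a priority that unprivileged users cannot reach is shown here. It assumes the users' ceiling is given by RLIMIT_RTPRIO; the function name watchdog_priority is hypothetical, and the rlimit of the current process is used only as a stand-in for whatever limit the distro actually hands out.

```python
import os
import resource


def watchdog_priority(user_rtprio_limit, fifo_max):
    """Lowest SCHED_FIFO priority strictly above what users may request.

    Returns None when users can already reach the top priority, in which
    case no watchdog can outrank them.
    """
    if (user_rtprio_limit == resource.RLIM_INFINITY
            or user_rtprio_limit >= fifo_max):
        return None
    return max(user_rtprio_limit, 0) + 1


if __name__ == "__main__":
    # Stand-in: the limit of this process, not necessarily the users' limit.
    soft, _hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    top = os.sched_get_priority_max(os.SCHED_FIFO)
    print("watchdog priority:", watchdog_priority(soft, top))
```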

... but that's by far not the only usecase. Very frequently i've seen 
bugreports from people with runaway RT tasks (which tasks were running 
as root) where that runaway behavior was completely unintended. Audio 
apps or other apps getting into a loop and locking up the system.

Worse than that, such bugs prevented the system from being debugged by 
plain users. A runaway RT task that monopolizes the CPU will lock it up 
completely, requiring a hard reset or a power cycle. That can lose data, 
etc. If we allow it to lock up the CPU for up to 10 seconds it will 
still be noticed if that is unintentional (the system is very slow), but 
the problem can be debugged.

By making RT tasks not lock up like that by default and allowing them to 
'only' monopolize the CPU up to 10 seconds, we make the system more 
debuggable and more useful in general. It is a quite reasonable 
proposition that makes Linux useful in general, and you seem to be 
ignoring that practical angle altogether. It's not only about keeping 
user-space rtprio-rlimit driven apps from running away, it's about 
allowing _any_ RT task to be throttled by default if it runs away.

On the other side of the equation, what exact application do you know 
that absolutely relies on being able to monopolize all CPU time in 
excess of 10 seconds? I haven't heard much about that usecase. Why does 
that particular RT app do it? That behavior sounds _very_ weird to me.

If it's some embedded system or other special-purpose app then it can 
tweak the sysctl no problem. (it will have to do it anyway, to turn off 
the softlockup watchdog)

If it's some general purpose Linux app, exactly which one is it? If it's 
an OSS app please give me a URL to its source code, we need to fix it 
urgently. Running for more than 10 seconds wastes power like mad and is 
generally a very un-nice thing to do.

All in all, since the 'buggy RT app runs into a loop and monopolizes the 
CPU' case is much more common, i do think that supporting that usecase 
is the better choice for a default.

... and in any case, i agree with some of the observations in this 
thread, in particular that the 1 second default limit was too low 
(_occasional_ spurts of a couple of seconds of activity by RT tasks 
ought to be OK) - that's why we already upped it to 10 seconds in the 
sched/devel tree, a week ago or so.

	Ingo