linux-kernel - Re: [PATCH v4] sched: automated per session task groups

Message-Id: <201012052118.43843.kernel@kolivas.org>
Date:	Sun, 5 Dec 2010 21:18:43 +1100
From:	Con Kolivas <kernel@...ivas.org>
To:	Colin Walters <walters@...bum.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sched: automated per session task groups

Greets.

I applaud your efforts to continue addressing interactivity and responsiveness,
but (and I know I'm going to regret this) I feel strongly enough to speak up
about this change.

On Sun, 5 Dec 2010 10:43:44 Colin Walters wrote:
> On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> > What's your point again? It's a heuristic.
> 
> So if it's a heuristic the OS can get wrong,

This is precisely what I see as the flaw in this approach. The whole reason 
you have CFS now is that we had a scheduler which was pretty good for all the 
other things in the O(1) scheduler, but needed heuristics to get interactivity 
right. I put them there. Then I spent the next few years trying to find a way 
to get rid of them. The reason is precisely what Colin says above. Heuristics 
get it wrong sometimes. So no matter how smart you think your heuristics are, 
it is impossible to get it right 100% of the time. If the heuristics make it 
better 99% of the time, and introduce disastrous corner cases, regressions and 
exploits 1% of the time, that's unforgivable. That's precisely what we had 
with the old O(1) scheduler and that's what you got rid of when you put CFS 
into mainline. The whole reason CFS was better was it was mostly fair and 
concentrated on ensuring decent latency rather than trying to guess what would 
be right, so it was predictable and reliable.

So if you introduce heuristics once again into the scheduler to try and 
improve the desktop by unfairly distributing CPU, you will go back to where 
you once were. Mostly better but sometimes really badly wrong. No matter how 
smart you think you can be with heuristics they cannot be right all the time. 
And there are regressions with these per-tty and then per-session group
patches. Search the forums where desktop users go and you'll see that, although
people are afraid to speak up on lkml, some users report mplayer and amarok
skipping under light load when trying them. If you program more intelligence
in to work around these regressions, you'll just get yourself deeper and
deeper into the same quagmire. The 'quick fix' you seek now is not
something you should be defending so vehemently. The "I have a solution now" 
just doesn't make sense in this light. I for one do not welcome our new 
heuristic overlords.

If you're serious about really improving the desktop from within the kernel, 
as you seem to be with this latest change, then make a change that's 
predictable and gets it right ALL the time and is robust for the future. Stop 
working within all the old fashioned concepts and allow userspace to tell the 
kernel what it wants, and give the user the power to choose. If you think this
is too hard and not doable, or that users are too uninformed or won't want to
modify things themselves, then allow me to propose a relatively simple change
that can expedite this.

There are two aspects to getting good desktop behaviour: enough CPU and low 
latency. 'nice' by your own admission is too crude and doesn't really describe 
how either of these should really be modified. Furthermore there are 40 levels 
of it and only about 4 or 5 are ever used. We also know that users don't even 
bother using it. 

What I propose is a new syscall, latnice, for "latency nice". It only need have 
4 levels, 1 for default, 0 for latency insensitive, 2 for relatively latency 
sensitive gui apps, and 3 for exquisitely latency sensitive uses such as 
audio. These should not require extra privileges to use and thus should also 
not be usable for "exploiting" extra CPU by default. It's simply a matter of
giving these tasks lower latency but shorter quota (or timeslices), which
means throughput on these apps is sacrificed due to cache thrashing, but then
throughput is not what latency sensitive applications need. Calls to set these
levels can then be included within the applications themselves, making this a
more long term change. 'Firefox' could set itself 2, 'Amarok' and 'mplayer' 3, 
and 'make' - bless its soul - 0, and so on. Keeping the range simple and 
defined will make it easy for userspace developers to cope with, and users to 
fiddle with.
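
A minimal sketch of how the four levels described above might map to
timeslices. Everything here is an assumption made for illustration: the enum
names, the 6000 us base slice, and the halving per level are hypothetical and
are not taken from any existing kernel interface or from the proposal itself.
The only property the sketch tries to capture is that a higher latnice level
gets a shorter quota, so the task is rescheduled sooner without being granted
more CPU.

```c
#include <assert.h>

/* Hypothetical latnice levels, numbered as in the proposal above. */
enum latnice_level {
    LATNICE_INSENSITIVE = 0, /* e.g. make */
    LATNICE_DEFAULT     = 1, /* anything that never calls latnice */
    LATNICE_GUI         = 2, /* e.g. Firefox */
    LATNICE_AUDIO       = 3  /* e.g. Amarok, mplayer */
};

/* Assumed timeslice for the default level; not a real kernel constant. */
#define LATNICE_BASE_SLICE_US 6000u

/* Clamp an arbitrary request into the four defined levels. */
int latnice_clamp(int level)
{
    if (level < LATNICE_INSENSITIVE)
        return LATNICE_INSENSITIVE;
    if (level > LATNICE_AUDIO)
        return LATNICE_AUDIO;
    return level;
}

/*
 * Timeslice in microseconds for a given level (assumed scaling):
 * level 0 -> 2x base, 1 -> base, 2 -> base/2, 3 -> base/4.
 */
unsigned int latnice_slice_us(int level)
{
    level = latnice_clamp(level);
    assert(level >= LATNICE_INSENSITIVE && level <= LATNICE_AUDIO);
    return (LATNICE_BASE_SLICE_US * 2u) >> level;
}
```

With these assumed numbers, latnice_slice_us(LATNICE_AUDIO) is a quarter of
the default slice, and out-of-range requests are clamped rather than rejected,
which keeps the call usable without extra privileges.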

But that would only be the first step. The second step is to take the plunge 
and accept that we DO want selective unfairness on the desktop, but where WE 
want it, not where the kernel thinks we might want it. It's not an exploit if 
my full screen HD video continues to consume 80% of the CPU while make is 
running - on a desktop. Take a leaf out of other desktop OSs and allow the 
user to choose say levels 0, 1, or 2 for desktop interactivity with a simple 
/proc/sys/kernel/interactive tunable, a bit like the "optimise for foreground 
applications" seen elsewhere. This could then be used to decide whether to use
the scheduling hints from latnice to just ensure low latency while keeping
the same CPU usage (level 0), or to actually give progressively more CPU to
latniced tasks as the interactive tunable is increased. Then distros can set this on
installation and make it part of the many funky GUIs to choose between the 
different levels. This then takes the user out of the picture almost entirely, 
yet gives them the power to change it if they so desire.
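
A rough sketch of how the interactive tunable could combine with latnice, under
stated assumptions. The function name, the base weight of 1024, and the 25%
boost per step are all hypothetical choices for illustration; the text above
only says that level 0 keeps CPU usage unchanged and that higher levels give
latniced tasks progressively more CPU.

```c
#include <assert.h>

/* Assumed scheduling weight of an ordinary task; illustrative only. */
#define INTERACTIVE_BASE_WEIGHT 1024u

/* Clamp v into the inclusive range [lo, hi]. */
int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/*
 * Hypothetical CPU weight for a task.
 *   interactive: value of the proposed /proc/sys/kernel/interactive (0..2)
 *   latnice:     the task's latnice level (0..3, default 1)
 *
 * With interactive == 0 every task keeps the base weight, so latnice only
 * affects latency. Otherwise tasks above the default latnice level gain an
 * assumed 25% of the base weight per (interactive step x latnice step).
 */
unsigned int interactive_weight(int interactive, int latnice)
{
    interactive = clamp_int(interactive, 0, 2);
    latnice = clamp_int(latnice, 0, 3);
    assert(interactive >= 0 && latnice >= 0);

    if (interactive == 0 || latnice <= 1)
        return INTERACTIVE_BASE_WEIGHT;

    return INTERACTIVE_BASE_WEIGHT +
           INTERACTIVE_BASE_WEIGHT * (unsigned int)interactive *
           (unsigned int)(latnice - 1) / 4u;
}
```

Under these assumptions a latnice 3 task doubles its weight when the tunable
is set to 2, while the same task at tunable 0 is weighted exactly like make.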

The actual scheduler changes required to implement this are absurdly simple 
and doable now, and will not cost in overhead the way cgroups do. It also 
should cause no regressions when interactive mode is disabled and would have 
no effect till changes are made elsewhere, or the users use the latnice 
utility.

Move away from the fragile heuristic tweaks and find a longer term robust 
solution.

Regards,
Con

-- 
-ck

P.S. I'm very happy for someone else to do it. Alternatively you could include 
BFS and I'd code it up for that in my spare time.
