linux-kernel - Re: CFS review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.0.999.0708011903300.3582@woody.linux-foundation.org>
Date:	Wed, 1 Aug 2007 19:17:51 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Roman Zippel <zippel@...ux-m68k.org>
cc:	Andi Kleen <andi@...stfloor.org>, Mike Galbraith <efault@....de>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: CFS review

On Wed, 1 Aug 2007, Roman Zippel wrote:
> 
> I'm not so sure about that. sched_clock() has to be fast, so many archs 
> may want to continue to use jiffies. As soon as one does that one can also 
> save a lot of computational overhead by using 32bit instead of 64bit.
> The question is then how easy that is possible.

I have to say, it would be interesting to try to use 32-bit arithmetic.

I also think it's likely a mistake to do a nanosecond resolution. That's 
part of what forces us to 64 bits, and it's just not even an *interesting* 
resolution.

It would be better, I suspect, to make the scheduler clock totally 
distinct from the other clock sources (many architectures have per-cpu 
cycle counters), and *not* try to even necessarily force it to be a 
"time-based" one.

So I think it would be entirely appropriate to

 - do something that *approximates* microseconds.

   Using microseconds instead of nanoseconds would likely allow us to do 
   32-bit arithmetic in more areas, without any real overflow.

   And quite frankly, even on fast CPU's, the scheduler is almost 
   certainly not going to be able to take any advantage of the nanosecond 
   resolution. Just about anything takes a microsecond - including IO. I 
   don't think nanoseconds are worth the ten extra bits they need, if we 
   could do microseconds in 32 bits.

   And the "approximates" thing would be about the fact that we don't 
   actually care about "absolute" microseconds as much as something that 
   is in the "roughly a microsecond" area. So if we say "it doesn't have 
   to be microseconds, but it should be within a factor of two of a ms", 
   we could avoid all the expensive divisions (even if they turn into 
   multiplications with reciprocals), and just let people *shift* the CPU 
   counter instead.

   In fact, we could just say that we don't even care about CPU counters 
   that shift frequency - so what? It gets a bit further off the "ideal 
   microsecond", but the scheduler just cares about _relative_ times 
   between tasks (and that the total latency is within some reasonable 
   value), it doesn't really care about absolute time.

Hmm?

It would still be true that something that is purely based on timer ticks 
will always be liable to have rounding errors that will inevitably mean 
that you don't get good fairness - tuning threads to run less than a timer 
tick at a time would effectively "hide" them from the scheduler 
accounting. However, in the end, I think that's pretty much unavoidable. 
We should make sure that things mostly *work* for that situation, but I 
think it's ok to say that the *quality* of the fairness will obviously 
suffer (and possibly a lot in the extreme cases).

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/