linux-kernel - Re: Stolen and degraded time and schedulers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45F74515.7010808@vmware.com>
Date:	Tue, 13 Mar 2007 17:43:01 -0700
From:	Dan Hecht <dhecht@...are.com>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
Cc:	dwalker@...sta.com, cpufreq@...ts.linux.org.uk,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Con Kolivas <kernel@...ivas.org>,
	Chris Wright <chrisw@...s-sol.org>,
	Virtualization Mailing List <virtualization@...ts.osdl.org>,
	john stultz <johnstul@...ibm.com>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Stolen and degraded time and schedulers

On 03/13/2007 02:59 PM, Jeremy Fitzhardinge wrote:
> Daniel Walker wrote:
>> The frequency tracking you mention is done to some extent inside the
>> timekeeping adjustment functions, but I'm not sure it's totally accurate
>> for non-timekeeping, and it also tracks things like interrupt latency.
>> Tracking frequency changes where it's important to get it right
>> shouldn't be done I think ..
>>
>> If you want accurate time accounting, don't use the TSC .
>>   
> 
> I'm not sure I follow you here.  Clocksources have the means to adjust
> the rate of time progression, mostly to warp the time for things like
> ntp.  The stability or otherwise of the tsc is irrelevant.
> 
> If you had a clocksource which was explicitly using the rate at which a
> CPU does work as a timebase, then using the same warping mechanism would
> allow you to model CPU speed changes.
> 
>> The sched_clock interface is basically a stripped down clocksource..
>> I've implemented sched_clock as a clocksource in the past ..
>>   
> 
> Yes, that works.  But a clocksource is strictly about measuring the
> progression of real time, and so doesn't generally measure how much work
> a CPU has done.
> 
>>> We currently have a sched_clock interface in paravirt_ops to deal with
>>> the hypervisor aspect.  It only occurred to me this morning that cpufreq
>>> presents exactly the same problem to the rest of the kernel, and so
>>> there's room for a more general solution.
>>>     
>> Are there other architecture which have this per-cpu clock frequency
>> changing issue? I worked with several other architectures beyond just
>> x86 and haven't seen this issue ..
> 
> Well, lots of cpus have dynamic frequencies.  Any scheduler which
> maintains history will suffer the same problem, even on UP.  If
> processes A and B are supposed to have the same priority and they both
> execute for 1ms of real time, did they make the same amount of
> progress?  Not if the cpu changed speed in between.
> 
> And any system which commonly runs virtualized (s390, power, etc) will
> need to deal with the notion of stolen time.
> 

With your previous definition of work time, would it be that:

monotonic_time == work_time + stolen_time ??

i.e. would you be defining stolen_time to include the time lost to 
processes due to the cpu running at a lower frequency?  How does this 
play into the other potential users, besides sched_clock(), of stolen 
time?  We should make sure that the abstraction introduced here makes 
sense in those places too.

For example, the stuff that happens in update_process_times().  I think 
we'd want to account the stolen time to cpustat->steal.  Also we'd 
probably want account for stolen time with regards to 
task_running_tick().  (Though, in the latter case, maybe we first have 
to move the scheduler away from assuming HZ rate decrementing of 
p->time_slice to get this right. i.e. remove the tick based assumption 
from the scheduler, and then maybe stolen time falls in more naturally 
when accounting time slices).

I guess taking your cpufreq as an example of work_time progressing 
slower than monotonic_time (and assuming that the remaining time is what 
you would call stolen), then e.g. top would report 50% of your cpu 
stolen when you cpu is running at 1/2 max rate.  And p->time_slice would 
decrement at 1/2 the rate it normally did when running at 1/2 speed.  Is 
this the right thing to do?  If so, then I agree it makes sense to model 
hypervisor stolen time in terms of your "work time".  But, if not, then 
maybe the amount of work you can get done during a period of time that 
is not stolen and the stolen time itself are really two different 
notions, and shouldn't be confused.  I can see arguments both ways.

Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/