linux-kernel - Re: [patch 1/3] sched: init rt_avg stat whenever rq comes online

Message-ID: <1281986708.1926.1877.camel@laptop>
Date:	Mon, 16 Aug 2010 21:25:08 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	"mingo@...e.hu" <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"chris@...stnet.net" <chris@...stnet.net>,
	"debian00@...ceadsl.fr" <debian00@...ceadsl.fr>,
	"hpa@...or.com" <hpa@...or.com>,
	"jonathan.protzenko@...il.com" <jonathan.protzenko@...il.com>,
	"mans@...sr.com" <mans@...sr.com>,
	"psastudio@...l.ru" <psastudio@...l.ru>,
	"rjw@...k.pl" <rjw@...k.pl>,
	"stephan.eicher@....de" <stephan.eicher@....de>,
	"sxxe@....de" <sxxe@....de>,
	"thomas@...hlinux.org" <thomas@...hlinux.org>,
	"venki@...gle.com" <venki@...gle.com>,
	"wonghow@...il.com" <wonghow@...il.com>,
	"stable@...nel.org" <stable@...nel.org>, tglx <tglx@...utronix.de>
Subject: Re: [patch 1/3] sched: init rt_avg stat whenever rq comes online

On Mon, 2010-08-16 at 10:36 -0700, Suresh Siddha wrote:
> On Mon, 2010-08-16 at 00:47 -0700, Peter Zijlstra wrote:
> > On Fri, 2010-08-13 at 12:45 -0700, Suresh Siddha wrote:
> > > plain text document attachment (sched_reset_rt_avg_stat_online.patch)
> > > TSC's get reset after suspend/resume and this leads to a scenario of
> > > rq->clock (sched_clock_cpu()) less than rq->age_stamp. This leads
> > > to a big value returned by scale_rt_power() and the resulting big group
> > > power set by the update_group_power() is causing improper load balancing
> > > between busy and idle cpu's after suspend/resume.
> > 
> > ARGH, so i[357] westmere mobile stops TSC on some power state?
> 
> WSM has working TSC with constant rate across P/C/T-states. This issue
> is about suspend/resume (S-states).

Hurm..

> > Why don't we sync it back to the other CPUs instead?
> 
> All the cpu's entered suspend state and during resume it gets reset for
> all the CPU's.

Bloody lovely..

> > Or does it simply mark TSCs unstable and leave it at that?
> 
> TSCs are stable and in sync after resume as well. If we want to do SW
> sync, we need to keep track of the time we spent in the suspend state
> and do a SW sync (during resume) that can potentially disturb the HW
> sync.

Nah, no need to track the time spent in S-states; simply not going
backwards would be enough: save before entering S, restore after coming
out.

You can use something like:

suspend:
 __get_cpu_var(cyc2ns_suspend) = sched_clock();

resume:
 for_each_possible_cpu(i)
   per_cpu(cyc2ns_offset, i) += per_cpu(cyc2ns_suspend, i);

or something like that to keep sched_clock() stable, which is exactly
what most (all?) of its users expect when we report the TSC is usable.

Not sure how to arrange the suspend bit to run on all cpus though, as I
think we offline them all first or something.
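
The save/restore idea above can be modelled in user space. The following is a minimal sketch, not kernel code: struct sim_clock and the sim_*() helpers are hypothetical names, the counter field stands in for the TSC already converted to nanoseconds, and the offset is assigned on resume (saved value minus current reading) rather than incremented as in the pseudocode, which is one possible variation on the same idea.

```c
#include <stdint.h>

/* Hypothetical user-space model of a counter-backed clock. */
struct sim_clock {
	uint64_t counter;	/* stands in for the raw TSC, in ns */
	uint64_t offset;	/* stands in for cyc2ns_offset */
	uint64_t saved;		/* stands in for cyc2ns_suspend */
};

/* The value a sched_clock()-like reader would see. */
static uint64_t sim_sched_clock(const struct sim_clock *c)
{
	return c->counter + c->offset;
}

/* Before entering the S-state: remember the last reported time. */
static void sim_suspend(struct sim_clock *c)
{
	c->saved = sim_sched_clock(c);
}

/* Models the hardware resetting the counter across suspend/resume. */
static void sim_hw_reset(struct sim_clock *c)
{
	c->counter = 0;
}

/*
 * After resume: choose the offset so the reported time continues from
 * the saved value instead of jumping backwards. Unsigned arithmetic is
 * modular, so this also works when counter is larger than saved.
 */
static void sim_resume(struct sim_clock *c)
{
	c->offset = c->saved - c->counter;
}
```

With this model, a clock that read 1000 before suspend reads 1000 again right after sim_resume(), regardless of where the reset counter stands, and advances from there. The time spent suspended is deliberately not accounted for, matching the "simply not going backwards" requirement.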

> > In any case, this needs to be fixed at the clock level, not like this.
> 
> If we have more such dependencies on TSC then we may need to address the
> issue at clock level as well. Nevertheless, across cpu online/offline,
> current scheduler code is expecting TSC (sched_clock) to be going
> forward and not sure why we need to carry the rt_avg history across
> online/offline.

We assume sched_clock_cpu() _never_ goes backwards; when
sched_clock_stable is set, sched_clock_cpu() == sched_clock() (we could,
and probably should, do better on clock continuity when we flip
sched_clock_stable).
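
To illustrate why a backwards clock is so harmful to code that subtracts an older timestamp from the current one (as with rq->clock and rq->age_stamp in the report above), here is a small user-space sketch. The function names are hypothetical and this is not the actual scale_rt_power() code; it only assumes both values are unsigned 64-bit quantities.

```c
#include <stdint.h>

/*
 * Elapsed time as a plain unsigned subtraction. If the clock went
 * backwards (now < stamp), the result wraps around to a huge value.
 */
static uint64_t elapsed_naive(uint64_t now, uint64_t stamp)
{
	return now - stamp;
}

/*
 * Defensive variant, shown only for contrast: treat a backwards clock
 * as zero elapsed time. This papers over the symptom at one call site
 * rather than restoring the monotonic-clock guarantee.
 */
static uint64_t elapsed_clamped(uint64_t now, uint64_t stamp)
{
	return now >= stamp ? now - stamp : 0;
}
```

Every user of the clock would need the clamped form to be safe, which is the argument for keeping the clock itself from going backwards.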

We carry rt_avg over suspend much like we carry pretty much all state
over suspend, including load_avg etc.; there is no reason to special-case
it at all.

