Message-ID: <1393497512.8200.37.camel@marge.simpson.net>
Date:	Thu, 27 Feb 2014 11:38:32 +0100
From:	Mike Galbraith <bitbucket@...ine.de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch] sched: don't use nutty scale_rt_power() output

On Thu, 2014-02-27 at 10:40 +0100, Peter Zijlstra wrote: 
> On Mon, Feb 24, 2014 at 09:06:51AM +0100, Mike Galbraith wrote:
> > Hi Peter,
> > 
> > I wonder if the below makes sense for mainline.
> > 
> > Background: I received some rather surprising news recently: a user of
> > old 2.6.32 kernels regularly receives log spam stemming from old 208 day
> > era warnings/protections inserted to prevent explosions from what was at
> > the time unknown bad juju happening (but don't report logs that look
> > like graffiti artist with an unlimited supply of spray paint gone mad).
> > 
> > The kernel that emitted the below does NOT contain..
> > 9993bc63 sched/x86: Fix overflow in cyc2ns_offset
> > ..though these folks use kexec fwtw.  They're one of those "You update
> > your kernel IFF world stops spinning" users, so will likely not be
> > terribly interested in me making their boxen say BUG(), and may even be
> > doing something naughty that induces it for all I know.
> > 
> > In any case, NOT using nutty output from the intentionally racy function
> > seems like a good plan no matter who or what makes weird unreproducible
> > (elsewhere) sh*t happen.  Wedging a bent 64 bit peg into 32 bit hole
> > could make boom, on top of doing funny things to balancing. 
> > 
> > sched: don't use nutty scale_rt_power() output
> > 
> > Boxen instructed to gripe if they see nutty cpu_power catch us
> > trashing it while seriously dazed and confused for an unknown reason.
> > 
> > Dec 18 05:50:56 kernel: [40091179.401405] update_group_power: cpu_power = 3148183471
> > Dec 18 05:51:01 /usr/sbin/cron[2279]: (root) CMD (/opt/blah/fix_cdr_bin.job >> /opt/blah/fix_cdr_bin.out 2>&1)
> > Dec 18 05:51:06 kernel: [40091189.455713] update_cpu_power: cpu_power = 19495027282; scale_rt = 19495027282
> > Dec 18 05:51:16 kernel: [22076800.665578] update_cpu_power: cpu_power = 2671067611; scale_rt = 18428729677871137243
> > Dec 18 05:51:16 kernel: [40091199.188773] update_cpu_power: cpu_power = 2675064501; scale_rt = 18428729677875134133
> > 
> > Don't do that, make a scary warning instead.
> > 
> 
> Yeah, I'm in two minds about that. Crappy clocks can make a whole lot of
> misery. Then again, we usually guard against them going backwards.
> 
> How about something like so? Most other sites don't complain about
> clocks going backwards either, they just deal with it.

Yeah, better to warp protect scale_rt_power() directly.

This small set of identical weird ass boxen should have a reliable TSC.
They jump back and forth in time by _exactly 208 days_, and do that
straight from boot, and randomly thereafter.  Wish I could get my hands
on one of the things, but that ain't gonna happen.
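
As a rough sanity check on the numbers in the quoted log, here is a small
user-space C sketch. The helper names and WARP_NS are made up for
illustration; none of this is kernel code. It assumes the jumps correspond
to a 2^54 ns wrap, which works out to roughly 208.5 days. Under that
assumption, the logged scale_rt value 18428729677871137243, read as a
signed 64-bit number, is 2671067611 - 2^54, and its low 32 bits equal the
cpu_power value logged on the same line.

```c
#include <stdint.h>

/* Assumption: the "208 day" jumps correspond to a 2^54 ns wrap. */
#define WARP_NS (1ULL << 54)

/* Convert nanoseconds to whole days (rounds down). */
static uint64_t ns_to_days(uint64_t ns)
{
	return ns / (1000000000ULL * 60 * 60 * 24);
}

/* Keep only the low 32 bits, as happens when a u64 lands in a u32. */
static uint32_t truncate_to_u32(uint64_t v)
{
	return (uint32_t)v;
}

/*
 * Reinterpret a u64 as s64. For values above INT64_MAX the result is
 * implementation-defined in older C standards; gcc and clang wrap
 * modulo 2^64, which is what this sketch relies on.
 */
static int64_t as_signed(uint64_t v)
{
	return (int64_t)v;
}
```

Calling ns_to_days(WARP_NS) gives 208, and feeding the logged scale_rt to
the other two helpers reproduces the relationship described above. That is
consistent with a backwards jump of one wrap period, but it does not prove
where the jump comes from.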

Those boxen have long uptimes, which proves you can survive with a sched
clock that's going completely bonkers, which is kinda surprising to me.
On a busy box, I'd expect some poor victim to eat the mother of all
latency hits.

> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5564,6 +5564,7 @@ static unsigned long scale_rt_power(int
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  	u64 total, available, age_stamp, avg;
> +	s64 delta;
>  
>  	/*
>  	 * Since we're reading these variables without serialization make sure
> @@ -5572,7 +5573,11 @@ static unsigned long scale_rt_power(int
>  	age_stamp = ACCESS_ONCE(rq->age_stamp);
>  	avg = ACCESS_ONCE(rq->rt_avg);
>  
> -	total = sched_avg_period() + (rq_clock(rq) - age_stamp);
> +	delta = rq_clock(rq) - age_stamp;
> +	if (unlikely(delta < 0))
> +		delta = 0;
> +
> +	total = sched_avg_period() + delta;
>  
>  	if (unlikely(total < avg)) {
>  		/* Ensures that power won't end up being negative */
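
To see what the guard in the quoted patch buys, here is a minimal
user-space sketch of just the delta computation. The function names and
the plain uint64_t parameters are hypothetical stand-ins for
sched_avg_period(), rq_clock(rq) and rq->age_stamp; this is not the
kernel function itself, only the arithmetic around it.

```c
#include <stdint.h>

/* Unguarded form: the unsigned subtraction wraps if now < age_stamp. */
static uint64_t total_unguarded(uint64_t period, uint64_t now,
				uint64_t age_stamp)
{
	return period + (now - age_stamp);
}

/* Guarded form: a clock that went backwards counts as zero elapsed time. */
static uint64_t total_guarded(uint64_t period, uint64_t now,
			      uint64_t age_stamp)
{
	/* Same idea as the s64 delta in the patch: look at the sign. */
	int64_t delta = (int64_t)(now - age_stamp);

	if (delta < 0)
		delta = 0;

	return period + (uint64_t)delta;
}
```

With a clock that moves forward, both forms return the same total. If now
is behind age_stamp by 2^54 ns, the unguarded form returns a value close
to 2^64, while the guarded form returns just the period.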

