Message-ID: <1393497512.8200.37.camel@marge.simpson.net>
Date: Thu, 27 Feb 2014 11:38:32 +0100
From: Mike Galbraith <bitbucket@...ine.de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch] sched: don't use nutty scale_rt_power() output
On Thu, 2014-02-27 at 10:40 +0100, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 09:06:51AM +0100, Mike Galbraith wrote:
> > Hi Peter,
> >
> > I wonder if the below makes sense for mainline.
> >
> > Background: I received some rather surprising news recently: a user of
> > old 2.6.32 kernels regularly receives log spam stemming from old 208 day
> > era warnings/protections inserted to prevent explosions from what was at
> > the time unknown bad juju happening (but don't report logs that look
> > like a graffiti artist with an unlimited supply of spray paint gone mad).
> >
> > The kernel that emitted the below does NOT contain..
> > 9993bc63 sched/x86: Fix overflow in cyc2ns_offset
> > ..though these folks use kexec fwtw. They're one of those "You update
> > your kernel IFF world stops spinning" users, so will likely not be
> > terribly interested in me making their boxen say BUG(), and may even be
> > doing something naughty that induces it for all I know.
> >
> > In any case, NOT using nutty output from the intentionally racy function
> > seems like a good plan no matter who or what makes weird unreproducible
> > (elsewhere) sh*t happen. Wedging a bent 64 bit peg into a 32 bit hole
> > could make boom, on top of doing funny things to balancing.
> >
> > sched: don't use nutty scale_rt_power() output
> >
> > Boxen instructed to gripe if they see nutty cpu_power catch us
> > trashing it while seriously dazed and confused for an unknown reason.
> >
> > Dec 18 05:50:56 kernel: [40091179.401405] update_group_power: cpu_power = 3148183471
> > Dec 18 05:51:01 /usr/sbin/cron[2279]: (root) CMD (/opt/blah/fix_cdr_bin.job >> /opt/blah/fix_cdr_bin.out 2>&1)
> > Dec 18 05:51:06 kernel: [40091189.455713] update_cpu_power: cpu_power = 19495027282; scale_rt = 19495027282
> > Dec 18 05:51:16 kernel: [22076800.665578] update_cpu_power: cpu_power = 2671067611; scale_rt = 18428729677871137243
> > Dec 18 05:51:16 kernel: [40091199.188773] update_cpu_power: cpu_power = 2675064501; scale_rt = 18428729677875134133
> >
> > Don't do that, make a scary warning instead.
> >
>
> Yeah, I'm in two minds about that. Crappy clocks can make a whole lot of
> misery. Then again, we usually guard against them going backwards.
>
> How about something like so? Most other sites don't complain about
> clocks going backwards either, they just deal with it.
Yeah, better to warp protect scale_rt_power() directly.
This small set of identical weird ass boxen should have a reliable tsc.
They jump back and forth in time by _exactly 208 days_, and do that
straight from boot, and randomly thereafter. Wish I could get my hands
on one of the things, but that ain't gonna happen.
Those boxen have long uptimes, proving that you can survive with a sched
clock that's going completely bonkers, which is kinda surprising to me.
On a busy box, I'd expect some poor victim to eat the mother of all
latency hits.
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5564,6 +5564,7 @@ static unsigned long scale_rt_power(int
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  	u64 total, available, age_stamp, avg;
> +	s64 delta;
>
>  	/*
>  	 * Since we're reading these variables without serialization make sure
> @@ -5572,7 +5573,11 @@ static unsigned long scale_rt_power(int
>  	age_stamp = ACCESS_ONCE(rq->age_stamp);
>  	avg = ACCESS_ONCE(rq->rt_avg);
>
> -	total = sched_avg_period() + (rq_clock(rq) - age_stamp);
> +	delta = rq_clock(rq) - age_stamp;
> +	if (unlikely(delta < 0))
> +		delta = 0;
> +
> +	total = sched_avg_period() + delta;
>
>  	if (unlikely(total < avg)) {
>  		/* Ensures that power won't end up being negative */