linux-kernel - Re: 2.6.32.21 - uptime related crashes?

Message-ID: <1311278009.2945.219.camel@work-vm>
Date:	Thu, 21 Jul 2011 12:53:29 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Peter Zijlstra <peterz@...radead.org>, Willy Tarreau <w@....eu>,
	"MINOURA Makoto" <minoura@...inux.co.jp>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Faidon Liambotis <paravoid@...ian.org>,
	linux-kernel@...r.kernel.org, stable@...nel.org,
	Nikola Ciprich <nikola.ciprich@...uxbox.cz>,
	seto.hidetoshi@...fujitsu.com,
	Hervé Commowick <hcommowick@...sec.fr>,
	Rand@...per.es
Subject: Re: 2.6.32.21 - uptime related crashes?

On Thu, 2011-07-21 at 09:22 +0200, Ingo Molnar wrote:
> * john stultz <johnstul@...ibm.com> wrote:
> 
> > On Fri, 2011-07-15 at 12:01 +0200, Peter Zijlstra wrote:
> > > On Thu, 2011-07-14 at 17:35 -0700, john stultz wrote:
> > > > 
> > > > Peter/Ingo: Can you take a look at the above and let me know if you find
> > > > it too disagreeable?
> > > 
> > > +static unsigned long long __cycles_2_ns(unsigned long long cyc)
> > > +{
> > > +       unsigned long long ns = 0;
> > > +       struct x86_sched_clock_data *data;
> > > +       int cpu = smp_processor_id();
> > > +
> > > +       rcu_read_lock();
> > > +       data = rcu_dereference(per_cpu(cpu_sched_clock_data, cpu));
> > > +
> > > +       if (unlikely(!data))
> > > +               goto out;
> > > +
> > > +       ns = ((cyc - data->base_cycles) * data->mult) >> CYC2NS_SCALE_FACTOR;
> > > +       ns += data->accumulated_ns;
> > > +out:
> > > +       rcu_read_unlock();
> > > +       return ns;
> > > +}
> > > 
> > > The way I read that we're still not wrapping properly if freq scaling
> > > 'never' happens.
> > 
> > Right, this doesn't address the mult overflow behavior. As I mentioned
> > in the patch, the rework allows for solving that in the future using
> > a (possibly very rare) timer that would accumulate cycles to ns.
> > 
> > This rework just really addresses the multiplication overflow->negative
> > roll under that currently occurs with the cyc2ns_offset value.
> > 
> > > Because then we're wrapping on accumulated_ns + 2^54.
> > > 
> > > Something like resetting base, and adding ns to accumulated_ns and
> > > returning the latter would make more sense.
> > 
> > Although we have to update the base_cycles and accumulated_ns
> > atomically, so it's probably not something to do in the sched_clock path.
> 
> Ping, what's going on with this bug? Systems are crashing so we need 
> a quick fix ASAP ...

I think Peter's patch disabling sched_clock_stable is a good approach
for now.

And just to clarify a bit here, while there was a related scheduler
division-by-zero issue which to my understanding has already been fixed
post-2.6.32.21, I have not actually seen any other crash logs connected
to the overflow.

Softlockup watchdog false-positive messages have been posted (which I
have also reproduced), but I've not seen any details on actual crashes,
nor have I been able to reproduce them using my forced-overflow patch.

This isn't to say that the overflow isn't causing crashes, but the
reports have not made clear that any crashes were caused by something
other than the div-by-zero issue.
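
For reference, the wrap Peter describes in the quoted text (accumulated_ns
+ 2^54) can be reproduced in userspace. The following is only a minimal
sketch, not the patch itself: struct clock_data and rebase() are
hypothetical names, CYC2NS_SCALE_FACTOR is assumed to be 10 (consistent
with the 2^54 figure), and mult = 1024 is chosen so that one cycle equals
one ns. rebase() illustrates "resetting base, and adding ns to
accumulated_ns"; it ignores the atomic-update problem mentioned above,
which is the hard part in the kernel.

```c
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10	/* assumed: matches the 2^54 wrap point */

/* Hypothetical userspace stand-in for the per-CPU data in the quoted patch. */
struct clock_data {
	uint64_t base_cycles;
	uint64_t mult;
	uint64_t accumulated_ns;
};

/* Same arithmetic as the quoted __cycles_2_ns(), without RCU or per-CPU access. */
static uint64_t cycles_2_ns(const struct clock_data *d, uint64_t cyc)
{
	/* The 64-bit product wraps once (cyc - base_cycles) * mult reaches 2^64. */
	uint64_t ns = ((cyc - d->base_cycles) * d->mult) >> CYC2NS_SCALE_FACTOR;

	return ns + d->accumulated_ns;
}

/*
 * Fold the time elapsed so far into accumulated_ns and move the base
 * forward, so the next delta (and its product with mult) starts small.
 * Not atomic: a concurrent reader could see a half-updated pair.
 */
static void rebase(struct clock_data *d, uint64_t cyc)
{
	d->accumulated_ns = cycles_2_ns(d, cyc);
	d->base_cycles = cyc;
}
```

With base_cycles = 0 and mult = 1024, cycles_2_ns() returns 0 at
cyc = 2^54 because the product wraps to zero. Calling rebase() at
cyc = 2^53 first keeps the product within 64 bits, and the same call
then returns 2^54 as expected.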

thanks
-john

