linux-kernel - Re: WARNING: Adjusting tsc more then 11%

Message-ID: <20120305202845.GD17489@zod.bos.redhat.com>
Date:	Mon, 5 Mar 2012 15:28:46 -0500
From:	Josh Boyer <jwboyer@...hat.com>
To:	John Stultz <john.stultz@...aro.org>
Cc:	Dave Jones <davej@...hat.com>,
	Fedora Kernel Team <kernel-team@...oraproject.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: WARNING: Adjusting tsc more then 11%

On Mon, Mar 05, 2012 at 12:24:37PM -0800, John Stultz wrote:
> > > Ok. Well, just to level set: the warning is informative, and points to
> > > unexpected, but not necessarily unsafe behavior.
> > > 
> > > In fact, the risk (where mult is adjusted to be large enough to cause an
> > > overflow) we're warning about has been present since 2.6.36 or even possibly
> > > before. The change in 3.2 which added the warning also added a more
> > > conservative mult calculation, so we're less likely to get overflow
> > > prone large mult values.
> > 
> > Is there a reason you decided to use a WARN_ONCE, which dumps a full stack
> > trace, instead of just printk(KERN_ERR ?
> 
> Well, the WARN_ONCE behavior is really nice, since just a printk would
> end up possibly filling the logs, since you might get one every tick.

We have printk_once too.
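
To illustrate the difference being discussed, here is a minimal userspace
sketch of the "report once" pattern next to a plain per-tick print. It is not
the kernel's WARN_ONCE or printk_once implementation; report_once(),
lines_logged(), and the percentage parameters are hypothetical names made up
for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: print msg only the first time cond is true.
 * The caller owns the "already reported" flag.
 * Returns true only when a line was actually printed.
 */
static bool report_once(bool *reported, bool cond, const char *msg)
{
    if (!cond || *reported)
        return false;
    *reported = true;
    fprintf(stderr, "%s\n", msg);
    return true;
}

/*
 * Simulate `ticks` timer ticks during which the adjustment stays at
 * adjust_pct, and count how many log lines each reporting style would
 * produce. With once == false we only count the lines a plain per-tick
 * print would emit, rather than actually printing them.
 */
static int lines_logged(int ticks, int adjust_pct, int limit_pct, bool once)
{
    bool reported = false;
    int lines = 0;

    for (int i = 0; i < ticks; i++) {
        bool over = adjust_pct > limit_pct;

        if (once) {
            if (report_once(&reported, over, "adjustment over limit"))
                lines++;
        } else if (over) {
            lines++; /* a plain print would log here on every tick */
        }
    }
    return lines;
}
```

With a condition that holds on every tick, the plain variant logs one line per
tick while the "once" variant logs a single line. Either way the log stays
bounded with a once-style report; the open question in this thread is only
whether that single report should carry a stack trace.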

> > > So it would be great to get further feedback from folks who are seeing
> > > this warning, so we can really hammer this out, but I don't want the
> > > warning spooking anyone into thinking things are terribly broken.
> > 
> > Right... people see backtraces and start thinking "my kernel is broken."
> > 
> > I'm certainly not meaning to pick on you for this.  Lately it seems all
> > the rage to throw WARN_ONs for all kinds of error paths and leave the user
> > to figure out how screwed they are.
> 
> It's a trade-off, since we really do want to know if our code has been
> pushed outside of its expected boundaries (either by unexpected hardware
> behavior or by expectations being raised, like long nohz idle times), so
> we have to get folks' attention somewhat. The type of error reporting
> Dave's managed to collect here is really great.

It is, yes.  Do you know, aside from distro kernel maintainers, how many
reports you have gotten from actual users directly?

> But at the same time, I agree there have been a few cases where the code
> is limited more narrowly than the reality of existing hardware, and we
> end up with a constant stream of error messages that get waved off as
> broken hardware.
> 
> There we need to either fix the code or drop the warnings, but I think
> it gets hard when we really want to know about "unexpected behavior,
> except on some wide swath of hardware that always acts poorly", where
> conditionalizing the warnings isn't easy.

Oh my.  Quirks in the timekeeping code would just give me nightmares ;).

josh