lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20161203003309.GL10089@umbus.fritz.box>
Date:   Sat, 3 Dec 2016 11:33:09 +1100
From:   David Gibson <david@...son.dropbear.id.au>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     John Stultz <john.stultz@...aro.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Liav Rehana <liavr@...lanox.com>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Richard Cochran <richardcochran@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Prarit Bhargava <prarit@...hat.com>,
        Laurent Vivier <lvivier@...hat.com>,
        "Christopher S . Hall" <christopher.s.hall@...el.com>,
        "4.6+" <stable@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in
 its calculation.

On Fri, Dec 02, 2016 at 09:36:42AM +0100, Thomas Gleixner wrote:
> On Fri, 2 Dec 2016, David Gibson wrote:
> > On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:
> > > So I assume that you are talking about a VM which was not scheduled by the
> > > host due to overcommitment (who ever thought that this is a good idea) or
> > > whatever other reason (yes, people were complaining about wreckage caused
> > > by stopping kernels with debuggers) for a long enough time to trigger that
> > > overflow situation. If that's the case then the unsigned conversion will
> > > just make it more unlikely but it still will happen.
> > 
> > It was essentially the stopped by debugger case.  I forget exactly
> > why, but the guest was being explicitly stopped from outside, it
> > wasn't just scheduling lag.  I think it was something in the vicinity
> > of 10 minutes stopped.
> 
> Ok. Debuggers stopping stuff is one issue, but if I understood Liav
> correctly, then he is seing the issue on a heavy loaded machine.

Right.  I can't speak to other situations which might trigger this.

> Liav, can you please describe the scenario in detail? Are you observing
> this on bare metal or in a VM which gets scheduled out long enough or was
> there debugging/hypervisor intervention involved?
> 
> > It's long enough ago that I can't be sure, but I thought we'd tried
> > various different stoppage periods, which should have also triggered
> > the unsigned overflow you're describing, and didn't observe the crash
> > once the change was applied.  Note that there have been other changes
> > to the timekeeping code since then, which might have made a
> > difference.
> > 
> > I agree that it's not reasonable for the guest to be entirely
> > unaffected by such a large stoppage: I'd have no complaints if the
> > guest time was messed up, and/or it spewed warnings.  But complete
> > guest death seems a rather more fragile response to the situation than
> > we'd like.
> 
> Guests death? Is it really dead/crashed or just stuck in that endless loop
> trying to add that huge negative value piecewise?

Well, I don't know.  But the point was it was unusable from the
console, and didn't come back any time soon.

> That's at least what Liav was describing as he mentioned
> __iter_div_u64_rem() explicitely.
> 
> While I'm less worried about debuggers, I worry about the real thing.
> 
> I agree that we should not starve after resume from a debug stop, but in
> that case the least of my worries is time going backwards.
> 
> Though if the signed mult overrun is observable in a live system, then we
> need to worry about time going backwards even with the unsigned
> conversion. Simply because once we fixed the starvation issue people with
> insane enough setups will trigger the unsigned overrun and complain about
> time going backwards.
> 
> Thanks,
> 
> 	tglx
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ