linux-kernel - Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read


Message-ID: <20131009135519.GF227855@redhat.com>
Date:	Wed, 9 Oct 2013 09:55:19 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	kvm@...r.kernel.org, pbonzini@...hat.com, gleb@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

On Tue, Oct 08, 2013 at 07:08:11PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 08, 2013 at 09:37:05AM -0400, Don Zickus wrote:
> > On Mon, Oct 07, 2013 at 10:05:17PM -0300, Marcelo Tosatti wrote:
> > > Implement reset of kernel watchdogs at pvclock read time. This avoids
> > > adding special code to every watchdog.
> > > 
> > > This is possible for watchdogs which measure time based on sched_clock() or 
> > > ktime_get() variants.
> > > 
> > > Suggested by Don Zickus.
> > > 
> > > Signed-off-by: Marcelo Tosatti <mtosatti@...hat.com>
> > 
> > Awesome.  Thanks for figuring this out Marcelo.  Does that mean we can
> > revert commit 5d1c0f4a now? :-)
> 
> Unfortunately no: soft lockup watchdog does not measure time based on
> sched_clock but on hrtimer interrupt count :-(

I believe it does.  See __touch_watchdog() which calls get_timestamp() -->
local_clock().  That is how it calculates the duration of the softlockup.

Now with your patch, it just sets the timestamp to zero with
touch_softlockup_watchdog_sync(), which is fine.  It will just sync up the
clock, set a new timestamp, and check again in the next hrtimer interrupt.

So I guess I am confused about what that commit does compared to this patch.

> (see the softlockup code in question, perhaps you can point to 
> something that i'm missing).
> 
> BTW, are you OK with printing additional steal time information?
> https://lkml.org/lkml/2013/6/27/755

Well, I thought this patch was supposed to replace that patch?  Why do you
still need that patch?


Perhaps my confusion is centered around which softlockups are the problem:
the VM's or the host's.

From the host perspective, I didn't think you would have any problem
because the VM is just another process that runs in its time slice.

From the VM perspective, the whole overcommit/'wait a couple of minutes to
run again' scenario could easily cause lockups.  But I thought this patch set
detected that and touched the watchdogs early enough that when the next
iteration of the hrtimer came through, it would _not_ cause a softlockup
(it would delay it an hrtimer cycle).



So, if I am misunderstanding the problems (which I probably am :-) ), I
could use a pointer or a quick explanation to remind me what the issues are
again and why you think the other patches are still necessary. :-)

Cheers,
Don