Message-ID: <20200122200710.GA3071@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
Date: Wed, 22 Jan 2020 20:07:10 +0000
From: Anchal Agarwal <anchalag@...zon.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
CC: Peter Zijlstra <peterz@...radead.org>,
"Singh, Balbir" <sblbir@...zon.com>,
"Valentin, Eduardo" <eduval@...zon.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Woodhouse, David" <dwmw@...zon.co.uk>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"sstabellini@...nel.org" <sstabellini@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"pavel@....cz" <pavel@....cz>, "axboe@...nel.dk" <axboe@...nel.dk>,
"x86@...nel.org" <x86@...nel.org>,
"roger.pau@...rix.com" <roger.pau@...rix.com>,
"hpa@...or.com" <hpa@...or.com>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"mingo@...hat.com" <mingo@...hat.com>,
"Kamata, Munehisa" <kamatam@...zon.com>,
"bp@...en8.de" <bp@...en8.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"konrad.wilk@...cle.co" <konrad.wilk@...cle.com>,
"len.brown@...el.com" <len.brown@...el.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"fllinden@...ozn.com" <fllinden@...zon.com>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
<anchalag@...zon.com>
Subject: Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in
hibernation
On Tue, Jan 14, 2020 at 07:29:52PM +0000, Anchal Agarwal wrote:
> On Tue, Jan 14, 2020 at 12:30:02AM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 13, 2020 at 10:50 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > >
> > > On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > On Mon, Jan 13, 2020 at 11:43:18AM +0000, Singh, Balbir wrote:
> > > > > For your original comment, just wanted to clarify the following:
> > > > >
> > > > > 1. After hibernation, the machine can be resumed on a different but compatible
> > > > > host (these are VM images hibernated)
> > > > > 2. This means the clock between host1 and host2 can/will be different
> > > > >
> > > > > In your comments are you making the assumption that the host(s) is/are the
> > > > > same? Just checking the assumptions being made and being on the same page with
> > > > > them.
> > > >
> > > > I would expect this to be the same problem we have as regular suspend,
> > > > after power off the TSC will have been reset, so resume will have to
> > > > somehow bridge that gap. I've no idea if/how it does that.
> > >
> > > In general, this is done by timekeeping_resume() and the only special
> > > thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
> > > call in tsc_resume().
> >
> > And I forgot about tsc_restore_sched_clock_state() that gets called
> > via restore_processor_state() on x86, before calling
> > timekeeping_resume().
> >
> In this case tsc_verify_tsc_adjust(true) does nothing, as the
> feature bit X86_FEATURE_TSC_ADJUST is not available to the guest.
> I am no expert in this area, but could this be messing things up?
>
> Thanks,
> Anchal
Gentle nudge on this. I will add more data here in case that helps.
1. Before this patch, the TSC is stable but hibernation does not work
100% of the time. I agree that if the TSC is stable it should not be
marked unstable; however, in this case, if I run a CPU-intensive
workload in the background and trigger a reboot-hibernation loop, I
see a workqueue lockup.
2. The lockup does not hose the system completely; the
reboot-hibernation loop carries on and the system recovers.
However, as mentioned in the commit message, the system does
become unreachable for a couple of seconds.
3. Xen suspend/resume seems to save/restore the time_memory area in
xen_arch_pre_suspend and xen_arch_post_suspend. The xen clock value
is saved, and xen_sched_clock_offset is set at resume time to ensure
a monotonic clock value.
4. Also, the instances do not have InvariantTSC exposed. The feature
bit X86_FEATURE_TSC_ADJUST is not available to the guest, and the xen
clocksource is used by guests.
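To make point 3 concrete, here is a rough userspace model of how I
understand the offset trick to work. It is only a sketch, not the
kernel code: every name in it (raw_ns, sched_offset, saved_value and
the model_* helpers) is made up for illustration, with sched_offset
playing the role of xen_sched_clock_offset. The assumption is that
the guest-visible clock is the raw clock minus an offset, and that the
offset is recomputed at resume from the value saved at suspend.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; this only models the idea. */
static uint64_t raw_ns;       /* raw clock as provided by the current host */
static uint64_t sched_offset; /* stands in for xen_sched_clock_offset */
static uint64_t saved_value;  /* guest-visible clock value saved at suspend */

/* Guest-visible clock: raw clock minus the current offset. */
static uint64_t model_sched_clock(void)
{
	return raw_ns - sched_offset;
}

/* At suspend: remember what the guest-visible clock showed. */
static void model_save(void)
{
	saved_value = raw_ns - sched_offset;
}

/* At resume: pick the offset so the clock continues from saved_value. */
static void model_restore(void)
{
	sched_offset = raw_ns - saved_value;
}

/*
 * Suspend with raw clock host1_raw, resume with raw clock host2_raw,
 * and return the guest-visible clock right after resume.
 */
static uint64_t model_roundtrip(uint64_t host1_raw, uint64_t host2_raw)
{
	uint64_t before, after;

	raw_ns = host1_raw;
	sched_offset = 0;
	before = model_sched_clock();

	model_save();
	raw_ns = host2_raw; /* raw clock jumps, possibly backwards */
	model_restore();
	after = model_sched_clock();

	assert(after >= before); /* never goes backwards across resume */
	return after;
}
```

In this model the unsigned wraparound in model_restore() is
intentional: even if the new host's raw clock is smaller than the
saved value, the subtraction in model_sched_clock() cancels it out and
the guest-visible clock picks up where it left off.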
I am not sure whether something needs to be fixed on the hibernate path
itself or whether this is tied to time handling in xen guest
hibernation.
Here is a part of the log from the last hibernation exit to the next
hibernation entry. The loop had been running for a while, so the full
boot-to-lockup log would be huge. I am specifically including the
timestamps.
...
01h 57m 15.627s( 16ms): [ 5.822701] OOM killer enabled.
01h 57m 15.627s( 0ms): [ 5.824981] Restarting tasks ... done.
01h 57m 15.627s( 0ms): [ 5.836397] PM: hibernation exit
01h 57m 17.636s(2009ms): [ 7.844471] PM: hibernation entry
01h 57m 52.725s(35089ms): [ 42.934542] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 37s!
01h 57m 52.730s( 5ms): [ 42.941468] Showing busy workqueues and worker pools:
01h 57m 52.734s( 4ms): [ 42.945088] workqueue events: flags=0x0
01h 57m 52.737s( 3ms): [ 42.948385] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
01h 57m 52.742s( 5ms): [ 42.952838] pending: vmstat_shepherd, check_corruption
01h 57m 52.746s( 4ms): [ 42.956927] workqueue events_power_efficient: flags=0x80
01h 57m 52.749s( 3ms): [ 42.960731] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
01h 57m 52.754s( 5ms): [ 42.964835] pending: neigh_periodic_work, do_cache_clean [sunrpc], neigh_periodic_work, check_lifetime
01h 57m 52.781s( 27ms): [ 42.971419] workqueue mm_percpu_wq: flags=0x8
01h 57m 52.781s( 0ms): [ 42.974628] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.781s( 0ms): [ 42.978901] pending: vmstat_update
01h 57m 52.781s( 0ms): [ 42.981822] workqueue ipv6_addrconf: flags=0x40008
01h 57m 52.781s( 0ms): [ 42.985524] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
01h 57m 52.781s( 0ms): [ 42.989670] pending: addrconf_verify_work [ipv6]
01h 57m 52.782s( 1ms): [ 42.993282] workqueue xfs-conv/xvda1: flags=0xc
01h 57m 52.786s( 4ms): [ 42.996708] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
01h 57m 52.790s( 4ms): [ 43.000954] pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
01h 57m 52.795s( 5ms): [ 43.005610] workqueue xfs-reclaim/xvda1: flags=0xc
01h 57m 52.798s( 3ms): [ 43.008945] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.802s( 4ms): [ 43.012675] pending: xfs_reclaim_worker [xfs]
01h 57m 52.805s( 3ms): [ 43.015741] workqueue xfs-sync/xvda1: flags=0x4
01h 57m 52.808s( 3ms): [ 43.018723] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.811s( 3ms): [ 43.022436] pending: xfs_log_worker [xfs]
01h 57m 52.814s( 3ms): [ 43.043519] Filesystems sync: 35.234 seconds
01h 57m 52.837s( 23ms): [ 43.048133] Freezing user space processes ... (elapsed 0.001 seconds) done.
01h 57m 52.844s( 7ms): [ 43.055996] OOM killer disabled.
01h 57m 53.838s( 994ms): [ 43.061512] PM: Preallocating image memory... done (allocated 385859 pages)
01h 57m 53.843s( 5ms): [ 44.054720] PM: Allocated 1543436 kbytes in 1.06 seconds (1456.07 MB/s)
01h 57m 53.861s( 18ms): [ 44.060885] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
01h 57m 53.861s( 0ms): [ 44.069715] printk: Suspending console(s) (use no_console_suspend to debug)
01h 57m 56.278s(2417ms): [ 44.116601] Disabling non-boot CPUs ...
.....
The hibernate-resume loop continues after this. As mentioned above, I
lose connectivity for a while.
Thanks,
Anchal