linux-kernel - Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Message-ID: <alpine.DEB.2.20.1610282049500.5053@nanos>
Date:   Fri, 28 Oct 2016 20:58:41 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ville Syrjälä <ville.syrjala@...ux.intel.com>
cc:     Feng Tang <feng.79.tang@...il.com>, feng.tang@...el.com,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-arch@...r.kernel.org, Rik van Riel <riel@...hat.com>,
        "Srivatsa S. Bhat" <srivatsa@....edu>,
        Peter Zijlstra <peterz@...radead.org>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Rusty Russell <rusty@...tcorp.com.au>,
        Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Paul Turner <pjt@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Zhang, Rui" <rui.zhang@...el.com>,
        Len Brown <len.brown@...el.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux ACPI <linux-acpi@...r.kernel.org>
Subject: Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online
 calls to hotplugged cpu")]

On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > sufficient.
> > > 
> > > So far it looks like the answer is yes.
> > > 
> > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > I suppose that's not all that surprising ;)
> > 
> > Well, set it to 1msec then. If that works reliably then we really can do
> > that unconditionally. There is no harm in firing a useless timer during
> > resume once.
> 
> I narrowed down the required timeout, and looks like 25ms is the
> minimum that works. With 24ms I already started to have failures. So
> maybe just bump it up by an order of magnitude to 250ms for some
> safety margin?

Sure, but what puzzles me is that we need a timeout that big. What happens
between broadcast_resume() and broadcast_resume() + 25ms?

IOW, what is the event/resume function which we need to bridge? We should
really try to track that down.

You might try to enable function tracing and do a tracing_off() when that
25ms timeout fires.

Something like 

	stop_trace = true;

in broadcast_resume() and then in the broadcast timer function:

	if (stop_trace) {
		stop_trace = false;
		tracing_off();
	}

Then when the machine is up read the trace, compress and upload it
somewhere or send it in private mail if it's not that big.
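
To make the one-shot behaviour of that flag concrete, here is a minimal
userspace sketch of the same pattern. It is only an illustration: the hook
names resume_path_hook() and broadcast_timer_hook() and the counter
tracing_off_calls are made up for this example, and tracing_off() is stubbed
out so the logic can be compiled outside the kernel. In the kernel the two
snippets above would go directly into the resume path and the broadcast timer
function, and the real tracing_off() would be called.

```c
#include <stdbool.h>

/* Armed in the resume path, consumed by the first timer expiry after it. */
static bool stop_trace;

/* Hypothetical counter, only here so the stub below is observable. */
static int tracing_off_calls;

/* Stub standing in for the kernel's tracing_off(); it only counts calls. */
static void tracing_off(void)
{
	tracing_off_calls++;
}

/* Stands in for the spot in the resume path where the flag is set. */
static void resume_path_hook(void)
{
	stop_trace = true;
}

/* Stands in for the broadcast timer function. */
static void broadcast_timer_hook(void)
{
	if (stop_trace) {
		/* Clear first, so later expiries leave the trace alone. */
		stop_trace = false;
		tracing_off();
	}
}
```

Clearing the flag before calling tracing_off() means only the first timer
expiry after resume stops the trace, so the buffer ends at the point of
interest even if tracing is re-enabled later.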

Thanks,

	tglx