linux-kernel - Re: [RFC PATCH v1] cpu/suspend: Do a partial hotplug during suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGETcx_wv_sC+FhChr8OaV6wjkHxTf9W66AoBQihV=m70G=_iQ@mail.gmail.com>
Date: Wed, 20 Nov 2024 13:02:11 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1] cpu/suspend: Do a partial hotplug during suspend

On Wed, Nov 20, 2024 at 12:42 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Nov 19, 2024 at 06:28:00PM -0800, Saravana Kannan wrote:
> > On Tue, Nov 19, 2024 at 1:28 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> > > Well, if we push this one step further, why do we need hotplug at all?
> > > Can't we just keep them up and idle?
> > >
> > > That is, if we look at suspend_enter(), you'll note that
> > > PM_SUSPEND_TO_IDLE happens before the whole disable_secondary_cpus()
> > > thing.
> > >
> > > So million-dollar question, can this pixel thing do suspend to idle?
> >
> > Unfortunately not. You saw my rant about firmware and s2idle bugs at
> > LPC. But yes, I'm going my part towards pushing for s2idle over s2ram.
>
> Right, so with Google doing their own chips, I think you stand a fair
> chance of making it work 'soon', right? :-)

I can neither confirm or deny :)

>
> > And even if this Pixel could do it, there are a lot of devices in use
> > today that will never get a firmware update to enable s2idle. So, why
> > have all of them waste time and energy doing useless steps during
> > suspend?
>
> Right, so if we really want to go do this, we should add place-holder
> state for suspend, something like CPUHP_SUSPEND and document the
> requirements and audit all existing states now skipped to meet
> requirements.

Yup! This is exactly what I had in mind. But didn't want to go full
out on the patch before I got some sanity check here.

>
> I think it should go somewhere right between CPUHP_BP_PREPARE_DYN_END
> and CPUHP_BRINGUP_CPU.

I was thinking before CPUHP_BP_PREPARE_DYN because I saw some drivers
doing whatever the heck they do in CPUHP_BP_PREPARE_DYN. It'll be much
easier to do audits of non-dynamic stuff and keep it within
requirements.

> WORKQUEUE_PREP seems awefully random, and the
> typical purpose of the _PREPARE stages is to allocate memory/resources
> such that STARTING can do its thing, similarly _DEAD is about freeing
> resources that got unused during _DYING.

Yeah, I understood all this. I wanted to pick CPUHP_TMIGR_PREPARE
(mentioned in my first email) because it was right before
CPUHP_BP_PREPARE_DYN (and if you skip over CPUHP_MIPS_SOC_PREPARE
which sounds like a hardware step). But hrtimers seem to have a bug --
if the sequence fails anywhere in between CPUHP_AP_HRTIMERS_DYING and
CPUHP_HRTIMERS_PREPARE things fail badly.

So, for now I'd say we get in something like CPUHP_SUSPEND wherever it
works right now (after WORKQUEUE_PREP) and slowly move it up till we
get it right before CPUHP_BP_PREPARE_DYN.

> So the most logical setup would be to skip the entire _DEAD/_PREPARE
> cycle.

Makes sense to me.

On a separate note, I'm kinda confused by state machine stages where
only one of the startup/teardown callbacks are set up. For example,
I'd think the workqueue_prepare_cpu() would be combined with
workqueue_online_cpu()/workqueue_offline_cpu(). Why is online() not
sufficient to undo whatever offline() did?

>
> > > Traditionally hybernate is the whole save-to-disk and power machine off
> > > thing, and then there was suspend (to RAM) which was some dodgy as heck
> > > BIOS thing (on x86) which required all non-boot CPUs to be 'dead'.
> >
> > My change would also help with the time it takes to power off the CPUs
> > during hibernate :) If it'll work (otherwise, we can make sure this
> > applies only to suspend).
>
> So I'm not sure you can actually skip this during hibernate. The thing
> is, we load the image from the boot CPU in a state where the secondary
> CPUs have never yet been loaded up. It might be possible to skip the
> DEAD/PREPARE cycle, but it would also mean the image must contain the
> full PREPARE resources. So if it all works, then the result is a larger
> image, for a slightly faster cycle, but since hibernate is already super
> slow, I don't think this trade-off is worth it.

Ok, makes sense. We'll have to make some changes so that this doesn't
run for hibernate (it will as this patch is written).

-Saravana