Message-ID: <CAGETcx_vABsh8HgMi1rYRWmB5RhYwqGT6kKJ+9LX0HrcP8i7yA@mail.gmail.com>
Date: Tue, 19 Nov 2024 18:28:00 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1] cpu/suspend: Do a partial hotplug during suspend

On Tue, Nov 19, 2024 at 1:28 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Nov 18, 2024 at 06:05:15PM -0800, Saravana Kannan wrote:
> > The hotplug state machine goes through 100+ states when transitioning
> > from online to offline. And on the way back, it goes through these
> > states in reverse.
> >
> > When a CPU goes offline, some of the states that occur after a CPU is
> > powered off are about freeing up various per-CPU resources like
> > kmalloc caches, pages, network buffers, etc. All of these states make
> > sense when a CPU is permanently hotplugged off.
> >
> > However, when offlining a CPU during suspend, we just want to power
> > down the CPUs so that the system can enter suspend. In this scenario,
> > we could simply stop the hotplug state machine right after the CPU has
> > been powered off. During resume, we can simply resume the CPU to an
> > online state from the state where we paused the offline.
> >
> > This saves time during both suspend and resume, and the savings are
> > proportional to the number of CPUs in the system. So, on systems with
> > a large number of CPUs, we can expect a large amount of time to be
> > saved.
> >
> > On a Pixel 6, averaging across 100+ suspend/resume cycles, the total
> > time to power off 7 of the 8 CPUs goes from 51 ms down to 24 ms.
> > Similarly, the average time to power off each individual CPU (they are
> > different) also goes down by 50%.
> >
> > The average time spent powering up CPUs goes down from 34 ms to 32 ms.
> > Keep in mind that the time saved during resume is not easily
> > quantified by looking at CPU onlining times. This is because the
> > actual time savings come later: per-CPU resources do not need to be
> > reallocated, which speeds up actions such as allocations that can
> > pick up memory from per-CPU kmalloc caches.
> >
> > Signed-off-by: Saravana Kannan <saravanak@...gle.com>
> > ---
> >
> > Hi Thomas/Peter,
> >
> > The hotplug state machine rewrite is great! Enables all kinds of
> > optimizations for suspend/resume.
> >
> > About this patch, I'm not sure if the exact state the hotplug state
> > machine is paused at (CPUHP_WORKQUEUE_PREP) will work for all
> > arch/boards, but this is the general idea.
> >
> > If it works as is, great! At a glance, it looks like it should work
> > though. None of the other stages between this and CPUHP_OFFLINE seem
> > to be touching hardware.
> >
> > If CPUHP_WORKQUEUE_PREP doesn't work, then we can make it a config
> > option to select the state or an arch call or something along those
> > lines.
> >
> > What are your thoughts on this? How would you like me to proceed?
>
> Well, if we push this one step further, why do we need hotplug at all?
> Can't we just keep them up and idle?
>
> That is, if we look at suspend_enter(), you'll note that
> PM_SUSPEND_TO_IDLE happens before the whole disable_secondary_cpus()
> thing.
>
> So million-dollar question, can this pixel thing do suspend to idle?
Unfortunately not. You saw my rant about firmware and s2idle bugs at
LPC. But yes, I'm doing my part towards pushing for s2idle over s2ram.
And even if this Pixel could do it, there are a lot of devices in use
today that will never get a firmware update to enable s2idle. So, why
have all of them waste time and energy doing useless steps during
suspend?
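
To make the "useless steps" point concrete, here is a toy userspace model
of the idea. It is only a sketch, not kernel code: the state numbers, the
toy_* names and the position of the pause point are all made up for
illustration and do not correspond to real enum cpuhp_state values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state numbers, chosen only for illustration. */
#define TOY_OFFLINE      0   /* all per-CPU resources released */
#define TOY_PAUSE_POINT 60   /* CPU already powered off, resources kept */
#define TOY_ONLINE     100   /* CPU fully online */

struct toy_cpu {
	int state;
};

/* Walk the state machine down one state at a time; return steps taken. */
static int toy_cpu_down(struct toy_cpu *cpu, int target)
{
	int steps = 0;

	assert(target >= TOY_OFFLINE && target <= cpu->state);
	while (cpu->state > target) {
		cpu->state--;	/* a teardown callback would run here */
		steps++;
	}
	return steps;
}

/* Walk the state machine back up; return steps taken. */
static int toy_cpu_up(struct toy_cpu *cpu, int target)
{
	int steps = 0;

	assert(target <= TOY_ONLINE && target >= cpu->state);
	while (cpu->state < target) {
		cpu->state++;	/* a startup callback would run here */
		steps++;
	}
	return steps;
}

/*
 * One suspend/resume cycle for a single CPU. With partial == true the
 * offline walk stops at the pause point instead of going all the way
 * down to TOY_OFFLINE, and resume starts from that same point.
 */
int toy_suspend_resume_cycle(bool partial)
{
	struct toy_cpu cpu = { .state = TOY_ONLINE };
	int target = partial ? TOY_PAUSE_POINT : TOY_OFFLINE;
	int steps;

	steps = toy_cpu_down(&cpu, target);
	steps += toy_cpu_up(&cpu, TOY_ONLINE);
	return steps;
}
```

With these made-up numbers, a full cycle walks every state in both
directions, while the partial cycle only walks the states above the
pause point. Whether the steps skipped are cheap or expensive on real
hardware is exactly what the measurements in the patch description are
about; the model says nothing about that.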
> Traditionally hibernate is the whole save-to-disk and power machine off
> thing, and then there was suspend (to RAM) which was some dodgy as heck
> BIOS thing (on x86) which required all non-boot CPUs to be 'dead'.
My change would also help with the time it takes to power off the CPUs
during hibernate :) If it'll work (otherwise, we can make sure this
applies only to suspend).
> But does your (aaargh64) platform actually require us to take out the
> non-boot CPUs, or is this just histerical raisins?
Lol, I had to google histerical raisins to understand what it meant. I
might start using this :)

I'm pretty sure we need to call into the firmware to power off the CPU
so it can do all the housekeeping before powering down the caches.

Thanks,
Saravana