Message-ID: <87bjxiawjt.ffs@tglx>
Date: Wed, 11 Dec 2024 19:45:10 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Saravana Kannan <saravanak@...gle.com>, Peter Zijlstra
 <peterz@...radead.org>
Cc: kernel-team@...roid.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1] cpu/suspend: Do a partial hotplug during suspend

On Wed, Nov 20 2024 at 13:02, Saravana Kannan wrote:
> On Wed, Nov 20, 2024 at 12:42 AM Peter Zijlstra <peterz@...radead.org> wrote:
> I was thinking before CPUHP_BP_PREPARE_DYN because I saw some drivers
> doing whatever the heck they do in CPUHP_BP_PREPARE_DYN. It'll be much
> easier to do audits of non-dynamic stuff and keep it within
> requirements.
>
>> WORKQUEUE_PREP seems awfully random, and the
>> typical purpose of the _PREPARE stages is to allocate memory/resources
>> such that STARTING can do its thing, similarly _DEAD is about freeing
>> resources that got unused during _DYING.
>
> Yeah, I understood all this. I wanted to pick CPUHP_TMIGR_PREPARE
> (mentioned in my first email) because it was right before
> CPUHP_BP_PREPARE_DYN (and if you skip over CPUHP_MIPS_SOC_PREPARE
> which sounds like a hardware step). But hrtimers seem to have a bug --
> if the sequence fails anywhere in between CPUHP_AP_HRTIMERS_DYING and
> CPUHP_HRTIMERS_PREPARE things fail badly.

Yes, that's known and someone is working on it. Here is the thread:

  https://lore.kernel.org/all/87wmg9oyzk.ffs@tglx

> So, for now I'd say we get in something like CPUHP_SUSPEND wherever it
> works right now (after WORKQUEUE_PREP) and slowly move it up till we
> get it right before CPUHP_BP_PREPARE_DYN.
>
>> So the most logical setup would be to skip the entire _DEAD/_PREPARE
>> cycle.
>
> Makes sense to me.
>
> On a separate note, I'm kinda confused by state machine stages where
> only one of the startup/teardown callbacks is set up. For example,
> I'd think the workqueue_prepare_cpu() would be combined with
> workqueue_online_cpu()/workqueue_offline_cpu(). Why is online() not
> sufficient to undo whatever offline() did?

Some of this is purely historical and was more or less blindly converted
from the original notifier chains.

Other things are one-off initializations to allocate and initialize
memory, which is never freed again under the assumption that the CPUs
will come back online.

But yes, this needs to be looked at on a state by state basis.

Thanks,

        tglx
