Message-ID: <7848ad2f-75fc-2416-8d9e-b0cc7c520107@infradead.org>
Date:   Sun, 5 Sep 2021 19:11:19 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Dave Chinner <david@...morbit.com>
Cc:     "Darrick J. Wong" <djwong@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Eric Sandeen <sandeen@...deen.net>,
        Christoph Hellwig <hch@....de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [GIT PULL] xfs: new code for 5.15

On 9/5/21 4:28 PM, Thomas Gleixner wrote:
> Dave,
> 

[snip]

Hi,

Doc. comments below...


> I'm sorry that this change which turned CPU hotplug into a reliable,
> testable and instrumentable mechanism causes so much trouble for you. I
> hope it's just the lack of coherent documentation which made you
> unhappy.
> 
> If the updated documentation does not answer your questions, please let
> me know and please provide a coherent explanation of the problem you are
> trying to solve. Either I can give you a hint or I can identify further
> issues in the documentation.
> 
> If it turns out that there are functional shortcomings then I'm of
> course all ears as well.
> 
> If you need a convenience API to install multiple states at once to
> regain the "simple API" feeling, please let me know - I surely have some
> ideas.
> 
> Thanks,
> 
>          tglx
> ---
> --- a/Documentation/core-api/cpu_hotplug.rst
> +++ b/Documentation/core-api/cpu_hotplug.rst
> @@ -156,95 +156,479 @@ hotplug states will be invoked, starting
>   * Once all services are migrated, kernel calls an arch specific routine
>     ``__cpu_disable()`` to perform arch specific cleanup.
>   

[snip]

> +
> +The CPU hotplug API
> +===================
> +
> +CPU hotplug state machine
> +-------------------------
> +
> +CPU hotplug uses a trivial state machine with a linear state space from
> +CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
> +callback.
> +
> +When a CPU is onlined, the startup callbacks are invoked sequentially until
> +the state CPUHP_ONLINE is reached. They can also be invoked when the
> +callbacks of a state are set up or an instance is added to a multi-instance
> +state.
> +
> +When a CPU is offlined the teardown callbacks are invoked in the reverse
> +order sequenctially until the state CPUHP_OFFLINE is reached. They can also

          sequentially

> +be invoked when the callbacks of a state are removed or an instance is
> +removed from a multi-instance state.
> +
> +If a usage site requires only a callback in one direction of the hotplug
> +operations (CPU online or CPU offline) then the other not required callback

                                                          not-required

> +can be set to NULL when the state is set up.
> +
> +The state space is divided into three sections:
> +
> +* The PREPARE section
> +
> +  The PREPARE section covers the state space from CPUHP_OFFLINE to CPUHP_BRINGUP_CPU

                                                                       CPUHP_BRINGUP_CPU.

> +
> +  The startup callbacks in this section are invoked before the CPU is
> +  started during a CPU online operation. The teardown callbacks are invoked
> +  after the CPU has become dysfunctional during a CPU offline operation.
> +
> +  The callbacks are invoked on a control CPU as they obviously can't run on
> +  the hotplugged CPU which is either not yet started or has become
> +  dysfunctional already.
> +
> +  The startup callbacks are used to set up resources which are required to
> +  bring a CPU successfully online. The teardown callbacks are used to free
> +  resources or to move pending work to an online CPU after the hotplugged
> +  CPU became dysfunctional.
> +
> +  The startup callbacks are allowed to fail. If a callback fails, the CPU
> +  online operation is aborted and the CPU is brought down to the previous
> +  state (usually CPUHP_OFFLINE) again.
> +
> +  The teardown callbacks in this section are not allowed to fail.
> +
> +* The STARTING section
> +
> +  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
> +  and CPUHP_AP_ONLINE

      and CPUHP_AP_ONLINE.

> +
> +  The startup callbacks in this section are invoked on the hotplugged CPU
> +  with interrupts disabled during a CPU online operation in the early CPU
> +  setup code. The teardown callbacks are invoked with interrupts disabled
> +  on the hotplugged CPU during a CPU offline operation shortly before the
> +  CPU is completely shut down.
> +
> +  The callbacks in this section are not allowed to fail.
> +
> +  The callbacks are used for low level hardware initialization/shutdown and
> +  for core subsystems.
> +
> +* The ONLINE section
> +
> +  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
> +  CPUHP_ONLINE.
> +
> +  The startup callbacks in this section are invoked on the hotplugged CPU
> +  during a CPU online operation. The teardown callbacks are invoked on the
> +  hotplugged CPU during a CPU offline operation.
> +
> +  The callbacks are invoked in the context of the per CPU hotplug thread,
> +  which is pinned on the hotplugged CPU. The callbacks are invoked with
> +  interrupts and preemption enabled.
> +
> +  The callbacks are allowed to fail. When a callback fails the hotplug
> +  operation is aborted and the CPU is brought back to the previous state.
> +
> +CPU online/offline operations
> +-----------------------------
> +
> +A successful online operation looks like this: ::
> +
> +  [CPUHP_OFFLINE]
> +  [CPUHP_OFFLINE + 1]->startup()       -> success
> +  [CPUHP_OFFLINE + 2]->startup()       -> success
> +  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
> +  ...
> +  [CPUHP_BRINGUP_CPU]->startup()       -> success
> +  === End of PREPARE section
> +  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
> +  ...
> +  [CPUHP_AP_ONLINE]->startup()         -> success
> +  === End of STARTING section
> +  [CPUHP_AP_ONLINE + 1]->startup()     -> success
> +  ...
> +  [CPUHP_ONLINE - 1]->startup()        -> success
> +  [CPUHP_ONLINE]
> +
> +A successful offline operation looks like this: ::
> +
> +  [CPUHP_ONLINE]
> +  [CPUHP_ONLINE - 1]->teardown()       -> success
> +  ...
> +  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
> +  === Start of STARTING section
> +  [CPUHP_AP_ONLINE]->teardown()        -> success
> +  ...
> +  [CPUHP_BRINGUP_CPU + 1]->teardown()
> +  ...
> +  === Start of PREPARE section
> +  [CPUHP_BRINGUP_CPU]->teardown()
> +  [CPUHP_OFFLINE + 3]->teardown()
> +  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
> +  [CPUHP_OFFLINE + 1]->teardown()
> +  [CPUHP_OFFLINE]
> +
> +A failed online operation looks like this: ::
> +
> +  [CPUHP_OFFLINE]
> +  [CPUHP_OFFLINE + 1]->startup()       -> success
> +  [CPUHP_OFFLINE + 2]->startup()       -> success
> +  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
> +  ...
> +  [CPUHP_BRINGUP_CPU]->startup()       -> success
> +  === End of PREPARE section
> +  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
> +  ...
> +  [CPUHP_AP_ONLINE]->startup()         -> success
> +  === End of STARTING section
> +  [CPUHP_AP_ONLINE + 1]->startup()     -> success
> +  ---
> +  [CPUHP_AP_ONLINE + N]->startup()     -> fail
> +  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
> +  ...
> +  [CPUHP_AP_ONLINE + 1]->teardown()
> +  === Start of STARTING section
> +  [CPUHP_AP_ONLINE]->teardown()
> +  ...
> +  [CPUHP_BRINGUP_CPU + 1]->teardown()
> +  ...
> +  === Start of PREPARE section
> +  [CPUHP_BRINGUP_CPU]->teardown()
> +  [CPUHP_OFFLINE + 3]->teardown()
> +  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
> +  [CPUHP_OFFLINE + 1]->teardown()
> +  [CPUHP_OFFLINE]
> +
> +A failed offline operation looks like this: ::
> +
> +  [CPUHP_ONLINE]
> +  [CPUHP_ONLINE - 1]->teardown()       -> success
> +  ...
> +  [CPUHP_ONLINE - N]->teardown()       -> fail
> +  [CPUHP_ONLINE - (N - 1)]->startup()
> +  ...
> +  [CPUHP_ONLINE - 1]->startup()
> +  [CPUHP_ONLINE]
> +
> +Recursive failures cannot be handled sensibly. Look at the following
> +example of a recursive fail due to a failed offline operation: ::
> +
> +  [CPUHP_ONLINE]
> +  [CPUHP_ONLINE - 1]->teardown()       -> success
> +  ...
> +  [CPUHP_ONLINE - N]->teardown()       -> fail
> +  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
> +  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
> +
> +The CPU hotplug state machine stops right here and does not try to go back
> +down again because that would likely result in an endless loop: ::
> +
> +  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
> +  [CPUHP_ONLINE - N]->teardown()       -> fail
> +  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
> +  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
> +  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
> +  [CPUHP_ONLINE - N]->teardown()       -> fail
> +
> +Lather, rinse and repeat. In this case the CPU left in state: ::

                                               CPU is left

> +
> +  [CPUHP_ONLINE - (N - 1)]
> +
> +which at least lets the system make progress and gives the user a chance to
> +debug or even resolve the situation.
> +
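
In case it helps readers, here is a rough userspace model of the walk and
rollback rules described above. It is only a sketch: MODEL_OFFLINE,
MODEL_ONLINE, struct model_step and the model_*() functions are made up for
illustration and are not kernel APIs. It assumes that a state value means
"the startup callbacks of all states up to and including this one have run".

```c
#include <assert.h>

/* Userspace model only: none of these names exist in the kernel. */
enum { MODEL_OFFLINE = 0, MODEL_ONLINE = 5 };

typedef int (*model_cb)(void);

struct model_step {
	model_cb startup;	/* NULL: skipped on the way up */
	model_cb teardown;	/* NULL: skipped on the way down */
};

static int model_ok(void)   { return 0; }
static int model_fail(void) { return -1; }

/* A NULL callback is skipped and counts as success. */
static int model_invoke(model_cb cb)
{
	return cb ? cb() : 0;
}

/* Run startup callbacks until *cur == target or one of them fails. */
static int model_up(const struct model_step *steps, int *cur, int target)
{
	while (*cur < target) {
		int ret = model_invoke(steps[*cur + 1].startup);

		if (ret)
			return ret;
		(*cur)++;
	}
	return 0;
}

/* Run teardown callbacks until *cur == target or one of them fails. */
static int model_down(const struct model_step *steps, int *cur, int target)
{
	while (*cur > target) {
		int ret = model_invoke(steps[*cur].teardown);

		if (ret)
			return ret;
		(*cur)--;
	}
	return 0;
}

/*
 * Move from *cur to target. On failure, try once to return to the starting
 * state; if that rollback fails as well, stay wherever it stopped.
 */
int model_walk(const struct model_step *steps, int *cur, int target)
{
	int start = *cur;
	int ret;

	assert(target >= MODEL_OFFLINE && target <= MODEL_ONLINE);
	if (target > start) {
		ret = model_up(steps, cur, target);
		if (ret)
			model_down(steps, cur, start);
	} else {
		ret = model_down(steps, cur, target);
		if (ret)
			model_up(steps, cur, start);
	}
	return ret;
}

/* Failed online: startup of state 3 fails, states 2 and 1 are torn down. */
int model_demo_failed_online(void)
{
	const struct model_step steps[MODEL_ONLINE + 1] = {
		[1] = { model_ok, model_ok },
		[2] = { model_ok, model_ok },
		[3] = { model_fail, model_ok },
		[4] = { model_ok, model_ok },
	};
	int cur = MODEL_OFFLINE;

	model_walk(steps, &cur, MODEL_ONLINE);
	return cur;
}

/* Recursive failure: teardown of state 2 fails, then startup of state 4. */
int model_demo_recursive_failure(void)
{
	const struct model_step steps[MODEL_ONLINE + 1] = {
		[1] = { model_ok, model_ok },
		[2] = { model_ok, model_fail },
		[3] = { model_ok, model_ok },
		[4] = { model_fail, model_ok },
	};
	int cur = MODEL_ONLINE;

	model_walk(steps, &cur, MODEL_OFFLINE);
	return cur;
}
```

With these tables the failed online case ends in MODEL_OFFLINE, and the
recursive failure case ends in state 3, which is MODEL_ONLINE - (N - 1)
for N = 3, matching the description above.
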
> +Allocating a state
> +------------------
> +
> +There are two ways to allocate a CPU hotplug state:
> +
> +* Static allocation
> +
> +  Static allocation has to be used when the subsystem or driver has
> +  ordering requirements versus other CPU hotplug states. E.g. the PERF core
> +  startup callback has to be invoked before the PERF driver startup
> +  callbacks during a CPU online operation. During a CPU offline operation
> +  the driver teardown callbacks have to be invoked before the core teardown
> +  callback. The statically allocated states are described by constants in
> +  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.
> +
> +  Insert the state into the enum at the proper place so the ordering
> +  requirements are fulfilled. The state constant has to be used for state
> +  setup and removal.
> +
> +  Static allocation is also required when the state callbacks are not set
> +  up at runtime and are part of the initializer of the CPU hotplug state
> +  array in kernel/cpu.c.
> +
> +* Dynamic allocation
> +
> +  When there are no ordering requirements for the state callbacks then
> +  dynamic allocation is the preferred method. The state number is allocated
> +  by the setup function and returned to the caller on success.
> +
> +  Only the PREPARE and ONLINE sections provide a dynamic allocation
> +  range. The STARTING section does not as most of the callbacks in that
> +  section have explicit ordering requirements.
> +
> +Setup of a CPU hotplug state
> +----------------------------
> +
> +The core code provides the following functions to set up a state:
> +
> +* cpuhp_setup_state(state, name, startup, teardown)
> +* cpuhp_setup_state_nocalls(state, name, startup, teardown)
> +* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
> +* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)
> +
> +For cases where a driver or a subsystem has multiple instances and the same
> +CPU hotplug state callbacks need to be invoked for each instance, the CPU
> +hotplug core provides multi-instance support. The advantage over driver
> +specific instance lists is that the instance related functions are fully
> +serialized against CPU hotplug operations and provide the automatic
> +invocations of the state callbacks on add and removal. To set up such a
> +multi-instance state the following function is available:
> +
> +* cpuhp_setup_state_multi(state, name, startup, teardown)
> +
> +The @state argument is either a statically allocated state or one of the
> +constants for dynamically allocated states - CPUHP_PREPARE_DYN,
> +CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
> +which a dynamic state should be allocated.
> +
> +The @name argument is used for sysfs output and for instrumentation. The
> +naming convention is "subsys:mode" or "subsys/driver:mode",
> +e.g. "perf:mode" or "perf/x86:mode". The common mode names:

                                                         names are:

> +
> +======== =======================================================
> +prepare  For states in the PREPAREsection

                               PREPARE section

> +
> +dead     For states in the PREPARE section which do not provide
> +         a startup callback
> +
> +starting For states in the STARTING section
> +
> +dying    For states in the STARTING section which do not provide
> +         a startup callback
> +
> +online   For states in the ONLINE section
> +
> +offline  For states in the ONLINE section which do not provide
> +         a startup callback
> +======== =======================================================
> +
> +As the @name argument is only used for sysfs and instrumentation other mode
> +descriptors can be used as well if they describe the nature of the state
> +better than the common ones.
> +
> +Examples for @name arguments: "perf/online", "perf/x86:prepare",
> +"RCU/tree:dying", "sched/waitempty"
> +
> +The @startup argument is a function pointer to the callback which should be
> +invoked during a CPU online operation. If the usage site does not require a
> +startup callback set the pointer to NULL.
> +
> +The @teardown argument is a function pointer to the callback which should
> +be invoked during a CPU offline operation. If the usage site does not
> +require a teardown callback set the pointer to NULL.
> +
> +The functions differ in how the installed callbacks are treated:
> +
> +  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
> +    and cpuhp_setup_state_multi() only install the callbacks
> +
> +  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
> +    callbacks and invoke the @startup callback (if not NULL) for all online
> +    CPUs which currently have a state greater than the newly installed
> +    state. Depending on the state section the callback is either invoked on
> +    the current CPU (PREPARE section) or on each online CPU (ONLINE
> +    section) in the context of the CPU's hotplug thread.
> +
> +    If a callback fails for CPU N then the teardown callback for CPU
> +    0 .. N-1 is invoked to roll back the operation. The state setup fails,

CPU 0? Does one of these fail since it's not an AP?

> +    the callbacks for the state are not installed and in case of dynamic
> +    allocation the allocated state is freed.
> +
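
A small userspace sketch of the install-and-roll-back behaviour described in
the last bullet, in case an illustration is useful. Everything here is
hypothetical (MODEL_NR_CPUS, model_setup_state() and the other model_*()
names are not kernel APIs), and it assumes that all modelled CPUs are online
and already past the newly installed state.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4	/* assumption: CPUs 0..3 are all online */

typedef int (*model_cpu_cb)(unsigned int cpu);

static unsigned int model_active;	/* bit n set: startup ran on CPU n */
static int model_fail_cpu = -1;		/* CPU whose startup fails, -1: none */

static int model_startup(unsigned int cpu)
{
	if ((int)cpu == model_fail_cpu)
		return -1;
	model_active |= 1u << cpu;
	return 0;
}

static int model_teardown(unsigned int cpu)
{
	model_active &= ~(1u << cpu);
	return 0;
}

/*
 * Invoke startup on every modelled CPU. If it fails on CPU n, invoke
 * teardown on CPUs 0 .. n-1 and report the error, so nothing stays installed.
 */
int model_setup_state(model_cpu_cb startup, model_cpu_cb teardown)
{
	unsigned int cpu, done;
	int ret;

	if (!startup)
		return 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		ret = startup(cpu);
		if (!ret)
			continue;
		for (done = 0; teardown && done < cpu; done++)
			teardown(done);
		return ret;
	}
	return 0;
}

/* Run one setup attempt; fail_cpu < 0 means no startup callback fails. */
int model_demo_setup(int fail_cpu)
{
	model_active = 0;
	model_fail_cpu = fail_cpu;
	return model_setup_state(model_startup, model_teardown);
}

/* Bitmask of CPUs on which the startup callback is still in effect. */
unsigned int model_active_mask(void)
{
	return model_active;
}
```

In this model a failure on CPU 2 leaves no CPU with the callback applied,
while a run without failures leaves it applied on all four.
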
> +The state setup and the callback invocations are serialized against CPU
> +hotplug operations. If the setup function has to be called from a CPU
> +hotplug read locked region, then the _cpuslocked() variants have to be
> +used. These functions cannot be used from within CPU hotplug callbacks.
> +
> +The function return values:
> +  ======== ===================================================================
> +  0        Statically allocated state was successfully set up
> +
> +  >0       Dynamically allocated state was successfully set up.
> +
> +           The returned number is the state number which was allocated. If
> +           the state callbacks have to be removed later, e.g. module
> +           removal, then this number has to be saved by the caller and used
> +           as @state argument for the state remove function. For
> +           multi-instance states the dynamically allocated state number is
> +           also required as @state argument for the instance add/remove
> +           operations.
> +
> +  <0       Operation failed
> +  ======== ===================================================================
> +
> +Removal of a CPU hotplug state
> +------------------------------
> +
> +To remove a previously set up state, the following functions are provided:
> +
> +* cpuhp_remove_state(state)
> +* cpuhp_remove_state_nocalls(state)
> +* cpuhp_remove_state_nocalls_cpuslocked(state)
> +* cpuhp_remove_multi_state(state)
> +
> +The @state argument is either a statically allocated state or the state
> +number which was allocated in the dynamic range by cpuhp_setup_state*(). If
> +the state is in the dynamic range, then the state number is freed and
> +available for dynamic allocation again.
> +
> +The functions differ in how the installed callbacks are treated:
> +
> +  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
> +    and cpuhp_remove_multi_state() only remove the callbacks.
> +
> +  * cpuhp_remove_state() removes the callbacks and invokes the teardown
> +    callback (if not NULL) for all online CPUs which currently have a state
> +    greater than the removed state. Depending on the state section the
> +    callback is either invoked on the current CPU (PREPARE section) or on
> +    each online CPU (ONLINE section) in the context of the CPU's hotplug
> +    thread.
> +
> +    In order to complete the removal, the teardown callback should not fail.
> +
> +The state removal and the callback invocations are serialized against CPU
> +hotplug operations. If the remove function has to be called from a CPU
> +hotplug read locked region, then the _cpuslocked() variants have to be
> +used. These functions cannot be used from within CPU hotplug callbacks.
> +
> +If a multi-instance state is removed then the caller has to remove all
> +instances first.
> +
> +Multi-Instance state instance management
> +----------------------------------------
> +
> +Once the multi-instance state is set up, instances can be added to the
> +state:
> +
> +  * cpuhp_state_add_instance(state, node)
> +  * cpuhp_state_add_instance_nocalls(state, node)
> +
> +The @state argument is either a statically allocated state or the state
> +number which was allocated in the dynamic range by cpuhp_setup_state_multi().
> +
> +The @node argument is a pointer to a hlist_node which is embedded in the

               I would say:         to an hlist_node

> +instance's data structure. The pointer is handed to the multi-instance
> +state callbacks and can be used by the callback to retrieve the instance
> +via container_of().
> +
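
For readers who have not used the embedded-node pattern, a userspace sketch
of the container_of() lookup may help here. The struct hlist_node below is a
stand-in for the kernel type, the container_of() macro is a simplified
version without the kernel's type checking, and struct subsys_inst with its
fields is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct hlist_node (userspace model). */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Simplified container_of(); the kernel version adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-instance data with the embedded list node. */
struct subsys_inst {
	int id;
	int online_count;
	struct hlist_node node;
};

/* Multi-instance callbacks get the CPU number and the instance's node. */
int subsys_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	struct subsys_inst *inst;

	assert(node != NULL);
	inst = container_of(node, struct subsys_inst, node);
	(void)cpu;		/* the CPU number is unused in this sketch */
	inst->online_count++;
	return 0;
}

/* Call the callback for two CPUs and return the instance's counter. */
int subsys_demo(void)
{
	struct subsys_inst inst = { .id = 7 };

	subsys_cpu_online(0, &inst.node);
	subsys_cpu_online(1, &inst.node);
	return inst.online_count;
}
```

The callback never sees the instance pointer directly; it recovers it from
the node address, which is why the node has to be embedded in the instance.
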
> +The functions differ in how the installed callbacks are treated:
> +
> +  * cpuhp_state_add_instance_nocalls() only adds the instance to the
> +    multi-instance state's node list.
> +
> +  * cpuhp_state_add_instance() adds the instance and invokes the startup
> +    callback (if not NULL) associated with @state for all online CPUs which
> +    currently have a state greater than @state. The callback is only
> +    invoked for the instance being added. Depending on the state section
> +    the callback is either invoked on the current CPU (PREPARE section) or
> +    on each online CPU (ONLINE section) in the context of the CPU's hotplug
> +    thread.
> +
> +    If a callback fails for CPU N then the teardown callback for CPU
> +    0 .. N-1 is invoked to roll back the operation, the function fails and

all except the Boot CPU?

> +    the instance is not added to the node list of the multi-instance state.
> +
> +To remove an instance from the state's node list these functions are
> +available:
> +
> +  * cpuhp_state_remove_instance(state, node)
> +  * cpuhp_state_remove_instance_nocalls(state, node)
> +
> +The arguments are the same as for the cpuhp_state_add_instance*()
> +variants above.
> +
> +The functions differ in how the installed callbacks are treated:
> +
> +  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
> +    state's node list.
> +
> +  * cpuhp_state_remove_instance() removes the instance and invokes the
> +    teardown callback (if not NULL) associated with @state for all online
> +    CPUs which currently have a state greater than @state.  The callback is
> +    only invoked for the instance being removed.  Depending on the state
> +    section the callback is either invoked on the current CPU (PREPARE
> +    section) or on each online CPU (ONLINE section) in the context of the
> +    CPU's hotplug thread.
> +
> +    In order to complete the removal, the teardown callback should not fail.
> +
> +The node list add/remove operations and the callback invocations are
> +serialized against CPU hotplug operations. These functions cannot be used
> +from within CPU hotplug callbacks and CPU hotplug read locked regions.
> +
> +Examples
> +--------
> +
> +Setup and teardown a statically allocated state in the STARTING section for
> +notifications on online and offline operations: ::
> +
> +   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
> +   if (ret < 0)
> +        return ret;
> +   ....
> +   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);
> +
> +Setup and teardown a dynamically allocated state in the ONLINE section
> +for notifications on offline operations: ::
> +
> +   state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
> +   if (state < 0)
> +       return state;
> +   ....
> +   cpuhp_remove_state(state);
> +
> +Setup and teardown a dynamically allocated state in the ONLINE section
> +for notifications on online operations without invoking the callbacks: ::
> +
> +   state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpi_online, NULL);

                                                                                  _cpu_

> +   if (state < 0)
> +       return state;
> +   ....
> +   cpuhp_remove_state_nocalls(state);
> +
> +Setup, use and teardown a dynamically allocated multi-instance state in the
> +ONLINE section for notifications on online and offline operation: ::
> +
> +   state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
> +   if (state < 0)
> +       return state;
> +   ....
> +   ret = cpuhp_state_add_instance(state, &inst1->node);
> +   if (ret)
> +        return ret;
> +   ....
> +   ret = cpuhp_state_add_instance(state, &inst2->node);
> +   if (ret)
> +        return ret;
> +   ....
> +   cpuhp_state_remove_instance(state, &inst1->node);
> +   ....
> +   cpuhp_state_remove_instance(state, &inst2->node);
> +   ....
> +   cpuhp_remove_multi_state(state);
> +
>   
>   Testing of hotplug states
>   =========================


-- 
~Randy
