lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 10 Jan 2024 14:46:50 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>, Chen Yu <yu.c.chen@...el.com>, 
	Len Brown <len.brown@...el.com>, 
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>, 
	Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>, Zhang Rui <rui.zhang@...el.com>, 
	Zhao Liu <zhao1.liu@...el.com>, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] thermal: intel: hfi: Add syscore callbacks

On Wed, Jan 10, 2024 at 4:05 AM Ricardo Neri
<ricardo.neri-calderon@...ux.intel.com> wrote:
>
> The kernel allocates a memory buffer and provides its location to the
> hardware, which uses it to update the HFI table. This allocation occurs
> during boot and remains constant throughout runtime.
>
> When resuming from hibernation, the restore kernel allocates a second
> memory buffer and reprograms the HFI hardware with the new location as part
> of a normal boot. The location of the second memory buffer may differ from
> the one allocated by the image kernel. When the restore kernel transfers
> control to the image kernel, the second buffer becomes invalid, potentially
> leading to memory corruption if the hardware writes to it (the hardware
> continues using the buffer from the restore kernel).
>
> It is also possible that the hardware forgets the address of the memory
> buffer when resuming from "deep" suspend. Memory corruption may also occur
> in such a scenario.
>
> To prevent the described memory corruption, disable HFI when preparing to
> suspend or hibernate. Enable it when resuming.
>
> Add syscore callbacks to handle the package of the boot CPU (packages of
> non-boot CPUs are handled via CPU offline). Syscore ops always run on the
> boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
> and hibernation. Syscore ops only run in these cases.
>
> Cc: Chen Yu <yu.c.chen@...el.com>
> Cc: Len Brown <len.brown@...el.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
> Cc: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
> Cc: Zhang Rui <rui.zhang@...el.com>
> Cc: Zhao Liu <zhao1.liu@...ux.intel.com>
> Cc: linux-pm@...r.kernel.org
> Cc: stable@...r.kernel.org    # 6.1
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
> ---
> This is v3 of the patch [4/4] of [1]. Rafael has already accepted the
> first three patches. Thanks!
>
> I tested this patch on the testing branch of Rafael on Jan 9th. 2024. I
> tested it on a Meteor Lake system. It completed 1000 hibernation-resume
> cycles.
>
> [1]. https://lore.kernel.org/all/20240103041459.11113-1-ricardo.neri-calderon@linux.intel.com/
>
> ---
> Changes since v2:
>  * Switched to syscore ops instead of a suspend notifier as it is a better
>    fit: we only need to handle the boot CPU, disabling HFI is only needed
>    for "deep" suspend and hibernation. (Rafael)
>  * Removed incorrect statement about the suspend flow in the commit
>    message. (Rafael)
>
> Changes since v1:
>  * Moved registration of the suspend notifier towards the end of
>    intel_hfi_init(). (Stan)
>  * Renamed hfi_do_pm_[enable|disable]() to hfi_do_[enable|disable](). Stan
>    will use these functions outside the suspend notifier. (Stan)
>  * Added locking to calls to hfi_[enable|disable]() from the suspend
>    notifier. (Rafael)
> ---
>  drivers/thermal/intel/intel_hfi.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
> index 22445403b520..768cd5f9eb32 100644
> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -35,7 +35,9 @@
>  #include <linux/processor.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> +#include <linux/suspend.h>
>  #include <linux/string.h>
> +#include <linux/syscore_ops.h>
>  #include <linux/topology.h>
>  #include <linux/workqueue.h>
>
> @@ -571,6 +573,30 @@ static __init int hfi_parse_features(void)
>         return 0;
>  }
>
> +static void hfi_do_enable(void)
> +{
> +       /* If we are here, we are on the boot CPU */
> +       struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0);
> +       struct hfi_instance *hfi_instance = info->hfi_instance;
> +
> +       /* No locking needed. There is no concurrency with CPU online. */
> +       hfi_set_hw_table(hfi_instance);
> +       hfi_enable();
> +}
> +
> +static int hfi_do_disable(void)
> +{
> +       /* No locking needed. There is no concurrency with CPU offline. */
> +       hfi_disable();
> +
> +       return 0;
> +}
> +
> +static struct syscore_ops hfi_pm_ops = {
> +       .resume = hfi_do_enable,
> +       .suspend = hfi_do_disable,
> +};
> +
>  void __init intel_hfi_init(void)
>  {
>         struct hfi_instance *hfi_instance;
> @@ -602,6 +628,8 @@ void __init intel_hfi_init(void)
>         if (!hfi_updates_wq)
>                 goto err_nomem;
>
> +       register_syscore_ops(&hfi_pm_ops);
> +
>         return;
>
>  err_nomem:
> --

Applied as 6.8-rc1 material, with edited subject and changelog, and
I've reworded a comment in hfi_do_enable() to speak about the code.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ