lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAFQd5DDmBVC2OcJ6vd-y1wJFq8o3j1ssP9G5hQ1ZeB=mcwETQ@mail.gmail.com>
Date: Thu, 9 Jan 2025 13:45:37 +0900
From: Tomasz Figa <tfiga@...omium.org>
To: Douglas Anderson <dianders@...omium.org>
Cc: "Rafael J . Wysocki" <rafael@...nel.org>, Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH] PM / core: Allow configuring the DPM watchdog to warn
 earlier than panic

Hi Doug,

On Thu, Dec 19, 2024 at 2:56 AM Douglas Anderson <dianders@...omium.org> wrote:
>
> Allow configuring the DPM watchdog to warn about slow suspend/resume
> functions without causing a system panic(). This allows you to set the
> DPM_WATCHDOG_WARNING_TIMEOUT to something like 5 or 10 seconds to get
> warnings about slow suspend/resume functions that eventually succeed.

Thanks for the patch. I agree that it would be very useful to avoid
completely bringing the user's system down, while still having a
warning that lets them notice that something is not right.

That said, please see my comments below.

>
> Signed-off-by: Douglas Anderson <dianders@...omium.org>
> ---
>
>  drivers/base/power/main.c | 22 ++++++++++++++++++----
>  kernel/power/Kconfig      |  8 +++++++-
>  2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..239bcf93f821 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -496,6 +496,7 @@ struct dpm_watchdog {
>         struct device           *dev;
>         struct task_struct      *tsk;
>         struct timer_list       timer;
> +       bool                    fatal;
>  };
>
>  #define DECLARE_DPM_WATCHDOG_ON_STACK(wd) \
> @@ -512,11 +513,23 @@ struct dpm_watchdog {
>  static void dpm_watchdog_handler(struct timer_list *t)
>  {
>         struct dpm_watchdog *wd = from_timer(wd, t, timer);
> +       struct timer_list *timer = &wd->timer;
> +       unsigned int time_left;
> +
> +       if (wd->fatal) {
> +               dev_emerg(wd->dev, "**** DPM device timeout ****\n");
> +               show_stack(wd->tsk, NULL, KERN_EMERG);
> +               panic("%s %s: unrecoverable failure\n",
> +                       dev_driver_string(wd->dev), dev_name(wd->dev));
> +       }
>
> -       dev_emerg(wd->dev, "**** DPM device timeout ****\n");
> +       time_left = CONFIG_DPM_WATCHDOG_TIMEOUT - CONFIG_DPM_WATCHDOG_WARNING_TIMEOUT;
> +       dev_emerg(wd->dev, "**** DPM device timeout after %u seconds; %u seconds until panic ****\n",
> +                 CONFIG_DPM_WATCHDOG_WARNING_TIMEOUT, time_left);

Should this be printed using dev_warn() instead, since it's not fatal?

>         show_stack(wd->tsk, NULL, KERN_EMERG);
> -       panic("%s %s: unrecoverable failure\n",
> -               dev_driver_string(wd->dev), dev_name(wd->dev));
> +
> +       wd->fatal = true;
> +       mod_timer(timer, jiffies + HZ * time_left);
>  }
>
>  /**
> @@ -530,10 +543,11 @@ static void dpm_watchdog_set(struct dpm_watchdog *wd, struct device *dev)
>
>         wd->dev = dev;
>         wd->tsk = current;
> +       wd->fatal = CONFIG_DPM_WATCHDOG_TIMEOUT == CONFIG_DPM_WATCHDOG_WARNING_TIMEOUT;
>
>         timer_setup_on_stack(timer, dpm_watchdog_handler, 0);
>         /* use same timeout value for both suspend and resume */
> -       timer->expires = jiffies + HZ * CONFIG_DPM_WATCHDOG_TIMEOUT;
> +       timer->expires = jiffies + HZ * CONFIG_DPM_WATCHDOG_WARNING_TIMEOUT;
>         add_timer(timer);
>  }
>
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index afce8130d8b9..3387b94e76c1 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -257,11 +257,17 @@ config DPM_WATCHDOG
>           boot session.
>
>  config DPM_WATCHDOG_TIMEOUT
> -       int "Watchdog timeout in seconds"
> +       int "Watchdog timeout to panic in seconds"
>         range 1 120
>         default 120
>         depends on DPM_WATCHDOG
>
> +config DPM_WATCHDOG_WARNING_TIMEOUT
> +       int "Watchdog timeout to warn in seconds"
> +       range 1 DPM_WATCHDOG_TIMEOUT
> +       default DPM_WATCHDOG_TIMEOUT
> +       depends on DPM_WATCHDOG
> +

If I'm reading the code correctly, if we set
DPM_WATCHDOG_WARNING_TIMEOUT to the same value as
DPM_WATCHDOG_TIMEOUT, then we get the current behavior when the
timeout is fatal. Would it make sense to document this in the help
text of the new symbol?

Best regards,
Tomasz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ