Date:   Thu, 20 Jan 2022 19:55:57 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Shaokun Zhang <zhangshaokun@...ilicon.com>
Cc:     Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Guo Yang <guoyang2@...wei.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH] cpuidle: menu: Fix long delay issue when tick stopped

On Mon, Jan 17, 2022 at 9:16 AM Shaokun Zhang
<zhangshaokun@...ilicon.com> wrote:
>
> From: Guo Yang <guoyang2@...wei.com>
>
> Network latency measured with qperf on an arm server was always high
> because the CPU entered a deep power-down idle state (like Intel C6)
> and could not go to a shallower one.
>
> The intervals in @get_typical_interval() were much smaller than predicted_ns
> in @menu_select(), so the predicted state was always the deepest one, which
> caused long network delays.
>
> Every time the CPU got an interrupt from the network, it was woken up and
> handled the IRQ. After that the CPU entered @menu_select(), but
> @tick_nohz_tick_stopped() was true and data->next_timer_ns was large, so
> the CPU could never go to a shallow state until data->next_timer_ns expired.
> The output below was printed when the issue occurred.
>
> [   37.082861] intervals = 36us
> [   37.082875] intervals = 15us
> [   37.082888] intervals = 22us
> [   37.082902] intervals = 35us
> [   37.082915] intervals = 34us
> [   37.082929] intervals = 39us
> [   37.082942] intervals = 39us
> [   37.082956] intervals = 35us
> [   37.082970] target_residency_ns = 10000, predicted_ns = 35832710
> [   37.082998] target_residency_ns = 600000, predicted_ns = 35832710
> [   37.083037] intervals = 36us
> [   37.083050] intervals = 15us
> [   37.083064] intervals = 22us
> [   37.083077] intervals = 35us
> [   37.083091] intervals = 34us
> [   37.083104] intervals = 39us
> [   37.083118] intervals = 39us
> [   37.083131] intervals = 35us
> [   37.083145] target_residency_ns = 10000, predicted_ns = 35657420
> [   37.083174] target_residency_ns = 600000, predicted_ns = 35657420
> [   37.083212] intervals = 36us
> [   37.083225] intervals = 15us
> [   37.083239] intervals = 22us
> [   37.083253] intervals = 35us
> [   37.083266] intervals = 34us
> [   37.083279] intervals = 39us
> [   37.083293] intervals = 39us
> [   37.083307] intervals = 35us
> [   37.083320] target_residency_ns = 10000, predicted_ns = 35482140
> [   37.083349] target_residency_ns = 600000, predicted_ns = 35482140
>
> Add a tick wakeup check before changing predicted_ns.
>
> Cc: "Rafael J. Wysocki" <rafael@...nel.org>
> Cc: Daniel Lezcano <daniel.lezcano@...aro.org>
> Signed-off-by: Guo Yang <guoyang2@...wei.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@...ilicon.com>
> ---
>  drivers/cpuidle/governors/menu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index c492268..3f03843 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -313,7 +313,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                                 get_typical_interval(data, predicted_us)) *
>                                 NSEC_PER_USEC;
>
> -       if (tick_nohz_tick_stopped()) {
> +       if (tick_nohz_tick_stopped() && data->tick_wakeup) {

data->tick_wakeup is only true if tick_nohz_idle_got_tick() has
returned true, but I'm not sure how this can happen after stopping the
tick.

IOW, it looks like the change simply makes the condition be always false.
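
To make that concern concrete, here is a toy model in plain C. It is not
kernel code: every toy_* name is hypothetical, and the rule inside
toy_got_tick() encodes the suspicion above (a wakeup can only be
attributed to the tick while the tick is still running) as an assumption,
not as a verified fact about tick_nohz_idle_got_tick().

```c
#include <stdbool.h>

/* Toy model only; all names are hypothetical stand-ins. */
struct toy_cpu {
	bool tick_stopped;	/* stands in for tick_nohz_tick_stopped() */
	bool tick_wakeup;	/* stands in for data->tick_wakeup */
};

/*
 * ASSUMPTION (unverified): a wakeup is attributed to the tick only
 * while the tick has not been stopped.
 */
static bool toy_got_tick(bool tick_stopped, bool woken_by_tick_irq)
{
	return !tick_stopped && woken_by_tick_irq;
}

/* Record the wakeup reason the way the governor would after idle exit. */
static struct toy_cpu toy_after_wakeup(bool tick_stopped, bool woken_by_tick_irq)
{
	struct toy_cpu cpu = {
		.tick_stopped = tick_stopped,
		.tick_wakeup = toy_got_tick(tick_stopped, woken_by_tick_irq),
	};
	return cpu;
}

/* Condition before the patch. */
static bool toy_cond_original(const struct toy_cpu *cpu)
{
	return cpu->tick_stopped;
}

/* Condition after the patch. */
static bool toy_cond_patched(const struct toy_cpu *cpu)
{
	return cpu->tick_stopped && cpu->tick_wakeup;
}

/* Count the input combinations for which the patched condition holds. */
static int toy_count_patched_true(void)
{
	int count = 0;

	for (int stopped = 0; stopped <= 1; stopped++) {
		for (int irq = 0; irq <= 1; irq++) {
			struct toy_cpu cpu = toy_after_wakeup(stopped, irq);

			if (toy_cond_patched(&cpu))
				count++;
		}
	}
	return count;
}
```

Under that assumption toy_count_patched_true() never finds a case where
the patched condition holds, while toy_cond_original() still holds
whenever the tick is stopped. If the assumption is wrong for some wakeup
path, the model does not apply.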

>                 /*
>                  * If the tick is already stopped, the cost of possible short
>                  * idle duration misprediction is much higher, because the CPU
> --
> 1.8.3.1
>
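
For reference, the numbers in the quoted log can be checked with a small
stand-alone sketch. It uses only the eight interval samples and the two
target_residency_ns values printed there. The plain mean and the
"deepest state whose target residency fits" rule are simplifications made
for illustration; the real get_typical_interval() and menu_select() do
more than this, and the toy_* names are hypothetical.

```c
#include <stdint.h>

/* Interval samples from the quoted log, in microseconds. */
static const uint64_t toy_intervals_us[8] = { 36, 15, 22, 35, 34, 39, 39, 35 };

/* Target residencies from the quoted log, in nanoseconds. */
static const uint64_t toy_target_residency_ns[2] = { 10000, 600000 };

/* Plain mean of the samples, converted to nanoseconds (a simplification). */
static uint64_t toy_mean_interval_ns(void)
{
	uint64_t sum_us = 0;

	for (int i = 0; i < 8; i++)
		sum_us += toy_intervals_us[i];
	return sum_us * 1000 / 8;
}

/*
 * Index of the deepest logged state whose target residency does not
 * exceed the prediction, or -1 if neither fits (a simplification).
 */
static int toy_pick_state(uint64_t predicted_ns)
{
	int idx = -1;

	for (int i = 0; i < 2; i++) {
		if (toy_target_residency_ns[i] <= predicted_ns)
			idx = i;
	}
	return idx;
}
```

With the logged predicted_ns of 35832710 this picks the state with the
600000 ns target residency, whereas a prediction equal to the mean of the
samples (about 32 us) would only fit the 10000 ns state.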
