Date:   Tue, 15 Sep 2020 08:41:00 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Shenming Lu <lushenming@...wei.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        linux-kernel@...r.kernel.org, wanghaibin.wang@...wei.com
Subject: Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on
 the GICR_VENPENDBASER.Dirty bit

On 2020-09-15 08:22, Shenming Lu wrote:
> Every time the vPE is scheduled, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0. It's OK. But
> the delay_us of the poll function is directly set to 10, which
> is a long time for most interrupts. In our measurement, it takes
> only 1~2 microseconds, or even hundreds of nanoseconds, to finish
> parsing the VPT in most cases. However, in the current implementation,
> if the GICR_VPENDBASER.Dirty bit is not 0 immediately after the
> vPE is scheduled, it will directly wait for 10 microseconds,
> resulting in meaningless waiting.
> 
> In order to avoid meaningless waiting, we can set the delay_us
> to 0, which can exit the poll function immediately when the Dirty
> bit becomes 0.

We clearly have a difference in interpretation of the word "meaningless".

With this, you are busy-waiting on the register, adding even more overhead
at the RD level. How is that better? The whole point is to back off and let
the RD do its stuff in the background. This is also based on a massive
sample of *one* implementation. How is that representative?

It would be a lot more convincing if you measured the difference it
makes on the total scheduling latency of a vcpu. Assuming it makes
*any* observable difference.
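
[Editor's illustration, not part of the original message.] The trade-off
under discussion can be sketched with a toy model in plain C. Everything
here is an assumption made for illustration: simulate_poll(), struct
poll_result and the nanosecond figures are hypothetical, and the loop is a
simplification, not the kernel's readq_relaxed_poll_timeout_atomic(). It
only counts how many register reads a poller issues and when it returns,
for a given delay between reads.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code): the Dirty bit clears
 * clear_ns after polling starts, each register read costs read_ns,
 * and the poller waits delay_ns between two reads.
 */
struct poll_result {
	unsigned long reads;     /* register reads issued */
	unsigned long waited_ns; /* time until the poller returns */
};

static struct poll_result simulate_poll(unsigned long clear_ns,
					unsigned long read_ns,
					unsigned long delay_ns,
					unsigned long timeout_ns)
{
	struct poll_result r = { 0, 0 };

	/* A zero-cost read with a zero delay would never make progress. */
	assert(read_ns > 0);

	for (;;) {
		/* One read; the bit is sampled when the read completes. */
		r.reads++;
		r.waited_ns += read_ns;

		if (r.waited_ns >= clear_ns)	/* Dirty observed clear */
			break;
		if (r.waited_ns >= timeout_ns)	/* give up */
			break;

		r.waited_ns += delay_ns;	/* back off before the next read */
	}

	return r;
}
```

With the made-up values clear_ns = 1500 and read_ns = 100, a delay of 0
issues 15 reads and returns after 1500 ns, while a delay of 10000 issues
2 reads and returns after 10200 ns. The model says nothing about how
those extra reads affect the RD itself, or about total vcpu scheduling
latency, which is the measurement asked for above.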

Thanks,

         M.

> 
> Signed-off-by: Shenming Lu <lushenming@...wei.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 548de7538632..5cfcf0c2ce1a 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3803,7 +3803,7 @@ static void its_wait_vpt_parse_complete(void)
>  	WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
>  						       val,
>  						       !(val & GICR_VPENDBASER_Dirty),
> -						       10, 500));
> +						       0, 500));
>  }
> 
>  static void its_vpe_schedule(struct its_vpe *vpe)
-- 
Jazz is not dead. It just smells funny...
