linux-kernel - Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on the GICR

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <977d4c86cae651d1b22b1a519ee6b037@kernel.org>
Date:   Tue, 15 Sep 2020 08:41:00 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Shenming Lu <lushenming@...wei.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        linux-kernel@...r.kernel.org, wanghaibin.wang@...wei.com
Subject: Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on
 the GICR_VENPENDBASER.Dirty bit

On 2020-09-15 08:22, Shenming Lu wrote:
> Every time the vPE is scheduled, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0. It's OK. But
> the delay_us of the poll function is directly set to 10, which
> is a long time for most interrupts. In our measurement, it takes
> only 1~2 microseconds, or even hundreds of nanoseconds, to finish
> parsing the VPT in most cases. However, in the current implementation,
> if the GICR_VPENDBASER.Dirty bit is not 0 immediately after the
> vPE is scheduled, it will directly wait for 10 microseconds,
> resulting in meaningless waiting.
> 
> In order to avoid meaningless waiting, we can set the delay_us
> to 0, which can exit the poll function immediately when the Dirty
> bit becomes 0.

We clearly have a difference in interpretation of the word 
"meaningless".

With this, you are busy-waiting on the register, adding even more 
overhead
at the RD level. How is that better? The whole point is to back off and 
let
the RD do its stuff in the background. This is also based on a massive
sample of *one* implementation. How is that representative?

It would be a lot more convincing if you measured the difference it
makes on the total scheduling latency of a vcpu. Assuming it makes
*any* observable difference.

Thanks,

         M.

> 
> Signed-off-by: Shenming Lu <lushenming@...wei.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 548de7538632..5cfcf0c2ce1a 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3803,7 +3803,7 @@ static void its_wait_vpt_parse_complete(void)
>  	WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + 
> GICR_VPENDBASER,
>  						       val,
>  						       !(val & GICR_VPENDBASER_Dirty),
> -						       10, 500));
> +						       0, 500));
>  }
> 
>  static void its_vpe_schedule(struct its_vpe *vpe)
-- 
Jazz is not dead. It just smells funny...