linux-kernel - Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on the GICR

Message-ID: <8c9f4731295af025302e084ba546b74b@kernel.org>
Date:   Wed, 16 Sep 2020 09:39:42 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     lushenming <lushenming@...wei.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        linux-kernel@...r.kernel.org,
        "Wanghaibin (D)" <wanghaibin.wang@...wei.com>,
        yuzenghui <yuzenghui@...wei.com>
Subject: Re: [PATCH] irqchip/gic-v4.1: Optimize the delay time of the poll on
 the GICR_VPENDBASER.Dirty bit

On 2020-09-16 08:04, lushenming wrote:
> Hi,
> 
> Our team just discussed this issue again and consulted our GIC hardware
> design team. They think the RD can afford busy waiting. So we still think
> maybe 0 is better, at least for our hardware.
> 
> In addition, if not 0, as I said before, in our measurement, it takes only
> hundreds of nanoseconds, or 1~2 microseconds, to finish parsing the VPT
> in most cases. So maybe 1 microsecond, or smaller, is more appropriate.
> Anyway, 10 microseconds is too much.
> 
> But it has to be said that it does depend on the hardware implementation.

Exactly. And given that the only publicly available implementation is
a software model, I am reluctant to change "performance" related things
based on benchmarks that can't be verified, for what appears to me to be
a micro-optimization.

> Besides, I'm not sure where the start and end points are for the total
> scheduling latency of a vcpu that you mentioned, which includes many
> events. Is the parse time of the VPT not clear enough?

Measure the time it takes from kvm_vcpu_load() to the point where the vcpu
enters the guest. How much, in proportion, do these 1/2/10us represent?
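
To make the proportion question concrete, here is a minimal sketch of the
arithmetic, assuming both durations are available in nanoseconds. The
function name and parameters are hypothetical; nothing here is a kernel
API, and no measured figures are implied.

```c
#include <stdint.h>

/*
 * Hypothetical helper: share (in percent) of the load-to-guest-entry
 * latency that is spent waiting on the Dirty bit poll.
 * Both arguments are assumed to be in nanoseconds.
 */
static double poll_share_percent(uint64_t poll_delay_ns, uint64_t total_ns)
{
    if (total_ns == 0)
        return 0.0; /* no measurement, avoid dividing by zero */

    return 100.0 * (double)poll_delay_ns / (double)total_ns;
}
```

Feeding it the poll delay under discussion and the measured interval from
kvm_vcpu_load() to guest entry gives the figure asked for above.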

Also, a better(?) course of action would maybe be to consider whether we
should split the its_vpe_schedule() call into two distinct operations: one
that programs the VPE to be resident, and another that polls the Dirty bit
*much later* on the entry path, giving the GIC a chance to work in parallel
with the CPU on the entry path.

If your HW is as quick as you say it is, it would pretty much guarantee
a clear read of GICR_VPENDBASER without waiting.
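
As a rough illustration of the split, here is a user-space toy model, not
kernel code. The struct and all three function names are hypothetical
stand-ins, and "ticks" are an arbitrary unit of simulated time. It only
shows why deferring the poll can make the wait disappear when the RD
finishes parsing the VPT while the CPU is busy elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the redistributor state (not a kernel type). */
struct fake_rd {
    bool dirty;            /* models GICR_VPENDBASER.Dirty */
    unsigned parse_ticks;  /* simulated VPT parse time still remaining */
};

/* Phase 1: make the VPE resident and return immediately, without polling. */
static void vpe_make_resident(struct fake_rd *rd, unsigned parse_ticks)
{
    rd->parse_ticks = parse_ticks;
    rd->dirty = parse_ticks > 0;
}

/* Simulated time passing while the CPU does other entry-path work. */
static void cpu_do_entry_work(struct fake_rd *rd, unsigned ticks)
{
    rd->parse_ticks = ticks >= rd->parse_ticks ? 0 : rd->parse_ticks - ticks;
    if (rd->parse_ticks == 0)
        rd->dirty = false;
}

/* Phase 2: poll Dirty late; returns how many polls still saw it set. */
static unsigned vpe_wait_clean(struct fake_rd *rd)
{
    unsigned spins = 0;

    while (rd->dirty) {
        spins++;
        cpu_do_entry_work(rd, 1); /* each poll costs one simulated tick */
    }
    return spins;
}
```

In this model, calling vpe_make_resident(), then cpu_do_entry_work() for
longer than the parse time, then vpe_wait_clean() returns zero spins,
whereas polling right after vpe_make_resident() spins for the whole parse
time.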

         M.
-- 
Jazz is not dead. It just smells funny...