Message-ID: <6ba7f198-4403-c9d1-f0be-7069cc8cd421@suse.de>
Date:   Tue, 29 Aug 2017 13:58:12 +0200
From:   Alexander Graf <agraf@...e.de>
To:     Yang Zhang <yang.zhang.wz@...il.com>, linux-kernel@...r.kernel.org
Cc:     kvm@...r.kernel.org, wanpeng.li@...mail.com, mst@...hat.com,
        pbonzini@...hat.com, tglx@...utronix.de, rkrcmar@...hat.com,
        dmatlack@...gle.com, peterz@...radead.org,
        linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH v2 0/7] x86/idle: add halt poll support

On 08/29/2017 01:46 PM, Yang Zhang wrote:
> Some latency-intensive workloads see an obvious performance drop when
> running inside a VM. The main reason is that the overhead is amplified
> when running inside a VM. The largest cost I have seen is in the idle
> path.
>
> This patch series introduces a new mechanism to poll for a while
> before entering the idle state. If a reschedule is needed during the
> poll, we avoid going through the heavy overhead path.
>
> Here is the data we get when running the contextswitch benchmark to
> measure latency (lower is better):
>
>     1. w/o patch:
>        2493.14 ns/ctxsw -- 200.3 %CPU
>     
>     2. w/ patch:
>        halt_poll_threshold=10000 -- 1485.96 ns/ctxsw -- 201.0 %CPU
>        halt_poll_threshold=20000 -- 1391.26 ns/ctxsw -- 200.7 %CPU
>        halt_poll_threshold=30000 -- 1488.55 ns/ctxsw -- 200.1 %CPU
>        halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU
>     
>     3. kvm dynamic poll
>        halt_poll_ns=10000 -- 2296.11 ns/ctxsw -- 201.2 %CPU
>        halt_poll_ns=20000 -- 2599.7 ns/ctxsw -- 201.7 %CPU
>        halt_poll_ns=30000 -- 2588.68 ns/ctxsw -- 211.6 %CPU
>        halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU
>     
>     4. idle=poll
>        2050.1 ns/ctxsw -- 1003 %CPU
>     
>     5. idle=mwait
>        2188.06 ns/ctxsw -- 206.3 %CPU
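
To make the quoted mechanism concrete, here is a minimal sketch of a
poll-before-halt idle step, assuming a nanosecond budget exposed as
halt_poll_threshold; the tunable name and the exact hook point are
illustrative and not taken from the actual patch:

#include <linux/types.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <asm/processor.h>
#include <asm/irqflags.h>

/* Assumed tunable: how long to poll (in ns) before really halting. */
static unsigned long halt_poll_threshold = 20000;

static void poll_then_halt(void)
{
	u64 start = ktime_get_ns();

	/* Spin briefly; if work shows up, skip the expensive halt path. */
	while (ktime_get_ns() - start < halt_poll_threshold) {
		if (need_resched())
			return;		/* go schedule instead of halting */
		cpu_relax();
	}

	/* Nothing arrived during the poll window: halt as usual. */
	safe_halt();
}
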

Could you please try to create another metric for guest-initiated,
host-aborted mwait?

For a quick benchmark, reserve 4 registers for a magic value and set
them to that value before you enter MWAIT in the guest. Then allow
native MWAIT execution on the host. If you see that the guest wants to
enter with the 4 registers still containing the magic contents and no
events are pending, go directly into the vcpu block function on the
host.
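
A rough sketch of where such a check could sit on the host side,
assuming KVM's x86 internals (kvm_register_read from kvm_cache_regs.h,
kvm_arch_vcpu_runnable, kvm_vcpu_block); the register choice, magic
value and helper names are made up for illustration:

#include <linux/kvm_host.h>

/* Arbitrary illustrative magic value; which registers carry it is open. */
#define MWAIT_IDLE_MAGIC	0x4d5741495449444cULL

/* Hypothetical helper: did the guest tag its registers before MWAIT? */
static bool guest_tagged_for_mwait_idle(struct kvm_vcpu *vcpu)
{
	return kvm_register_read(vcpu, VCPU_REGS_R12) == MWAIT_IDLE_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R13) == MWAIT_IDLE_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R14) == MWAIT_IDLE_MAGIC &&
	       kvm_register_read(vcpu, VCPU_REGS_R15) == MWAIT_IDLE_MAGIC;
}

/*
 * Candidate hook on the re-entry path, after the guest was kicked out
 * of MWAIT (preemption, external interrupt, ...): if the guest was
 * idling and nothing is pending for it, block the vcpu instead of
 * letting it spin back into MWAIT.
 */
static void maybe_block_idle_vcpu(struct kvm_vcpu *vcpu)
{
	if (guest_tagged_for_mwait_idle(vcpu) &&
	    !kvm_arch_vcpu_runnable(vcpu))
		kvm_vcpu_block(vcpu);	/* returns once an event arrives */
}
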

That way, any time a guest gets naturally aborted while in mwait, it
will only re-enter mwait when an event has actually occurred. While the
guest is running normally (and nobody else wants to run on the host),
we just stay in guest context, but with a sleeping CPU.

Overall, that might give us even better performance, as it allows
turbo boost and HT to work properly.


Alex
