Message-ID: <CANRm+CwpGoSTLNG3D8KN+fgfc+20-gFND7xGAYo2b5EuwzPeOg@mail.gmail.com>
Date:   Mon, 22 Jan 2018 19:47:45 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     linux-kernel@...r.kernel.org, kvm <kvm@...r.kernel.org>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Subject: unixbench context switch performance & cpu topology

Hi all,

We can observe that unixbench context switch performance is heavily
influenced by the cpu topology which is exposed to the guest. The scores
are posted below, bigger is better. Both the guest and the host kernel
are 4.15-rc3 (we can also reproduce this against a CentOS 7.4 693
guest/host), LLC is exposed to the guest, kvm adaptive halt-polling is
enabled by default, and we start a guest w/ 8 logical cpus.



unixbench context switch
-smp 8, sockets=8, cores=1, threads=1    382036
-smp 8, sockets=4, cores=2, threads=1    132480
-smp 8, sockets=2, cores=4, threads=1    128032
-smp 8, sockets=2, cores=2, threads=2    131767
-smp 8, sockets=1, cores=4, threads=2    132742
-smp 8, sockets=1, cores=4, threads=2 (guest w/ nohz=off idle=poll)    331471
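For reference, the guests can be started with a plain QEMU command line
along these lines; this is only a minimal sketch of one row of the table
above (the memory size and disk image path are placeholders, and the
options used to expose the LLC to the guest are omitted):

    qemu-system-x86_64 -enable-kvm -cpu host -m 4096 \
        -smp 8,sockets=8,cores=1,threads=1 \
        -drive file=/path/to/guest.img,if=virtio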

I can observe a lot of reschedule IPIs sent from one vCPU to another
vCPU. The context switch workload switches between running and idle
frequently, which results in the HLT instruction on the idle path. I use
idle=poll to avoid the vmexit due to HLT and to avoid reschedule IPIs,
since the idle task then checks the TIF_NEED_RESCHED flag in a loop;
nohz=off stops programming the lapic timer and other nohz stuff. Any
idea why sockets=8 gets the best performance?
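
To illustrate what idle=poll buys here, below is a rough C sketch of a
polling idle loop, modeled on cpu_idle_poll() in kernel/sched/idle.c;
it is simplified and not the exact kernel code:

    #include <linux/irqflags.h>   /* local_irq_enable() */
    #include <linux/sched.h>      /* need_resched() */
    #include <asm/processor.h>    /* cpu_relax() */

    /* Simplified sketch of a polling idle loop: instead of executing
     * HLT, the idle CPU spins checking TIF_NEED_RESCHED, so the waker
     * can just set the flag without sending a reschedule IPI and the
     * guest never vmexits on HLT while idle. */
    static void idle_poll_sketch(void)
    {
            local_irq_enable();
            while (!need_resched())
                    cpu_relax();    /* PAUSE on x86 */
    }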


Regards,
Wanpeng Li
