linux-kernel - Re: [PATCH 0/4] Powerpc: Better preemption for shared processor

Message-ID: <da67d6ce-f120-f61a-19ff-0ae4f1f5dac0@redhat.com>
Date:   Wed, 28 Oct 2020 20:01:30 -0400
From:   Waiman Long <longman@...hat.com>
To:     Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>
Cc:     linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Nathan Lynch <nathanl@...ux.ibm.com>,
        Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH 0/4] Powerpc: Better preemption for shared processor

On 10/28/20 8:35 AM, Srikar Dronamraju wrote:
> Currently, vcpu_is_preempted() derives its result from the yield_count for
> shared processors. On a PowerVM LPAR, Phyp schedules at SMT8 core
> granularity, i.e. all CPUs belonging to a core are either group-scheduled
> in or group-scheduled out. This can be used to better predict
> non-preempted CPUs on PowerVM shared LPARs.
>
> perf stat -r 5 -a perf bench sched pipe -l 10000000 (less time is better)
>
> powerpc/next
>       35,107,951.20 msec cpu-clock                 #  255.898 CPUs utilized            ( +-  0.31% )
>          23,655,348      context-switches          #    0.674 K/sec                    ( +-  3.72% )
>              14,465      cpu-migrations            #    0.000 K/sec                    ( +-  5.37% )
>              82,463      page-faults               #    0.002 K/sec                    ( +-  8.40% )
>   1,127,182,328,206      cycles                    #    0.032 GHz                      ( +-  1.60% )  (66.67%)
>      78,587,300,622      stalled-cycles-frontend   #    6.97% frontend cycles idle     ( +-  0.08% )  (50.01%)
>     654,124,218,432      stalled-cycles-backend    #   58.03% backend cycles idle      ( +-  1.74% )  (50.01%)
>     834,013,059,242      instructions              #    0.74  insn per cycle
>                                                    #    0.78  stalled cycles per insn  ( +-  0.73% )  (66.67%)
>     132,911,454,387      branches                  #    3.786 M/sec                    ( +-  0.59% )  (50.00%)
>       2,890,882,143      branch-misses             #    2.18% of all branches          ( +-  0.46% )  (50.00%)
>
>             137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )
>
> powerpc/next + patchset
>       29,981,702.64 msec cpu-clock                 #  255.881 CPUs utilized            ( +-  1.30% )
>          40,162,456      context-switches          #    0.001 M/sec                    ( +-  0.01% )
>               1,110      cpu-migrations            #    0.000 K/sec                    ( +-  5.20% )
>              62,616      page-faults               #    0.002 K/sec                    ( +-  3.93% )
>   1,430,030,626,037      cycles                    #    0.048 GHz                      ( +-  1.41% )  (66.67%)
>      83,202,707,288      stalled-cycles-frontend   #    5.82% frontend cycles idle     ( +-  0.75% )  (50.01%)
>     744,556,088,520      stalled-cycles-backend    #   52.07% backend cycles idle      ( +-  1.39% )  (50.01%)
>     940,138,418,674      instructions              #    0.66  insn per cycle
>                                                    #    0.79  stalled cycles per insn  ( +-  0.51% )  (66.67%)
>     146,452,852,283      branches                  #    4.885 M/sec                    ( +-  0.80% )  (50.00%)
>       3,237,743,996      branch-misses             #    2.21% of all branches          ( +-  1.18% )  (50.01%)
>
>              117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )
>
> This is around a 14.6% improvement in elapsed time.
>
> Cc: linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>
> Cc: LKML <linux-kernel@...r.kernel.org>
> Cc: Michael Ellerman <mpe@...erman.id.au>
> Cc: Nicholas Piggin <npiggin@...il.com>
> Cc: Nathan Lynch <nathanl@...ux.ibm.com>
> Cc: Gautham R Shenoy <ego@...ux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Valentin Schneider <valentin.schneider@....com>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Waiman Long <longman@...hat.com>
> Cc: Phil Auld <pauld@...hat.com>
>
> Srikar Dronamraju (4):
>    powerpc: Refactor is_kvm_guest declaration to new header
>    powerpc: Rename is_kvm_guest to check_kvm_guest
>    powerpc: Reintroduce is_kvm_guest
>    powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted
>
>   arch/powerpc/include/asm/firmware.h  |  6 ------
>   arch/powerpc/include/asm/kvm_guest.h | 25 +++++++++++++++++++++++++
>   arch/powerpc/include/asm/kvm_para.h  |  2 +-
>   arch/powerpc/include/asm/paravirt.h  | 18 ++++++++++++++++++
>   arch/powerpc/kernel/firmware.c       |  5 ++++-
>   arch/powerpc/platforms/pseries/smp.c |  3 ++-
>   6 files changed, 50 insertions(+), 9 deletions(-)
>   create mode 100644 arch/powerpc/include/asm/kvm_guest.h
>
This patch series looks good to me and the performance is nice too.

Acked-by: Waiman Long <longman@...hat.com>

Just curious: is the performance gain mainly from the use of static_branch
(patches 1 - 3) or from reducing calls to yield_count_of()?

Cheers,
Longman