Message-ID: <CAOnJCU+duqiLAX5G=DQOfb=ugeP_ZVPLPd=HKzG7PMU2XEH6yg@mail.gmail.com>
Date: Fri, 24 Dec 2021 22:12:20 -0800
From: Atish Patra <atishp@...shpatra.org>
To: "linux-kernel@...r.kernel.org List" <linux-kernel@...r.kernel.org>
Cc: Albert Ou <aou@...s.berkeley.edu>,
Anup Patel <anup@...infault.org>,
Damien Le Moal <damien.lemoal@....com>,
devicetree <devicetree@...r.kernel.org>,
Jisheng Zhang <jszhang@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski@...onical.com>,
linux-riscv <linux-riscv@...ts.infradead.org>,
Palmer Dabbelt <palmer@...belt.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Rob Herring <robh+dt@...nel.org>
Subject: Re: [v5 0/9] Improve RISC-V Perf support using SBI PMU and sscofpmf extension
On Fri, Dec 24, 2021 at 9:47 PM Atish Patra <atishp@...shpatra.org> wrote:
>
> This series adds improved perf support for RISC-V based systems using the
> SBI PMU extension[1] and the Sscofpmf extension[2]. The SBI PMU extension allows
> the kernel to program the counters for different events and start/stop counters,
> while the sscofpmf extension provides counter overflow interrupts and privilege
> mode filtering. A hardware platform can leverage the SBI PMU extension without
> the sscofpmf extension if it supports at least mcounteren. In that case perf stat
> will work, but perf record won't, as sscofpmf and mcountinhibit are required to
> support it. A platform can support both event counting and sampling with the perf
> tool only if sscofpmf is supported.
>
> This series introduces a platform perf driver instead of the existing
> arch-specific implementation. The new perf implementation adopts a modular
> approach where most of the generic event handling is done in the core library
> while individual PMUs only need to implement the features specific to
> the PMU. This is easily extensible, and any future RISC-V PMU implementation
> can leverage it. Currently, the SBI PMU driver and the legacy PMU driver are
> implemented as part of this series.
>
> The legacy driver tries to reimplement the existing minimal perf support under
> a new config to maintain backward compatibility. This implementation only allows
> monitoring of the always-running cycle/instruction counters. Moreover, they
> cannot be started or stopped. In general, this is very limited and not very useful.
> That is why I am not very keen to carry the support into the new driver.
> However, I don't want to break perf for any existing hardware platforms.
> If everybody agrees that we don't need the legacy perf implementation for older
> implementations, I will be happy to drop PATCH 4.
>
> This series has been tested in Qemu (RV64 & RV32) and on HiFive Unmatched.
> Qemu patches[4] and OpenSBI v1.0 are required to test it on Qemu, and a DT patch
> is required in U-Boot[5] for HiFive Unmatched. The Qemu changes are not
> backward compatible. That means you cannot use perf anymore on older Qemu
> versions with the latest OpenSBI and/or kernel. However, a newer kernel will
> just use the legacy PMU driver if an old OpenSBI is detected.
>
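The driver selection and capability rules described in the cover letter so far can be summarized in a short sketch. It only restates the prose and is not code from the series; the function name, its boolean parameters and the returned keys are hypothetical, and the mcounteren/mcountinhibit requirements are not modeled.

```python
def perf_support(has_sbi_pmu: bool, has_sscofpmf: bool) -> dict:
    """Summarize which driver and perf modes the cover letter describes.

    Hypothetical helper: the name, parameters and returned keys are
    illustrative and do not exist in the kernel or in this series.
    """
    if not has_sbi_pmu:
        # Fallback described in the cover letter: the legacy driver only
        # exposes the always-running cycle/instruction counters.
        return {"driver": "legacy", "counting": True, "sampling": False}
    # With SBI PMU, counting (perf stat) works; sampling (perf record)
    # additionally needs the sscofpmf overflow interrupt.
    return {"driver": "sbi", "counting": True, "sampling": has_sscofpmf}
```

For example, perf_support(True, False) describes a platform where perf stat works but perf record does not.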
> The U-Boot patch is just an example that encodes a few of the events defined
> in the fu740 documentation [6] in the DT. We can update the DT to include all the
> events defined if required.
>
> This series depends on the ISA extension parsing series[7].
>
> Here is the output of perf stat/report while running the perf benchmark with the
> OpenSBI, Linux kernel and U-Boot patches applied.
>
> HiFive Unmatched:
> =================
> perf stat -e cycles -e instructions -e L1-icache-load-misses -e branches -e branch-misses \
> -e r0000000000000200 -e r0000000000000400 \
> -e r0000000000000800 perf bench sched messaging -g 25 -l 15
>
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 25 groups == 1000 processes run
>
> Total time: 0.826 [sec]
>
> Performance counter stats for 'perf bench sched messaging -g 25 -l 15':
>
> 3426710073 cycles (65.92%)
> 1348772808 instructions # 0.39 insn per cycle (75.44%)
> 0 L1-icache-load-misses (72.28%)
> 201133996 branches (67.88%)
> 44663584 branch-misses # 22.21% of all branches (35.01%)
> 248194747 r0000000000000200 (41.94%) --> Integer load instruction retired
> 156879950 r0000000000000400 (43.58%) --> Integer store instruction retired
> 6988678 r0000000000000800 (47.91%) --> Atomic memory operation retired
>
> 1.931335000 seconds time elapsed
>
> 1.100415000 seconds user
> 3.755176000 seconds sys
>
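As a quick sanity check, the two derived metrics that perf prints next to the HiFive Unmatched counts can be recomputed from the raw numbers. This is a minimal sketch; the helper names are made up for illustration and are not part of perf.

```python
# Raw counts copied from the HiFive Unmatched perf stat output above.
cycles = 3426710073
instructions = 1348772808
branches = 201133996
branch_misses = 44663584


def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per cycle."""
    return instructions / cycles


def miss_rate_percent(misses: int, total: int) -> float:
    """Misses as a percentage of all events of that kind."""
    return 100.0 * misses / total


print(f"{ipc(instructions, cycles):.2f} insn per cycle")
print(f"{miss_rate_percent(branch_misses, branches):.2f}% of all branches")
```

Both values agree with the annotations in the output: 0.39 insn per cycle and 22.21% of all branches.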
>
> QEMU:
> =========
> Perf stat:
> =========
>
> [root@...ora-riscv riscv]# perf stat -e r8000000000000005 -e r8000000000000007 \
> -e r8000000000000006 -e r0000000000020002 -e r0000000000020004 -e branch-misses \
> -e cache-misses -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> -e cycles -e instructions perf bench sched messaging -g 15 -l 10
> Running with 15*40 (== 600) tasks.
> Time: 6.578
>
> Performance counter stats for './hackbench -pipe 15 process':
>
> 1,794 r8000000000000005 (52.59%) --> SBI_PMU_FW_SET_TIMER
> 2,859 r8000000000000007 (60.74%) --> SBI_PMU_FW_IPI_RECVD
> 4,205 r8000000000000006 (68.71%) --> SBI_PMU_FW_IPI_SENT
> 0 r0000000000020002 (81.69%)
> <not counted> r0000000000020004 (0.00%)
> <not counted> branch-misses (0.00%)
> <not counted> cache-misses (0.00%)
> 7,878,328 dTLB-load-misses (15.60%)
> 680,270 dTLB-store-misses (28.45%)
> 8,287,931 iTLB-load-misses (39.24%)
> 20,008,506,675 cycles (48.60%)
> 21,484,427,932 instructions # 1.07 insn per cycle (56.60%)
>
> 1.681344735 seconds time elapsed
>
> 0.614460000 seconds user
> 8.313254000 seconds sys
>
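The annotations in the QEMU output suggest that raw event codes with the top bit (bit 63) set select SBI firmware events, with the low bits carrying the firmware event code. The sketch below decodes the codes under that assumption. The function name and the bit layout are inferred from this output rather than taken from the driver source, and only the three firmware events annotated above are listed.

```python
FW_EVENT_FLAG = 1 << 63  # assumption: bit 63 marks a firmware event

# Only the firmware events annotated in the output above.
FW_EVENT_NAMES = {
    5: "SBI_PMU_FW_SET_TIMER",
    6: "SBI_PMU_FW_IPI_SENT",
    7: "SBI_PMU_FW_IPI_RECVD",
}


def decode_raw_event(spec: str) -> str:
    """Describe a perf raw event string such as 'r8000000000000005'."""
    config = int(spec[1:], 16)  # drop the leading 'r', parse the rest as hex
    if config & FW_EVENT_FLAG:
        code = config & ~FW_EVENT_FLAG  # clear bit 63 to get the event code
        return FW_EVENT_NAMES.get(code, f"firmware event {code}")
    return f"raw hardware event {config:#x}"
```

With this layout, decode_raw_event("r8000000000000005") yields SBI_PMU_FW_SET_TIMER, while a code such as r0000000000020002 is reported as a raw hardware event.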
>
> [root@...ora-riscv ~]# perf stat -e cycles -e instructions -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
> Total time: 0.218 [sec]
>
> Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
>
> 3,685,401,394 cycles
> 3,684,529,388 instructions # 1.00 insn per cycle
> 3,006,042 dTLB-load-misses
> 258,144 dTLB-store-misses
> 1,992,860 iTLB-load-misses
>
> 0.588717389 seconds time elapsed
>
> 0.324009000 seconds user
> 0.937087000 seconds sys
>
> [root@...ora-riscv ~]# perf record -e cycles -e instructions -e dTLB-load-misses -e dTLB-store-misses \
> -e iTLB-load-misses -c 10000 perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
> Total time: 2.160 [sec]
> [ perf record: Woken up 11 times to write data ]
> Warning:
> Processed 291769 events and lost 1 chunks!
>
> [root@...ora-riscv ~]# perf report
>
> Available samples
> 146K cycles ◆
> 146K instructions ▒
> 298 dTLB-load-misses ▒
> 8 dTLB-store-misses ▒
> 211 iTLB-load-misses
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/linux/tree/riscv_pmu_v5
> [4] https://github.com/atishp04/qemu/tree/riscv_pmu_v3
> [5] https://github.com/atishp04/u-boot/tree/hifive_unmatched_dt_pmu
> [6] https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
> [7] https://lkml.org/lkml/2021/12/24/313
>
> Changes from v4->v5:
> 1. Fixed a few corner case issues in perf interrupt handling.
> 2. Changed the set_period API so that the caller can compute the initial
> value.
> 3. Fixed the per cpu interrupt enablement issue.
> 4. Fixed a bug in the privilege mode filtering.
> 5. Made the sbi driver independent of the DT.
> 6. Removed any DT related modifications.
>
> Changes from v3->v4:
> 1. Do not proceed with the overflow handler if the event isn't set up for sampling.
> 2. The overflow status register is only read after counters are stopped.
> 3. Added the PMU DT node for HiFive Unmatched.
>
> Changes from v2->v3:
> 1. Added interrupt overflow support.
> 2. Cleaned up legacy driver initialization.
> 3. Supports perf record now.
> 4. Added the DT binding and maintainers file.
> 5. Changed cpu hotplug notifier to be multi-state.
> 6. OpenSBI doesn't disable cycle/instret counter during boot. Update the
> perf code to disable all the counters during boot.
>
> Changes from v1->v2:
> 1. Implemented the latest SBI PMU extension specification.
> 2. The core platform driver was changed to operate as a library while only
> sbi based PMU is built as a driver. The legacy one is just a fallback if
> SBI PMU extension is not available.
>
> Atish Patra (9):
> RISC-V: Remove the current perf implementation
> RISC-V: Add CSR encodings for all HPMCOUNTERS
> RISC-V: Add a perf core library for pmu drivers
> RISC-V: Add a simple platform driver for RISC-V legacy perf
> RISC-V: Add RISC-V SBI PMU extension definitions
> RISC-V: Add perf platform driver based on SBI PMU extension
> RISC-V: Add sscofpmf extension support
> Documentation: riscv: Remove the old documentation
> MAINTAINERS: Add entry for RISC-V PMU drivers
>
> Documentation/riscv/pmu.rst | 255 ----------
> MAINTAINERS | 9 +
> arch/riscv/Kconfig | 13 -
> arch/riscv/include/asm/csr.h | 66 ++-
> arch/riscv/include/asm/hwcap.h | 1 +
> arch/riscv/include/asm/perf_event.h | 72 ---
> arch/riscv/include/asm/sbi.h | 97 ++++
> arch/riscv/kernel/Makefile | 1 -
> arch/riscv/kernel/cpufeature.c | 1 +
> arch/riscv/kernel/perf_event.c | 485 ------------------
> drivers/perf/Kconfig | 30 ++
> drivers/perf/Makefile | 5 +
> drivers/perf/riscv_pmu.c | 330 ++++++++++++
> drivers/perf/riscv_pmu_legacy.c | 143 ++++++
> drivers/perf/riscv_pmu_sbi.c | 762 ++++++++++++++++++++++++++++
> include/linux/cpuhotplug.h | 1 +
> include/linux/perf/riscv_pmu.h | 73 +++
> 17 files changed, 1517 insertions(+), 827 deletions(-)
> delete mode 100644 Documentation/riscv/pmu.rst
> delete mode 100644 arch/riscv/kernel/perf_event.c
> create mode 100644 drivers/perf/riscv_pmu.c
> create mode 100644 drivers/perf/riscv_pmu_legacy.c
> create mode 100644 drivers/perf/riscv_pmu_sbi.c
> create mode 100644 include/linux/perf/riscv_pmu.h
>
> --
> 2.33.1
>
Apologies for the multiple emails on this series (in case you are
subscribed to linux-kernel@...r.kernel.org as well).
I messed up my script earlier; it removed all the CC entries and
sent the email only to lkml by mistake.
Sorry for the noise.
--
Regards,
Atish