Message-ID: <bc423d7b-df03-d4e2-2898-0873db710943@quicinc.com>
Date: Wed, 22 Jun 2022 17:16:48 +0530
From: Rajendra Nayak <quic_rjendra@...cinc.com>
To: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
Andy Gross <agross@...nel.org>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
"Georgi Djakov" <djakov@...nel.org>,
Rob Herring <robh+dt@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, <linux-arm-msm@...r.kernel.org>,
<linux-pm@...r.kernel.org>, <devicetree@...r.kernel.org>,
<linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>
CC: Thara Gopinath <thara.gopinath@...aro.org>
Subject: Re: [PATCH v4 4/4] arm64: dts: qcom: sdm845: Add CPU BWMON
On 6/1/2022 3:41 PM, Krzysztof Kozlowski wrote:
> Add a device node for the CPU-memory BWMON (bandwidth monitoring) device on
> SDM845, measuring bandwidth between the CPU (gladiator_noc) and the Last Level
> Cache (memnoc). Using this BWMON allows removing the fixed bandwidth
> votes from cpufreq (CPU nodes) and thus achieving high memory throughput even
> at lower CPU frequencies.
>
> Performance impact (SDM845-MTP RB3 board, linux next-20220422):
> 1. No noticeable impact when running with schedutil or performance
> governors.
>
> 2. When compared to a customized kernel with synced interconnects and
> without bandwidth votes from CPU freq, the sysbench memory tests
> show significant improvement with bwmon for block sizes past the L3
> cache. The results of this superficial comparison:
>
> sysbench memory test, results in MB/s (higher is better)
> bs kB | type | V | V+no bw votes | bwmon | benefit %
> 1 | W/seq | 14795 | 4816 | 4985 | 3.5%
> 64 | W/seq | 41987 | 10334 | 10433 | 1.0%
> 4096 | W/seq | 29768 | 8728 | 32007 | 266.7%
> 65536 | W/seq | 17711 | 4846 | 18399 | 279.6%
> 262144 | W/seq | 16112 | 4538 | 17429 | 284.1%
> 64 | R/seq | 61202 | 67092 | 66804 | -0.4%
> 4096 | R/seq | 23871 | 5458 | 24307 | 345.4%
> 65536 | R/seq | 18554 | 4240 | 18685 | 340.7%
> 262144 | R/seq | 17524 | 4207 | 17774 | 322.4%
> 64 | W/rnd | 2663 | 1098 | 1119 | 1.9%
> 65536 | W/rnd | 600 | 316 | 610 | 92.7%
> 64 | R/rnd | 4915 | 4784 | 4594 | -4.0%
> 65536 | R/rnd | 664 | 281 | 678 | 140.7%
>
> Legend:
> bs kB: block size in kB (a small block size means only the L1-L3 caches
> are used)
> type: R - read, W - write, seq - sequential, rnd - random
> V: vanilla (next-20220422)
> V + no bw votes: vanilla without bandwidth votes from CPU freq
> bwmon: bwmon without bandwidth votes from CPU freq
> benefit %: difference between vanilla without bandwidth votes and bwmon
> (higher is better)
>
> Co-developed-by: Thara Gopinath <thara.gopinath@...aro.org>
> Signed-off-by: Thara Gopinath <thara.gopinath@...aro.org>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>
> ---
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 54 ++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 83e8b63f0910..adffb9c70566 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -2026,6 +2026,60 @@ llcc: system-cache-controller@...0000 {
> interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
> };
>
> + pmu@...6400 {
> + compatible = "qcom,sdm845-cpu-bwmon";
> + reg = <0 0x01436400 0 0x600>;
> +
> + interrupts = <GIC_SPI 581 IRQ_TYPE_LEVEL_HIGH>;
> +
> + interconnects = <&gladiator_noc MASTER_APPSS_PROC 3 &mem_noc SLAVE_EBI1 3>,
> + <&osm_l3 MASTER_OSM_L3_APPS &osm_l3 SLAVE_OSM_L3>;
> + interconnect-names = "ddr", "l3c";
Is this the pmu/bwmon instance between the CPU and the caches, or the one between the caches and DDR?
Depending on which one it is, shouldn't we be scaling only one of the interconnect paths rather than both?
> +
> + operating-points-v2 = <&cpu_bwmon_opp_table>;
> +
> + cpu_bwmon_opp_table: opp-table {
> + compatible = "operating-points-v2";
> +
> + /*
> + * The interconnect paths bandwidths taken from
> + * cpu4_opp_table bandwidth.
> + * They also match different tables from
> + * msm-4.9 downstream kernel:
> + * - the gladiator_noc-mem_noc from bandwidth
> + * table of qcom,llccbw (property qcom,bw-tbl);
> + * bus width: 4 bytes;
> + * - the OSM L3 from bandwidth table of
> + * qcom,cpu4-l3lat-mon (qcom,core-dev-table);
> + * bus width: 16 bytes;
> + */
> + opp-0 {
> + opp-peak-kBps = <800000 4800000>;
> + };
> + opp-1 {
> + opp-peak-kBps = <1804000 9216000>;
> + };
> + opp-2 {
> + opp-peak-kBps = <2188000 11980800>;
> + };
> + opp-3 {
> + opp-peak-kBps = <3072000 15052800>;
> + };
> + opp-4 {
> + opp-peak-kBps = <4068000 19353600>;
> + };
> + opp-5 {
> + opp-peak-kBps = <5412000 20889600>;
> + };
> + opp-6 {
> + opp-peak-kBps = <6220000 22425600>;
> + };
> + opp-7 {
> + opp-peak-kBps = <7216000 25497600>;
> + };
> + };
> + };
> +
> pcie0: pci@...0000 {
> compatible = "qcom,pcie-sdm845";
> reg = <0 0x01c00000 0 0x2000>,