lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc423d7b-df03-d4e2-2898-0873db710943@quicinc.com>
Date:   Wed, 22 Jun 2022 17:16:48 +0530
From:   Rajendra Nayak <quic_rjendra@...cinc.com>
To:     Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
        Andy Gross <agross@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        "Georgi Djakov" <djakov@...nel.org>,
        Rob Herring <robh+dt@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, <linux-arm-msm@...r.kernel.org>,
        <linux-pm@...r.kernel.org>, <devicetree@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>
CC:     Thara Gopinath <thara.gopinath@...aro.org>
Subject: Re: [PATCH v4 4/4] arm64: dts: qcom: sdm845: Add CPU BWMON


On 6/1/2022 3:41 PM, Krzysztof Kozlowski wrote:
> Add device node for CPU-memory BWMON device (bandwidth monitoring) on
> SDM845 measuring bandwidth between CPU (gladiator_noc) and Last Level
> Cache (memnoc).  Usage of this BWMON allows to remove fixed bandwidth
> votes from cpufreq (CPU nodes) thus achieve high memory throughput even
> with lower CPU frequencies.
> 
> Performance impact (SDM845-MTP RB3 board, linux next-20220422):
> 1. No noticeable impact when running with schedutil or performance
>     governors.
> 
> 2. When comparing to customized kernel with synced interconnects and
>     without bandwidth votes from CPU freq, the sysbench memory tests
>     show significant improvement with bwmon for blocksizes past the L3
>     cache.  The results for such superficial comparison:
> 
> sysbench memory test, results in MB/s (higher is better)
>   bs kB |  type |    V  | V+no bw votes | bwmon | benefit %
>       1 | W/seq | 14795 |          4816 |  4985 |      3.5%
>      64 | W/seq | 41987 |         10334 | 10433 |      1.0%
>    4096 | W/seq | 29768 |          8728 | 32007 |    266.7%
>   65536 | W/seq | 17711 |          4846 | 18399 |    279.6%
> 262144 | W/seq | 16112 |          4538 | 17429 |    284.1%
>      64 | R/seq | 61202 |         67092 | 66804 |     -0.4%
>    4096 | R/seq | 23871 |          5458 | 24307 |    345.4%
>   65536 | R/seq | 18554 |          4240 | 18685 |    340.7%
> 262144 | R/seq | 17524 |          4207 | 17774 |    322.4%
>      64 | W/rnd |  2663 |          1098 |  1119 |      1.9%
>   65536 | W/rnd |   600 |           316 |   610 |     92.7%
>      64 | R/rnd |  4915 |          4784 |  4594 |     -4.0%
>   65536 | R/rnd |   664 |           281 |   678 |    140.7%
> 
> Legend:
> bs kB: block size in KB (small block size means only L1-3 caches are
>        used
> type: R - read, W - write, seq - sequential, rnd - random
> V: vanilla (next-20220422)
> V + no bw votes: vanilla without bandwidth votes from CPU freq
> bwmon: bwmon without bandwidth votes from CPU freq
> benefit %: difference between vanilla without bandwidth votes and bwmon
>             (higher is better)
> 
> Co-developed-by: Thara Gopinath <thara.gopinath@...aro.org>
> Signed-off-by: Thara Gopinath <thara.gopinath@...aro.org>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>
> ---
>   arch/arm64/boot/dts/qcom/sdm845.dtsi | 54 ++++++++++++++++++++++++++++
>   1 file changed, 54 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 83e8b63f0910..adffb9c70566 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -2026,6 +2026,60 @@ llcc: system-cache-controller@...0000 {
>   			interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
>   		};
>   
> +		pmu@...6400 {
> +			compatible = "qcom,sdm845-cpu-bwmon";
> +			reg = <0 0x01436400 0 0x600>;
> +
> +			interrupts = <GIC_SPI 581 IRQ_TYPE_LEVEL_HIGH>;
> +
> +			interconnects = <&gladiator_noc MASTER_APPSS_PROC 3 &mem_noc SLAVE_EBI1 3>,
> +					<&osm_l3 MASTER_OSM_L3_APPS &osm_l3 SLAVE_OSM_L3>;
> +			interconnect-names = "ddr", "l3c";

Is this the pmu/bwmon instance between the cpu and caches or the one between the caches and DDR?
Depending on which one it is, shouldn;t we just be scaling either one and not both the interconnect paths?

> +
> +			operating-points-v2 = <&cpu_bwmon_opp_table>;
> +
> +			cpu_bwmon_opp_table: opp-table {
> +				compatible = "operating-points-v2";
> +
> +				/*
> +				 * The interconnect paths bandwidths taken from
> +				 * cpu4_opp_table bandwidth.
> +				 * They also match different tables from
> +				 * msm-4.9 downstream kernel:
> +				 *  - the gladiator_noc-mem_noc from bandwidth
> +				 *    table of qcom,llccbw (property qcom,bw-tbl);
> +				 *    bus width: 4 bytes;
> +				 *  - the OSM L3 from bandwidth table of
> +				 *    qcom,cpu4-l3lat-mon (qcom,core-dev-table);
> +				 *    bus width: 16 bytes;
> +				 */
> +				opp-0 {
> +					opp-peak-kBps = <800000 4800000>;
> +				};
> +				opp-1 {
> +					opp-peak-kBps = <1804000 9216000>;
> +				};
> +				opp-2 {
> +					opp-peak-kBps = <2188000 11980800>;
> +				};
> +				opp-3 {
> +					opp-peak-kBps = <3072000 15052800>;
> +				};
> +				opp-4 {
> +					opp-peak-kBps = <4068000 19353600>;
> +				};
> +				opp-5 {
> +					opp-peak-kBps = <5412000 20889600>;
> +				};
> +				opp-6 {
> +					opp-peak-kBps = <6220000 22425600>;
> +				};
> +				opp-7 {
> +					opp-peak-kBps = <7216000 25497600>;
> +				};
> +			};
> +		};
> +
>   		pcie0: pci@...0000 {
>   			compatible = "qcom,pcie-sdm845";
>   			reg = <0 0x01c00000 0 0x2000>,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ