Date: Thu, 4 Apr 2024 20:25:04 -0700
From: Bjorn Andersson <quic_bjorande@...cinc.com>
To: Andrew Halaney <ahalaney@...hat.com>
CC: <linux-arm-msm@...r.kernel.org>, <robdclark@...il.com>, <will@...nel.org>,
        <iommu@...ts.linux.dev>, <joro@...tes.org>,
        <linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
        <quic_c_gdjako@...cinc.com>, <quic_cgoldswo@...cinc.com>,
        <quic_sukadev@...cinc.com>, <quic_pdaly@...cinc.com>,
        <quic_sudaraja@...cinc.com>
Subject: Re: sa8775p-ride: What's a normal SMMU TLB sync time?

On Tue, Apr 02, 2024 at 04:22:31PM -0500, Andrew Halaney wrote:
> Hey,
> 
> Sorry for the wide email, but I figured someone recently contributing
> to / maintaining the Qualcomm SMMU driver may have some proper insights
> into this.
> 
> Recently I remembered that performance on some Qualcomm platforms
> takes a major hit when you use iommu.strict=1/CONFIG_IOMMU_DEFAULT_DMA_STRICT.
> 
> On the sa8775p-ride, I see most TLB sync calls take about 150 us,
> with some spiking to 500 us:
> 
>     [root@...-snapdragon-ride4-sa8775p-09 ~]# trace-cmd start -p function_graph -g qcom_smmu_tlb_sync --max-graph-depth 1
>       plugin 'function_graph'
>     [root@...-snapdragon-ride4-sa8775p-09 ~]# trace-cmd show
>     # tracer: function_graph
>     #
>     # CPU  DURATION                  FUNCTION CALLS
>     # |     |   |                     |   |   |   |
>      0) ! 144.062 us  |  qcom_smmu_tlb_sync();
> 
> On my sc8280xp-lenovo-thinkpad-x13s (only other Qualcomm platform I can compare
> with) I see around 2-15 us with spikes up to 20-30 us. That's thanks to this
> patch[0], which I guess improved the platform from 1-2 ms to the ~10 us number.
> 
> It's not entirely clear to me how DPU-specific programming affects system-wide
> SMMU performance, and I'm curious whether this is the only way to achieve it.
> sa8775p doesn't even have the DPU described right now, which is a bummer, as
> there's no way to make a similar immediate optimization. I'm still struggling
> to understand what that patch really did to improve things, so maybe I'm
> missing something.
> 

The cause was that the TLB sync is synchronized with the display updates,
and without appropriate safe_lut_tbl values the display side wouldn't
play nice.

Regards,
Bjorn

> I'm honestly not even sure what a "typical" range for TLB sync time would be,
> but on sa8775p-ride it's bad enough that some IRQs like UFS can cause RCU stalls
> (pretty easy to reproduce with fio basic-verify.fio for example on the platform).
> It also makes running with iommu.strict=1 impractical, as performance for UFS,
> ethernet, etc. drops 75-80%.
> 
> Does anyone have any bright ideas on how to improve this, or on whether I'm
> even right to assume that time is suspiciously long?
> 
> Thanks,
> Andrew
> 
> [0] https://lore.kernel.org/linux-arm-msm/CAF6AEGs9PLiCZdJ-g42-bE6f9yMR6cMyKRdWOY5m799vF9o4SQ@mail.gmail.com/
> 
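
For anyone wanting to compare TLB sync times across platforms the way the
quoted message does, the following is a minimal sketch of summarizing
function_graph output as printed by "trace-cmd show". It assumes the line
layout seen in the quoted trace. The helper names (parse_durations,
summarize) are hypothetical, and only the first data line of the sample comes
from the thread; the other three are invented purely for illustration.

```python
import re
import statistics

# Matches function_graph leaf lines such as:
#    0) ! 144.062 us  |  qcom_smmu_tlb_sync();
# The optional symbol before the duration is ftrace's overhead marker.
LINE_RE = re.compile(
    r"^\s*(\d+)\)\s*(?:[+!#*@$]\s+)?(\d+(?:\.\d+)?)\s+us\s*\|\s*(\w+)\(\);"
)


def parse_durations(text, func="qcom_smmu_tlb_sync"):
    """Return the durations (in microseconds) recorded for `func`."""
    durations = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(3) == func:
            durations.append(float(m.group(2)))
    return durations


def summarize(durations):
    """Return count/min/median/max for a list of durations, or None if empty."""
    if not durations:
        return None
    return {
        "count": len(durations),
        "min_us": min(durations),
        "median_us": statistics.median(durations),
        "max_us": max(durations),
    }


# First data line is from the thread; the remaining three are made-up samples.
SAMPLE = """\
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0) ! 144.062 us  |  qcom_smmu_tlb_sync();
 1) ! 150.000 us  |  qcom_smmu_tlb_sync();
 0) ! 500.500 us  |  qcom_smmu_tlb_sync();
 2)   9.250 us    |  other_function();
"""

if __name__ == "__main__":
    print(summarize(parse_durations(SAMPLE)))
```

In practice the text would come from saving the "trace-cmd show" output to a
file and reading it in. The median is reported alongside the maximum so that
occasional spikes do not mask the typical per-call cost.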
