[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zg9vEJV5JyGoM8KY@hu-bjorande-lv.qualcomm.com>
Date: Thu, 4 Apr 2024 20:25:04 -0700
From: Bjorn Andersson <quic_bjorande@...cinc.com>
To: Andrew Halaney <ahalaney@...hat.com>
CC: <linux-arm-msm@...r.kernel.org>, <robdclark@...il.com>, <will@...nel.org>,
<iommu@...ts.linux.dev>, <joro@...tes.org>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<quic_c_gdjako@...cinc.com>, <quic_cgoldswo@...cinc.com>,
<quic_sukadev@...cinc.com>, <quic_pdaly@...cinc.com>,
<quic_sudaraja@...cinc.com>
Subject: Re: sa8775p-ride: What's a normal SMMU TLB sync time?
On Tue, Apr 02, 2024 at 04:22:31PM -0500, Andrew Halaney wrote:
> Hey,
>
> Sorry for the wide email, but I figured someone recently contributing
> to / maintaining the Qualcomm SMMU driver may have some proper insights
> into this.
>
> Recently I remembered that performance on some Qualcomm platforms
> takes a major hit when you use iommu.strict=1/CONFIG_IOMMU_DEFAULT_DMA_STRICT.
>
> On the sa8775p-ride, I see most TLB sync calls to be about 150 us long,
> with some spiking to 500 us, etc:
>
> [root@...-snapdragon-ride4-sa8775p-09 ~]# trace-cmd start -p function_graph -g qcom_smmu_tlb_sync --max-graph-depth 1
> plugin 'function_graph'
> [root@...-snapdragon-ride4-sa8775p-09 ~]# trace-cmd show
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 0) ! 144.062 us | qcom_smmu_tlb_sync();
>
> On my sc8280xp-lenovo-thinkpad-x13s (only other Qualcomm platform I can compare
> with) I see around 2-15 us with spikes up to 20-30 us. That's thanks to this
> patch[0], which I guess improved the platform from 1-2 ms to the ~10 us number.
>
> It's not entirely clear to me how a DPU specific programming affects system
> wide SMMU performance, but I'm curious if this is the only way to achieve this?
> sa8775p doesn't have the DPU described even right now, so that's a bummer
> as there's no way to make a similar immediate optimization, but I'm still struggling
> to understand what that patch really did to improve things so maybe I'm missing
> something.
>
The cause was that the TLB sync is synchronized with the display updates,
but without appropriate safe_lut_tlb values the display side wouldn't
play nice.
Regards,
Bjorn
> I'm honestly not even sure what a "typical" range for TLB sync time would be,
> but on sa8775p-ride its bad enough that some IRQs like UFS can cause RCU stalls
> (pretty easy to reproduce with fio basic-verify.fio for example on the platform).
> It also makes running with iommu.strict=1 impractical as performance for UFS,
> ethernet, etc drops 75-80%.
>
> Does anyone have any bright ideas on how to improve this, or if I'm even in
> the right for assuming that time is suspiciously long?
>
> Thanks,
> Andrew
>
> [0] https://lore.kernel.org/linux-arm-msm/CAF6AEGs9PLiCZdJ-g42-bE6f9yMR6cMyKRdWOY5m799vF9o4SQ@mail.gmail.com/
>
Powered by blists - more mailing lists