Message-ID: <aMsJHheI6Y1V5q74@linaro.org>
Date: Wed, 17 Sep 2025 21:16:46 +0200
From: Stephan Gerhold <stephan.gerhold@...aro.org>
To: Robin Murphy <robin.murphy@....com>
Cc: Will Deacon <will@...nel.org>, Joerg Roedel <joro@...tes.org>,
Rob Clark <robin.clark@....qualcomm.com>,
Manivannan Sadhasivam <mani@...nel.org>,
Johan Hovold <johan@...nel.org>,
Bjorn Andersson <andersson@...nel.org>, iommu@...ts.linux.dev,
linux-arm-msm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/arm-smmu-qcom: Enable use of all SMR groups when
running bare-metal
On Wed, Sep 17, 2025 at 07:02:52PM +0100, Robin Murphy wrote:
> On 2025-09-09 4:35 pm, Stephan Gerhold wrote:
> > On Tue, Sep 09, 2025 at 01:57:11PM +0100, Will Deacon wrote:
> > > On Thu, Aug 21, 2025 at 10:33:53AM +0200, Stephan Gerhold wrote:
> > > > Some platforms (e.g. SC8280XP and X1E) support more than 128 stream
> > > > matching groups. This is more than what is defined as maximum by the ARM
> > > > SMMU architecture specification. Commit 122611347326 ("iommu/arm-smmu-qcom:
> > > > Limit the SMR groups to 128") disabled use of the additional groups because
> > > > they don't exhibit the same behavior as the architecture supported ones.
> > > >
> > > > It seems like this is just another quirk of the hypervisor: When running
> > > > bare-metal without the hypervisor, the additional groups appear to behave
> > > > just like all others. The boot firmware uses some of the additional groups,
> > > > so ignoring them in this situation leads to stream match conflicts whenever
> > > > we allocate a new SMR group for the same SID.
> > > >
> > > > The workaround exists primarily because the bypass quirk detection fails
> > > > when using a S2CR register from the additional matching groups, so let's
> > > > perform the test with the last reliable S2CR (127) and then limit the
> > > > number of SMR groups only if we detect that we are running below the
> > > > hypervisor (because of the bypass quirk).
> > > >
> > > > Fixes: 122611347326 ("iommu/arm-smmu-qcom: Limit the SMR groups to 128")
> > > > Signed-off-by: Stephan Gerhold <stephan.gerhold@...aro.org>
> > > > ---
> > > > I modified arm_smmu_find_sme() to prefer allocating from the SMR groups
> > > > above 128 (until they are all used). I did not see any issues, so I don't
> > > > see any indication that they behave any different from the others.
> > > > ---
> > > > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 27 +++++++++++++++++----------
> > > > 1 file changed, 17 insertions(+), 10 deletions(-)
> > >
> > > Is the existing workaround causing you problems somehow? Limiting the SMR
> > > groups to what the architecture allows still seems like the best bet to
> > > me unless there's a compelling reason to do something else.
> > >
> >
> > Yes, the problem is the following (copied from commit message above):
> >
> > > The boot firmware uses some of the additional groups, so ignoring them
> > > in this situation leads to stream match conflicts whenever we allocate
> > > a new SMR group for the same SID.
> >
> > This happens e.g. in the following situation on SC8280XP when enabling
> > video decoding acceleration bare-metal without the hypervisor:
> >
> > 1. The SMMU is already set up by the boot firmware before Linux is
> > started, so some SMRs are already in use during boot. I added some
> > code to dump them:
> >
> > arm-smmu 15000000.iommu: Found SMR0 <0xe0 0x0>
> > ...
> > arm-smmu 15000000.iommu: Found SMR8 <0x800 0x0>
> > <unused>
> > arm-smmu 15000000.iommu: Found SMR170 <0x2a22 0x400>
> > arm-smmu 15000000.iommu: Found SMR171 <0x2a02 0x400>
> > ...
> > arm-smmu 15000000.iommu: Found SMR211 <0x400 0x3>
> >
> > 2. We limit the SMRs to 128, so all the ones >= 170 just stay as-is.
> > Only the ones < 128 are considered when allocating SMRs.
> >
> > 3. We need to configure the following IOMMU for video acceleration:
> >
> > video-firmware {
> > iommus = <&apps_smmu 0x2a02 0x400>;
> > };
> >
> > 4. arm-smmu 15000000.iommu: Picked SMR 14 for SID 0x2a02 mask 0x400
> > ... but SMR170 already uses that SID+mask!
> >
> > 5. arm-smmu 15000000.iommu: Unexpected global fault, this could be serious
> > arm-smmu 15000000.iommu: GFSR 0x80000004, GFSYNR0 0x0000000c, GFSYNR1 0x00002a02, GFSYNR2 0x00000000
> >
> > SMCF, bit[2] is set -> Stream match conflict fault
> > caused by SID GFSYNR1 0x00002a02
> >
> > With my patch this does not happen anymore. As I wrote, so far I have
> > seen no indication that the extra groups behave any different from the
> > standard ones defined by the architecture. I don't know why it was done
> > this way (rather than e.g. implementing the Extended Stream Matching
> > Extension), but we definitely need to do something with the extra SMRs
> > to avoid stream match conflicts.
>
> I'm also a little wary of exposing more non-architectural stuff to the main
> driver - could we not keep the existing logic and simply add an extra loop
> at the end here to ensure any "extra" SMRs are disabled?
>
It's not quite that simple, because some of these SMRs are used by
co-processors (remoteprocs) that are already active during boot, and we
need to keep them in bypass until they are taken over by the drivers in
Linux. Any interruption in between could cause the remoteprocs to crash.

With my changes, the boot SMRs stay active (at the same index), because
there is an existing loop inside qcom_smmu_cfg_probe() that preserves
them as bypass:

	for (i = 0; i < smmu->num_mapping_groups; i++) {
		smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));

		if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
			/* Ignore valid bit for SMR mask extraction. */
			smr &= ~ARM_SMMU_SMR_VALID;
			smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
			smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
			smmu->smrs[i].valid = true;

			smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
			smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
			smmu->s2crs[i].cbndx = 0xff;
		}
	}

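For illustration, the decoding step of that loop can be modelled in user space. This is only a sketch, not driver code: the field layout (VALID in bit 31, MASK in bits 31:16, ID in bits 15:0) is assumed to mirror the ARM_SMMU_SMR_* macros in arm-smmu.h, and struct smr_entry and smr_decode() are hypothetical names. It shows why the valid bit has to be cleared before extracting the mask: in this layout the two fields overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, modelled on the driver's ARM_SMMU_SMR_* macros. */
#define SMR_VALID	(UINT32_C(1) << 31)
#define SMR_MASK_SHIFT	16
#define SMR_ID_BITS	UINT32_C(0xffff)

/* Hypothetical stand-in for the driver's per-SMR bookkeeping. */
struct smr_entry {
	uint16_t id;
	uint16_t mask;
	bool valid;
};

/* Decode a raw SMR register value; invalid entries decode to all-zero. */
static struct smr_entry smr_decode(uint32_t smr)
{
	struct smr_entry e = { 0 };

	if (!(smr & SMR_VALID))
		return e;

	/* The mask field spans bit 31 too, so drop the valid bit first. */
	smr &= ~SMR_VALID;
	e.id = smr & SMR_ID_BITS;
	e.mask = smr >> SMR_MASK_SHIFT;
	e.valid = true;
	return e;
}

/* Small self-check using a constructed register value. */
static void smr_decode_selfcheck(void)
{
	struct smr_entry e = smr_decode(UINT32_C(0x84002a02));

	assert(e.valid && e.id == 0x2a02 && e.mask == 0x0400);
	assert(!smr_decode(UINT32_C(0x04002a02)).valid);
}
```

The value 0x84002a02 is constructed for this example so that it decodes to ID 0x2a02 with mask 0x400; it is not taken from a real register dump.
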
We could "move" the SMRs >= 128 to earlier indexes, but this also needs
to be done carefully in order to avoid:

 - Stream match conflicts, if we write the new entry before deleting the
   old one.
 - Unhandled transactions, if we delete the old entry before writing the
   new one.

Currently this can't happen, because we don't move any entries around.

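The two orderings can be illustrated with a small user-space model of stream matching. This is a sketch under assumptions, not driver code: it assumes a SID matches an SMR when it equals the ID in every bit that the mask does not ignore, and struct smr, smr_lookup() and smr_move_demo() are hypothetical names. The entry values (ID 0x2a02, mask 0x400, index 171 moved to index 14) are borrowed from the example earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SMRS 256	/* hypothetical table size for this model */

struct smr {
	uint16_t id;
	uint16_t mask;
	bool valid;
};

enum smr_result { SMR_UNMATCHED, SMR_UNIQUE, SMR_CONFLICT };

/* A SID matches if it equals the ID in all bits the mask does not ignore. */
static bool smr_matches(const struct smr *s, uint16_t sid)
{
	return s->valid && ((sid ^ s->id) & (uint16_t)~s->mask) == 0;
}

/* Classify a SID by how many valid entries it matches. */
static enum smr_result smr_lookup(const struct smr *tbl, size_t n, uint16_t sid)
{
	unsigned int hits = 0;

	for (size_t i = 0; i < n; i++)
		hits += smr_matches(&tbl[i], sid);

	if (hits == 0)
		return SMR_UNMATCHED;
	return hits == 1 ? SMR_UNIQUE : SMR_CONFLICT;
}

/* Walk through both orderings of moving one entry from index 171 to 14. */
static void smr_move_demo(void)
{
	struct smr tbl[NUM_SMRS] = { 0 };
	const struct smr fw = { .id = 0x2a02, .mask = 0x400, .valid = true };
	const struct smr none = { 0 };

	/* Initial state: only the firmware entry exists. */
	tbl[171] = fw;
	assert(smr_lookup(tbl, NUM_SMRS, 0x2a02) == SMR_UNIQUE);

	/* Ordering A: write the new entry first, so two entries match. */
	tbl[14] = fw;
	assert(smr_lookup(tbl, NUM_SMRS, 0x2a02) == SMR_CONFLICT);
	tbl[171] = none;
	assert(smr_lookup(tbl, NUM_SMRS, 0x2a02) == SMR_UNIQUE);

	/* Reset to the initial state. */
	tbl[14] = none;
	tbl[171] = fw;

	/* Ordering B: delete the old entry first, so nothing matches. */
	tbl[171] = none;
	assert(smr_lookup(tbl, NUM_SMRS, 0x2a02) == SMR_UNMATCHED);
	tbl[14] = fw;
	assert(smr_lookup(tbl, NUM_SMRS, 0x2a02) == SMR_UNIQUE);
}
```

In this model both orderings end in the same valid state, but each passes through a bad intermediate state. That window is what would have to be closed on real hardware.
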
We could do it similarly to arm_smmu_rmr_install_bypass_smr() and add:

	/*
	 * Rather than trying to look at existing mappings that
	 * are setup by the firmware and then invalidate the ones
	 * that do no have matching RMR entries, just disable the
	 * SMMU until it gets enabled again in the reset routine.
	 */
	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);
	reg |= ARM_SMMU_sCR0_CLIENTPD;
	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sCR0, reg);

However, this would need to be done carefully only for the bare-metal
case, since I doubt Qualcomm's hypervisor will allow disabling all
access protections by setting CLIENTPD.
I can try implementing this, but the resulting code will likely be more
complex than this patch.

I realize it is weird to allow non-architectural features like this, but
I haven't found any indication that the additional SMRs work any
differently from the standard ones. The SMMU spec seems to reserve space
for up to 256 SMRs in the address space and the register bits, as if it
were intended to be extended like this later. That's also why it works
correctly without any changes in arm-smmu.c: the bit masks used there
already allow up to 256 SMRs.

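The address-space part of that argument can be checked with a bit of arithmetic. The sketch below assumes the GR0 register offsets used by the Linux driver (SMR(n) at 0x800 + 4n, S2CR(n) at 0xc00 + 4n); the GR0_* macros and smr_slots_before_s2cr() are hypothetical names for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offsets, modelled on ARM_SMMU_GR0_SMR()/ARM_SMMU_GR0_S2CR(). */
#define GR0_SMR(n)	(UINT32_C(0x800) + ((uint32_t)(n) << 2))
#define GR0_S2CR(n)	(UINT32_C(0xc00) + ((uint32_t)(n) << 2))

/* Number of 32-bit SMR slots that fit before the S2CR array starts. */
static unsigned int smr_slots_before_s2cr(void)
{
	return (GR0_S2CR(0) - GR0_SMR(0)) / sizeof(uint32_t);
}

/* Check that index 255 is the last SMR slot below the S2CR array. */
static void smr_layout_selfcheck(void)
{
	assert(smr_slots_before_s2cr() == 256);
	assert(GR0_SMR(255) + sizeof(uint32_t) == GR0_S2CR(0));
}
```

Under these assumed offsets the gap between the two arrays is 0x400 bytes, which is room for 256 four-byte SMR registers, so indexes 128 to 255 do not collide with anything else in GR0.
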
What do you think?
Thanks,
Stephan