Message-ID: <20260102184113.GA125261@ziepe.ca>
Date: Fri, 2 Jan 2026 14:41:13 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Dawei Li <dawei.li@...ux.dev>
Cc: will@...nel.org, robin.murphy@....com, joro@...tes.org,
	linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
	linux-kernel@...r.kernel.org, set_pte_at@...look.com,
	stable@...r.kernel.org
Subject: Re: [PATCH] iommu/arm-smmu-v3: Maintain valid access attributes for
 non-coherent SMMU

On Mon, Dec 29, 2025 at 08:23:54AM +0800, Dawei Li wrote:
> According to SMMUv3 architecture specification, IO-coherent access for
> SMMU is supported for:
> - Translation table walks.
> - Fetches of L1STD, STE, L1CD and CD.
> - Command queue, Event queue and PRI queue access.
> - GERROR, CMD_SYNC, Event queue and PRI queue MSIs, if supported

I was recently looking at this too. IMHO this is not really a clear
description of what this patch is doing.

I would write this description as:

When the SMMU does a DMA for itself it can set various memory access
attributes which control how the interconnect should execute the
DMA. Linux uses these to differentiate DMA that must snoop the cache
from DMA that must bypass it because Linux has allocated non-coherent
memory on the CPU.

In Table "13.8 Attributes for SMMU-originated accesses" each of the
different types of DMA is categorized and the specific bits
controlling the memory attribute for the fetch are identified.

Make this consistent globally. If Linux has cache flushed the buffer,
or allocated a DMA incoherent buffer, then it should set the
non-caching memory attribute so the DMA matches.

This is important for some of the allocations where Linux is currently
allocating DMA coherent memory: nothing has made the CPU cache
coherent, so doing any coherent access to that memory may result in
cache inconsistencies.

This may solve problems in systems where the SMMU driver thinks the
SMMU is non-coherent but, in fact, the SMMU and the interconnect
selectively support coherence, and setting the wrong memory attributes
will cause non-working cached access.

[and then if you have a specific SoC that shows an issue, please
describe the HW]
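
To make the rule in that description concrete, here is a minimal
userspace sketch of it. This is not driver code: struct model_smmu,
struct model_attrs, model_pick_attrs() and the enum values are all
hypothetical names made up for illustration, and the enum values are
not the hardware encodings (those come from the spec table and the
driver header).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT hardware encodings. */
enum model_sh { MODEL_SH_OUTER, MODEL_SH_INNER };
enum model_cache { MODEL_CACHE_NC, MODEL_CACHE_WB };

struct model_smmu {
	bool coherent;	/* stands in for ARM_SMMU_FEAT_COHERENCY being set */
};

struct model_attrs {
	enum model_sh sh;
	enum model_cache cache;
};

/*
 * One rule for every SMMU-originated DMA in this model: cacheable and
 * inner shareable when the SMMU is coherent, non-cacheable and outer
 * shareable when it is not, so the DMA matches how the CPU side treats
 * the buffer.
 */
static struct model_attrs model_pick_attrs(const struct model_smmu *smmu)
{
	struct model_attrs attrs;

	assert(smmu != NULL);
	if (smmu->coherent) {
		attrs.sh = MODEL_SH_INNER;
		attrs.cache = MODEL_CACHE_WB;
	} else {
		attrs.sh = MODEL_SH_OUTER;
		attrs.cache = MODEL_CACHE_NC;
	}
	return attrs;
}
```

The point of funnelling everything through one function in the sketch
is only that each type of DMA then gets the same answer; whether the
real driver wants a shared helper or open-coded branches per field is
a separate question.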

> +static __always_inline bool smmu_coherent(struct arm_smmu_device *smmu)
> +{
> +	return !!(smmu->features & ARM_SMMU_FEAT_COHERENCY);
> +}
> +
>  /* High-level queue accessors */
> -static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> +static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent,
> +				   struct arm_smmu_device *smmu)
>  {
>  	memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
>  	cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
> @@ -358,8 +364,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  		} else {
>  			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
>  		}
> -		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
> -		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
> +		if (smmu_coherent(smmu)) {
> +			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
> +			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
> +		} else {
> +			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_OSH);
> +			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OINC);
> +		}

And then please go through your patch and add comments actually
explaining what the DMA is and what memory is being reached by it,
since it is not always very clear from the ARM mnemonics.

For instance, this is:
 /* DMA for "CMDQ MSI" which targets q->base_dma allocated by arm_smmu_init_one_queue() */
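
As a rough illustration of how such a comment could sit next to the
attribute selection, here is a standalone userspace model of the
CMD_SYNC hunk. Everything prefixed MODEL_ or model_ is hypothetical.
The bit positions and the SH/OIWB values are what I believe the driver
header uses and should be checked against it; the OINC value of 0x5 is
purely an assumption, since that define is introduced by the patch and
its value is not visible in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(). */
#define MODEL_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Assumed field layout of CMD_SYNC word 0; verify against the header. */
#define MODEL_SYNC_0_MSH	MODEL_GENMASK_ULL(23, 22)
#define MODEL_SYNC_0_MSIATTR	MODEL_GENMASK_ULL(27, 24)

#define MODEL_SH_OSH		2ULL
#define MODEL_SH_ISH		3ULL
#define MODEL_MEMATTR_OIWB	0xfULL
#define MODEL_MEMATTR_OINC	0x5ULL	/* ASSUMPTION: not taken from the patch */

/* Userspace stand-in for FIELD_PREP(): shift val into the masked field. */
static uint64_t model_field_prep(uint64_t mask, uint64_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1))
		shift++;
	return (val << shift) & mask;
}

static uint64_t model_sync_msi_attrs(bool coherent)
{
	uint64_t cmd0 = 0;

	/*
	 * DMA for "CMDQ MSI" which targets q->base_dma allocated by
	 * arm_smmu_init_one_queue()
	 */
	if (coherent) {
		cmd0 |= model_field_prep(MODEL_SYNC_0_MSH, MODEL_SH_ISH);
		cmd0 |= model_field_prep(MODEL_SYNC_0_MSIATTR, MODEL_MEMATTR_OIWB);
	} else {
		cmd0 |= model_field_prep(MODEL_SYNC_0_MSH, MODEL_SH_OSH);
		cmd0 |= model_field_prep(MODEL_SYNC_0_MSIATTR, MODEL_MEMATTR_OINC);
	}
	return cmd0;
}
```

The comment names both the kind of DMA and where the target memory
comes from, which is the part the mnemonics alone do not tell a
reader.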

> @@ -1612,11 +1624,18 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
>  		(cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
>  		FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
>  
> +	if (smmu_coherent(smmu)) {
> +		val = FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
> +		      FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
> +		      FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH);
> +	} else {
> +		val = FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_NC) |
> +		      FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_NC) |
> +		      FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_OSH);
> +	}

This one is "CD fetch", which targets memory allocated by arm_smmu_alloc_cd_ptr()

etc

And note that the above will need this hunk too:

+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -432,6 +432,14 @@ size_t arm_smmu_get_viommu_size(struct device *dev,
            !(smmu->features & ARM_SMMU_FEAT_S2FWB))
                return 0;
 
+       /*
+        * When running non-coherent we can't support S2FWB since it will also
+        * force a coherent CD fetch, aside from the question of what
+        * S2FWB/CANWBS even does with non-coherent SMMUs.
+        */
+       if (!smmu_coherent(smmu))
+               return 0;

> @@ -3746,7 +3765,7 @@ int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
>  	q->cons_reg	= page + cons_off;
>  	q->ent_dwords	= dwords;
>  
> -	q->q_base  = Q_BASE_RWA;
> +	q->q_base  = smmu_coherent(smmu) ? Q_BASE_RWA : 0;

CMDQ fetch, though do we even need to manage RWA? Isn't it ignored if
IC/OC/SH are set to their non-cacheable values?

etc..

Jason
