Message-ID: <aMQroI4NDu74PDGT@willie-the-truck>
Date: Fri, 12 Sep 2025 15:18:08 +0100
From: Will Deacon <will@...nel.org>
To: Mostafa Saleh <smostafa@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
	maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
	suzuki.poulose@....com, yuzenghui@...wei.com,
	catalin.marinas@....com, robin.murphy@....com,
	jean-philippe@...aro.org, qperret@...gle.com, tabba@...gle.com,
	jgg@...pe.ca, mark.rutland@....com, praan@...gle.com
Subject: Re: [PATCH v4 22/28] iommu/arm-smmu-v3-kvm: Emulate CMDQ for host

On Tue, Aug 19, 2025 at 09:51:50PM +0000, Mostafa Saleh wrote:
> Don’t allow access to the command queue from the host:
> - ARM_SMMU_CMDQ_BASE: Only allowed to be written while the CMDQ is
>   disabled; we use it to keep track of the host command queue base.
>   Reads return the saved value.
> - ARM_SMMU_CMDQ_PROD: Writes trigger command queue emulation, which
>   sanitises and filters the whole submitted range. Reads return the
>   host copy.
> - ARM_SMMU_CMDQ_CONS: Writes move the software copy of cons, but the
>   host can’t skip commands once submitted. Reads return the emulated
>   value plus the error bits from the actual cons.
> 
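For reference, the CMDQ_CONS read behaviour described in the last
bullet could look roughly like the sketch below. This is illustrative
only: the smmu_cmdq_cons_read() wrapper and the smmu->base field are
assumptions on my part, while CMDQ_CONS_ERR, ARM_SMMU_CMDQ_CONS and
readl_relaxed() are the existing driver/kernel definitions:

	/* Hypothetical helper sketching the CMDQ_CONS read emulation. */
	static u32 smmu_cmdq_cons_read(struct hyp_arm_smmu_v3_device *smmu)
	{
		/* Emulated consumer index, advanced as commands are shadowed. */
		u32 cons = smmu->cmdq_host.llq.cons;

		/* Merge in the error bits from the real hardware register. */
		cons |= readl_relaxed(smmu->base + ARM_SMMU_CMDQ_CONS) &
			CMDQ_CONS_ERR;

		return cons;
	}
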
> Signed-off-by: Mostafa Saleh <smostafa@...gle.com>
> ---
>  .../iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c  | 108 +++++++++++++++++-
>  1 file changed, 105 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
> index 554229e466f3..10c6461bbf12 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/pkvm/arm-smmu-v3.c
> @@ -325,6 +325,88 @@ static bool is_cmdq_enabled(struct hyp_arm_smmu_v3_device *smmu)
>  	return FIELD_GET(CR0_CMDQEN, smmu->cr0);
>  }
>  
> +static bool smmu_filter_command(struct hyp_arm_smmu_v3_device *smmu, u64 *command)
> +{
> +	u64 type = FIELD_GET(CMDQ_0_OP, command[0]);
> +
> +	switch (type) {
> +	case CMDQ_OP_CFGI_STE:
> +		/* TBD: SHADOW_STE */
> +		break;
> +	case CMDQ_OP_CFGI_ALL:
> +	{
> +		/*
> +		 * Linux doesn't use range STE invalidation and only uses this
> +		 * command for CFGI_ALL, which is issued on reset and not when
> +		 * a new STE starts being used.
> +		 * Although this is not architectural, we rely on the current
> +		 * Linux implementation.
> +		 */
> +		WARN_ON(FIELD_GET(CMDQ_CFGI_1_RANGE, command[1]) != 31);
> +		break;
> +	}
> +	case CMDQ_OP_TLBI_NH_ASID:
> +	case CMDQ_OP_TLBI_NH_VA:
> +	case 0x13: /* CMD_TLBI_NH_VAA: Not used by Linux */
> +	{
> +		/* Only allow VMID = 0; anything else is a malicious host. */
> +		if (FIELD_GET(CMDQ_TLBI_0_VMID, command[0]) == 0)
> +			break;
> +		return WARN_ON(true);
> +	}
> +	case 0x10: /* CMD_TLBI_NH_ALL: Not used by Linux */
> +	case CMDQ_OP_TLBI_EL2_ALL:
> +	case CMDQ_OP_TLBI_EL2_VA:
> +	case CMDQ_OP_TLBI_EL2_ASID:
> +	case CMDQ_OP_TLBI_S12_VMALL:
> +	case 0x23: /* CMD_TLBI_EL2_VAA: Not used by Linux */
> +		/* Malicious host */
> +		return WARN_ON(true);
> +	case CMDQ_OP_CMD_SYNC:
> +		if (FIELD_GET(CMDQ_SYNC_0_CS, command[0]) == CMDQ_SYNC_0_CS_IRQ) {
> +			/* Allow it, but let the host time out, as this should never happen. */
> +			command[0] &= ~CMDQ_SYNC_0_CS;
> +			command[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
> +			command[1] &= ~CMDQ_SYNC_1_MSIADDR_MASK;
> +		}
> +		break;
> +	}
> +
> +	return false;
> +}
> +
> +static void smmu_emulate_cmdq_insert(struct hyp_arm_smmu_v3_device *smmu)
> +{
> +	u64 *host_cmdq = hyp_phys_to_virt(smmu->cmdq_host.q_base & Q_BASE_ADDR_MASK);
> +	int idx;
> +	u64 cmd[CMDQ_ENT_DWORDS];
> +	bool skip;
> +
> +	if (!is_cmdq_enabled(smmu))
> +		return;
> +
> +	while (!queue_empty(&smmu->cmdq_host.llq)) {
> +		/* Wait for the command queue to have some space. */
> +		WARN_ON(smmu_wait_event(smmu, !smmu_cmdq_full(&smmu->cmdq)));
> +
> +		idx = Q_IDX(&smmu->cmdq_host.llq, smmu->cmdq_host.llq.cons);
> +		/* Copy the command to avoid TOCTOU between filter and submit. */
> +		memcpy(cmd, &host_cmdq[idx * CMDQ_ENT_DWORDS], CMDQ_ENT_DWORDS << 3);
> +		skip = smmu_filter_command(smmu, cmd);
> +		if (!skip)
> +			smmu_add_cmd_raw(smmu, cmd);
> +		queue_inc_cons(&smmu->cmdq_host.llq);
> +	}

Hmmm. There's something I'd not considered before here.

Ideally, the data structures that are shadowed by the hypervisor would
be mapped as normal-WB cacheable in both the host and the hypervisor so
we don't have to worry about coherency and we get the performance
benefits from the caches. Indeed, I think that's how you've mapped
'host_cmdq' above; _however_, I sadly don't think we can do that if
the actual SMMU hardware isn't coherent.

We don't have a way to say things like "The STEs and CMDQ are coherent
but the CDs and Stage-1 page-tables aren't" so that means we have to
treat the shadowed structures populated by the host in the same way as
the host-owned structures that are consumed directly by the hardware.
Consequently, we should either be using non-cacheable mappings at EL2
for these structures or doing CMOs around the accesses.
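
Concretely, the CMO flavour could look something like the sketch
below, which invalidates the hypervisor's cached copy before each read
of the host queue. A sketch only: it assumes dcache_inval_poc() is
usable at EL2, and reuses host_cmdq/idx/cmd from
smmu_emulate_cmdq_insert() above:

	/*
	 * Discard any stale cached copy of the host's queue slot
	 * before the TOCTOU-avoiding memcpy().
	 */
	u64 *src = &host_cmdq[idx * CMDQ_ENT_DWORDS];

	dcache_inval_poc((unsigned long)src,
			 (unsigned long)src + (CMDQ_ENT_DWORDS << 3));
	memcpy(cmd, src, CMDQ_ENT_DWORDS << 3);

The non-cacheable mapping alternative avoids that maintenance on every
access at the cost of uncached reads, so it's not obvious which option
is cheaper.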

Will
