linux-kernel - Re: [PATCH v9 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240702174152.GA4740@willie-the-truck>
Date: Tue, 2 Jul 2024 18:41:53 +0100
From: Will Deacon <will@...nel.org>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: robin.murphy@....com, joro@...tes.org, jgg@...dia.com,
	thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
	linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH v9 5/6] iommu/arm-smmu-v3: Add in-kernel support for
 NVIDIA Tegra241 (Grace) CMDQV

On Wed, Jun 12, 2024 at 02:45:32PM -0700, Nicolin Chen wrote:
> From: Nate Watterson <nwatterson@...dia.com>
> 
> NVIDIA's Tegra241 Soc has a CMDQ-Virtualization (CMDQV) hardware, extending
> the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization
> capabilities. In terms of command queue, they are very like a standard SMMU
> CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC.
> 
> Add a new tegra241-cmdqv driver, and insert its structure pointer into the
> existing arm_smmu_device, and then add related function calls in the SMMUv3
> driver to interact with the CMDQV driver.
> 
> In the CMDQV driver, add a minimal part for the in-kernel support: reserve
> VINTF0 for in-kernel use, and assign some of the VCMDQs to the VINTF0, and
> select one VCMDQ based on the current CPU ID to execute supported commands.
> This multi-queue design for in-kernel use gives some limited improvements:
> up to 20% reduction of invalidation time was measured by a multi-threaded
> DMA unmap benchmark, compared to a single queue.
> 
> The other part of the CMDQV driver will be user-space support that gives a
> hypervisor running on the host OS to talk to the driver for virtualization
> use cases, allowing VMs to use VCMDQs without trappings, i.e. no VM Exits.
> This is designed based on IOMMUFD, and its RFC series is also under review.
> It will provide a guest OS a bigger improvement: 70% to 90% reductions of
> TLB invalidation time were measured by DMA unmap tests running in a guest,
> compared to nested SMMU CMDQ (with trappings).
> 
> However, it is very important for this in-kernel support to get merged and
> installed to VMs running on Grace-powered servers as soon as possible. So,
> later those servers would only need to upgrade their host kernels for the
> user-space support.

^^^ This is a weird paragraph to put in the commit message.

> 
> As the initial version, the CMDQV driver only supports ACPI configurations.
> 
> Signed-off-by: Nate Watterson <nwatterson@...dia.com>
> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
> Co-developed-by: Nicolin Chen <nicolinc@...dia.com>
> Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
> ---
>  MAINTAINERS                                   |   1 +
>  drivers/iommu/Kconfig                         |  11 +
>  drivers/iommu/arm/arm-smmu-v3/Makefile        |   1 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  52 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  50 ++
>  .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 842 ++++++++++++++++++
>  6 files changed, 945 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aacccb376c28..ecf7af1b2df8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22078,6 +22078,7 @@ M:	Thierry Reding <thierry.reding@...il.com>
>  R:	Krishna Reddy <vdumpa@...dia.com>
>  L:	linux-tegra@...r.kernel.org
>  S:	Supported
> +F:	drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
>  F:	drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
>  F:	drivers/iommu/tegra*
>  
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c04584be3089..e009387d3cba 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -423,6 +423,17 @@ config ARM_SMMU_V3_KUNIT_TEST
>  	  Enable this option to unit-test arm-smmu-v3 driver functions.
>  
>  	  If unsure, say N.
> +
> +config TEGRA241_CMDQV
> +	bool "NVIDIA Tegra241 CMDQ-V extension support for ARM SMMUv3"
> +	depends on ACPI
> +	help
> +	  Support for NVIDIA CMDQ-Virtualization extension for ARM SMMUv3. The
> +	  CMDQ-V extension is similar to v3.3 ECMDQ for multi command queues
> +	  support, except with virtualization capabilities.
> +
> +	  Say Y here if your system is NVIDIA Tegra241 (Grace) or it has the same
> +	  CMDQ-V extension.
>  endif
>  
>  config S390_IOMMU
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index 014a997753a8..55201fdd7007 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -2,6 +2,7 @@
>  obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
>  arm_smmu_v3-objs-y += arm-smmu-v3.o
>  arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> +arm_smmu_v3-objs-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
>  
>  obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ba0e24d5ffbf..430e84fe3679 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -334,6 +334,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>  
>  static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
>  {
> +	if (arm_smmu_has_tegra241_cmdqv(smmu))
> +		return tegra241_cmdqv_get_cmdq(smmu);
> +
>  	return &smmu->cmdq;

Hardcoding all these tegra-specific checks in the core driver is pretty
horrible :/

Instead, please can we do something similar to the SMMUv2 driver? That
is, tweak the probe routine to call something akin to the
arm_smmu_impl_init() function, which looks at the 'model' field pulled
out of the IORT and can then dispatch directly to a tegra-specific init
function (see, e.g. nvidia_smmu_impl_init() for SMMUv2).

>From there, you can both install function pointers into the
'arm_smmu_device' structure which can be used instead of having the the
'if (tegra)' checks in the main driver and you can also re-allocate the
structu to live inside a private structure instead of having the
backpointer.

Maybe those cmdq function pointers would be happy for other extensions
too (e.g. the ECMDQ stuff at [1]).

Will

[1] https://lore.kernel.org/r/20240425144152.52352-3-tanmay@marvell.com