lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aNrce2up6ZxzMJpo@lizhi-Precision-Tower-5810>
Date: Mon, 29 Sep 2025 15:22:35 -0400
From: Frank Li <Frank.li@....com>
To: "Rob Herring (Arm)" <robh@...nel.org>
Cc: Tomeu Vizoso <tomeu@...euvizoso.net>,
	Krzysztof Kozlowski <krzk+dt@...nel.org>,
	Conor Dooley <conor+dt@...nel.org>,
	Oded Gabbay <ogabbay@...nel.org>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Sumit Semwal <sumit.semwal@...aro.org>,
	Christian König <christian.koenig@....com>,
	Robin Murphy <robin.murphy@....com>,
	Steven Price <steven.price@....com>,
	Daniel Stone <daniel@...ishbar.org>, devicetree@...r.kernel.org,
	linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
	linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org
Subject: Re: [PATCH v3 2/2] accel: Add Arm Ethos-U NPU driver

On Fri, Sep 26, 2025 at 03:00:49PM -0500, Rob Herring (Arm) wrote:
> Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> relatively simple interface with single command stream to describe
> buffers, operation settings, and network operations. It supports up to 8
> memory regions (though no h/w bounds on a region). The Ethos NPUs
> are designed to use an SRAM for scratch memory. Region 2 is reserved
> for SRAM (like the downstream driver stack and compiler). Userspace
> doesn't need access to the SRAM.
>
> The h/w has no MMU nor external IOMMU and is a DMA engine which can
> read and write anywhere in memory without h/w bounds checks. The user
> submitted command streams must be validated against the bounds of the
> GEM BOs. This is similar to the VC4 design which validates shaders.
>
> The job submit is based on the rocket driver for the Rockchip NPU
> utilizing the GPU scheduler. It is simpler as there's only 1 core rather
> than 3.
>
> Tested on i.MX93 platform (U65) and FVP (U85) with WIP Mesa Teflon
> support.
>
> Acked-by: Thomas Zimmermann <tzimmermann@...e.de>
> Signed-off-by: Rob Herring (Arm) <robh@...nel.org>
> ---
> v3:
>  - Rework and improve job submit validation
>  - Rename ethos to ethosu. There was an Ethos-Nxx that's unrelated.
>  - Add missing init for sched_lock mutex
>  - Drop some prints to debug level
>  - Fix i.MX93 SRAM accesses (AXI config)
>  - Add U85 AXI configuration
>  - Print the current cmd value on timeout
>
> v2:
>  - Rebase on v6.17-rc1 adapting to scheduler changes
>  - scheduler: Drop the reset workqueue. According to the scheduler docs,
>    we don't need it since we have a single h/w queue.
>  - scheduler: Rework the timeout handling to continue running if we are
>    making progress.
>  - Reset the NPU on resume so it's in a known state
>  - Add error handling on clk_get() calls
>  - Fix drm_mm splat on module unload. We were missing a put on the
>    cmdstream BO in the scheduler clean-up.
>  - Fix 0-day report needing explicit bitfield.h include
> ---
>  MAINTAINERS                          |   9 +
>  drivers/accel/Kconfig                |   1 +
>  drivers/accel/Makefile               |   1 +
>  drivers/accel/ethosu/Kconfig         |  10 +
>  drivers/accel/ethosu/Makefile        |   4 +
>  drivers/accel/ethosu/ethosu_device.h | 187 +++++++++
>  drivers/accel/ethosu/ethosu_drv.c    | 430 +++++++++++++++++++++
>  drivers/accel/ethosu/ethosu_drv.h    |  15 +
>  drivers/accel/ethosu/ethosu_gem.c    | 709 +++++++++++++++++++++++++++++++++++
>  drivers/accel/ethosu/ethosu_gem.h    |  46 +++
>  drivers/accel/ethosu/ethosu_job.c    | 543 +++++++++++++++++++++++++++
>  drivers/accel/ethosu/ethosu_job.h    |  41 ++
>  include/uapi/drm/ethosu_accel.h      | 262 +++++++++++++
>  13 files changed, 2258 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fe168477caa4..b091badec8e4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1991,6 +1991,15 @@ F:	arch/arm/include/asm/arch_timer.h
>  F:	arch/arm64/include/asm/arch_timer.h
>  F:	drivers/clocksource/arm_arch_timer.c
>
> +ARM ETHOS-U NPU DRIVER
> +M:	Rob Herring (Arm) <robh@...nel.org>
> +M:	Tomeu Vizoso <tomeu@...euvizoso.net>
> +L:	dri-devel@...ts.freedesktop.org
> +S:	Supported
> +T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
> +F:	drivers/accel/ethosu/
> +F:	include/uapi/drm/ethosu_accel.h
> +
>  ARM GENERIC INTERRUPT CONTROLLER DRIVERS
>  M:	Marc Zyngier <maz@...nel.org>
>  L:	linux-arm-kernel@...ts.infradead.org (moderated for non-subscribers)
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> index 5b9490367a39..a5a4faab221b 100644
> --- a/drivers/accel/Kconfig
> +++ b/drivers/accel/Kconfig
> @@ -25,6 +25,7 @@ menuconfig DRM_ACCEL
>  	  and debugfs).
>
>  source "drivers/accel/amdxdna/Kconfig"
> +source "drivers/accel/ethosu/Kconfig"
>  source "drivers/accel/habanalabs/Kconfig"
>  source "drivers/accel/ivpu/Kconfig"
>  source "drivers/accel/qaic/Kconfig"
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index a301fb6089d4..c759857d64cb 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>
>  obj-$(CONFIG_DRM_ACCEL_AMDXDNA)		+= amdxdna/
> +obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU)	+= ethosu/
>  obj-$(CONFIG_DRM_ACCEL_HABANALABS)	+= habanalabs/
>  obj-$(CONFIG_DRM_ACCEL_IVPU)		+= ivpu/
>  obj-$(CONFIG_DRM_ACCEL_QAIC)		+= qaic/
...
> +#include <drm/ethosu_accel.h>
> +
> +struct clk;
> +struct gen_pool;
> +
> +#define NPU_REG_ID		0x0000
> +#define NPU_REG_STATUS		0x0004
> +#define NPU_REG_CMD		0x0008
> +#define NPU_REG_RESET		0x000C
> +#define NPU_REG_QBASE		0x0010
> +#define NPU_REG_QBASE_HI	0x0014
> +#define NPU_REG_QREAD		0x0018
> +#define NPU_REG_QCONFIG		0x001C
> +#define NPU_REG_QSIZE		0x0020
> +#define NPU_REG_PROT		0x0024
> +#define NPU_REG_CONFIG		0x0028
> +#define NPU_REG_REGIONCFG	0x003C
> +#define NPU_REG_AXILIMIT0	0x0040		// U65
> +#define NPU_REG_AXILIMIT1	0x0044		// U65
> +#define NPU_REG_AXILIMIT2	0x0048		// U65
> +#define NPU_REG_AXILIMIT3	0x004c		// U65

Nit: can you keep low-case/up-case consistent for Hex value

> +#define NPU_REG_MEM_ATTR0	0x0040		// U85
> +#define NPU_REG_MEM_ATTR1	0x0044		// U85
> +#define NPU_REG_MEM_ATTR2	0x0048		// U85
> +#define NPU_REG_MEM_ATTR3	0x004c		// U85
> +#define NPU_REG_AXI_SRAM	0x0050		// U85
> +#define NPU_REG_AXI_EXT		0x0054		// U85
> +
...
> +
> +#define ETHOSU_SRAM_REGION	2	/* Matching Vela compiler */
> +
> +/**
> + * struct ethosu_device - Ethosu device
> + */
> +struct ethosu_device {
> +	/** @base: Base drm_device. */
> +	struct drm_device base;
> +
> +	/** @iomem: CPU mapping of the registers. */
> +	void __iomem *regs;
> +
> +	void __iomem *sram;
> +	struct gen_pool *srampool;
> +	dma_addr_t sramphys;
> +
> +	struct clk *core_clk;
> +	struct clk *apb_clk;
> +
> +	int irq;
> +
> +	bool coherent;
> +
> +	struct drm_ethosu_npu_info npu_info;
> +
> +	struct ethosu_job *in_flight_job;
> +	struct mutex job_lock;

I remember checkpatch.py require all locks need comments.

> +	spinlock_t fence_lock;
> +
> +	struct drm_gpu_scheduler sched;
> +	struct mutex sched_lock;
> +	u64 fence_context;
> +	u64 emit_seqno;
> +};
> +
> +#define to_ethosu_device(drm_dev) \
> +	((struct ethosu_device *)container_of(drm_dev, struct ethosu_device, base))
> +
> +#endif
> diff --git a/drivers/accel/ethosu/ethosu_drv.c b/drivers/accel/ethosu/ethosu_drv.c
> new file mode 100644
> index 000000000000..b74d803756cf
> --- /dev/null
> +++ b/drivers/accel/ethosu/ethosu_drv.c

...

> +
> +static int ethosu_device_resume(struct device *dev)
> +{
> +	struct ethosu_device *ethosudev = dev_get_drvdata(dev);
> +	int ret;
> +
> +	ret = clk_prepare_enable(ethosudev->core_clk);
> +	if (ret)
> +		return ret;
> +
> +	ret = clk_prepare_enable(ethosudev->apb_clk);
> +	if (ret)
> +		goto err_disable_core_clk;

use clk bulk api should be simple.

> +
> +	ret = ethosu_reset(ethosudev);
> +	if (!ret)
> +		return 0;
> +
> +err_disable_core_clk:
> +	clk_disable_unprepare(ethosudev->core_clk);
> +	return ret;
> +}
> +
...
> +
> +static int ethosu_init(struct ethosu_device *ethosudev)
> +{
> +	int ret;
> +	u32 id, config;
> +
> +	ret = devm_pm_runtime_enable(ethosudev->base.dev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pm_runtime_resume_and_get(ethosudev->base.dev);
> +	if (ret)
> +		return ret;
> +
> +	pm_runtime_set_autosuspend_delay(ethosudev->base.dev, 50);
> +	pm_runtime_use_autosuspend(ethosudev->base.dev);

pm_runtime_use_autosuspend() should be after last register read
readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);

incase schedule happen between pm_runtime_use_autosuspend(ethosudev->base.dev);
and readl().

> +
> +	/* If PM is disabled, we need to call ethosu_device_resume() manually. */
> +	if (!IS_ENABLED(CONFIG_PM)) {
> +		ret = ethosu_device_resume(ethosudev->base.dev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ethosudev->npu_info.id = id = readl_relaxed(ethosudev->regs + NPU_REG_ID);
> +	ethosudev->npu_info.config = config = readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
> +
> +	ethosu_sram_init(ethosudev);
> +
> +	dev_info(ethosudev->base.dev,
> +		"Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
> +		FIELD_GET(ID_ARCH_MAJOR_MASK, id),
> +		FIELD_GET(ID_ARCH_MINOR_MASK, id),
> +		FIELD_GET(ID_ARCH_PATCH_MASK, id),
> +		FIELD_GET(ID_VER_MAJOR_MASK, id),
> +		FIELD_GET(ID_VER_MINOR_MASK, id),
> +		FIELD_GET(CONFIG_CMD_STREAM_VER_MASK, config),
> +		1 << FIELD_GET(CONFIG_MACS_PER_CC_MASK, config),
> +		ethosudev->npu_info.sram_size / 1024);
> +
> +	return 0;
> +}
> +
> +static int ethosu_probe(struct platform_device *pdev)
> +{
> +	int ret;
> +	struct ethosu_device *ethosudev;
> +
> +	ethosudev = devm_drm_dev_alloc(&pdev->dev, &ethosu_drm_driver,
> +				      struct ethosu_device, base);
> +	if (IS_ERR(ethosudev))
> +		return -ENOMEM;
> +	platform_set_drvdata(pdev, ethosudev);
> +
> +	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
> +	if (ret)
> +		return ret;

bigger than 32, dma_set_mask_and_coherent() always return true. Needn't
check return value.

> +
> +	ethosudev->regs = devm_platform_ioremap_resource(pdev, 0);
> +
...
> +
> +/**
> + * ethosu_gem_create_with_handle() - Create a GEM object and attach it to a handle.
> + * @file: DRM file.
> + * @ddev: DRM device.
> + * @size: Size of the GEM object to allocate.
> + * @flags: Combination of drm_ethosu_bo_flags flags.
> + * @handle: Pointer holding the handle pointing to the new GEM object.
> + *
> + * Return: Zero on success
> + */
> +int ethosu_gem_create_with_handle(struct drm_file *file,
> +				 struct drm_device *ddev,
> +				 u64 *size, u32 flags, u32 *handle)
> +{
> +	int ret;
> +	struct drm_gem_dma_object *mem;
> +	struct ethosu_gem_object *bo;

move 'ret' here to keep reverise christmas tree order.

> +
> +	mem = drm_gem_dma_create(ddev, *size);
> +	if (IS_ERR(mem))
> +		return PTR_ERR(mem);
> +
> +	bo = to_ethosu_bo(&mem->base);
> +	bo->flags = flags;
> +
> +	/*
> +	 * Allocate an id of idr table where the obj is registered
> +	 * and handle has the id what user can see.
> +	 */
> +	ret = drm_gem_handle_create(file, &mem->base, handle);
> +	if (!ret)
> +		*size = bo->base.base.size;
> +
> +	/* drop reference from allocate - handle holds it now. */
> +	drm_gem_object_put(&mem->base);
> +
> +	return ret;
> +}
> +
...
> +
> +static void cmd_state_init(struct cmd_state *st)
> +{
> +	/* Initialize to all 1s to detect missing setup */
> +	memset(st, 0xff, sizeof(*st));
> +}
> +
> +static u64 cmd_to_addr(u32 *cmd)
> +{
> +	return ((u64)((cmd[0] & 0xff0000) << 16)) | cmd[1];

will FIELD_PREP helpful?

> +}
> +
> +static u64 dma_length(struct ethosu_validated_cmdstream_info *info,
> +		      struct dma_state *dma_st, struct dma *dma)
> +{
> +	s8 mode = dma_st->mode;
> +	u64 len = dma->len;
> +
> +	if (mode >= 1) {
> +		len += dma->stride[0];
> +		len *= dma_st->size0;
> +	}
> +	if (mode == 2) {
> +		len += dma->stride[1];
> +		len *= dma_st->size1;
> +	}
> +	if (dma->region >= 0)
> +		info->region_size[dma->region] = max(info->region_size[dma->region],
> +						     len + dma->offset);
> +
> +	return len;
> +}
> +
...

> +
> +static void ethosu_job_handle_irq(struct ethosu_device *dev)
> +{
> +	u32 status;
> +
> +	pm_runtime_mark_last_busy(dev->base.dev);

I think don't need pm_runtime_mark_last_busy() here because
pm_runtime_put_autosuspend() already call pm_runtime_mark_last_busy().

only mark last busy without pm_runtime_put() can't affect run time pm
state, still in active state.

> +
> +	status = readl_relaxed(dev->regs + NPU_REG_STATUS);
> +
> +	if (status & (STATUS_BUS_STATUS | STATUS_CMD_PARSE_ERR)) {
> +		dev_err(dev->base.dev, "Error IRQ - %x\n", status);
> +		drm_sched_fault(&dev->sched);
> +		return;
> +	}
> +
> +	scoped_guard(mutex, &dev->job_lock) {
> +		if (dev->in_flight_job) {
> +			dma_fence_signal(dev->in_flight_job->done_fence);
> +			pm_runtime_put_autosuspend(dev->base.dev);
> +			dev->in_flight_job = NULL;
> +		}
> +	}
> +}
> +
...
> +
> +int ethosu_job_init(struct ethosu_device *dev)
> +{
> +	struct drm_sched_init_args args = {
> +		.ops = &ethosu_sched_ops,
> +		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
> +		.credit_limit = 1,
> +		.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> +		.name = dev_name(dev->base.dev),
> +		.dev = dev->base.dev,
> +	};
> +	int ret;
> +
> +	spin_lock_init(&dev->fence_lock);
> +	mutex_init(&dev->job_lock);
> +	mutex_init(&dev->sched_lock);

now perfer use dev_mutex_init().

Frank
> +
> +	dev->irq = platform_get_irq(to_platform_device(dev->base.dev), 0);
> +	if (dev->irq < 0)
> +		return dev->irq;
> +
> +	ret = devm_request_threaded_irq(dev->base.dev, dev->irq,
> +					ethosu_job_irq_handler,
> +					ethosu_job_irq_handler_thread,
> +					IRQF_SHARED, KBUILD_MODNAME,
> +					dev);
> +	if (ret) {
> +		dev_err(dev->base.dev, "failed to request irq");
> +		return ret;
> +	}
> +
> +	dev->fence_context = dma_fence_context_alloc(1);
> +
> +	ret = drm_sched_init(&dev->sched, &args);
> +	if (ret) {
> +		dev_err(dev->base.dev, "Failed to create scheduler: %d.", ret);
> +		goto err_sched;
> +	}
> +
> +	return 0;
> +
> +err_sched:
> +	drm_sched_fini(&dev->sched);
> +	return ret;
> +}
> +
...
> +	DRM_IOCTL_ETHOSU_BO_WAIT =
> +		DRM_IOCTL_ETHOSU(WR, BO_WAIT, bo_wait),
> +	DRM_IOCTL_ETHOSU_BO_MMAP_OFFSET =
> +		DRM_IOCTL_ETHOSU(WR, BO_MMAP_OFFSET, bo_mmap_offset),
> +	DRM_IOCTL_ETHOSU_CMDSTREAM_BO_CREATE =
> +		DRM_IOCTL_ETHOSU(WR, CMDSTREAM_BO_CREATE, cmdstream_bo_create),
> +	DRM_IOCTL_ETHOSU_SUBMIT =
> +		DRM_IOCTL_ETHOSU(WR, SUBMIT, submit),
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif /* _ETHOSU_DRM_H_ */
>
> --
> 2.51.0
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ