Message-ID: <8a872e48-0743-43b0-8259-70d6b8e4c221@suse.de>
Date: Tue, 12 Aug 2025 13:01:45 +0200
From: Thomas Zimmermann <tzimmermann@...e.de>
To: "Rob Herring (Arm)" <robh@...nel.org>,
 Tomeu Vizoso <tomeu@...euvizoso.net>,
 Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
 <conor+dt@...nel.org>, Oded Gabbay <ogabbay@...nel.org>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Sumit Semwal <sumit.semwal@...aro.org>,
 Christian König <christian.koenig@....com>,
 Robin Murphy <robin.murphy@....com>, Steven Price <steven.price@....com>
Cc: devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, linux-media@...r.kernel.org,
 linaro-mm-sig@...ts.linaro.org
Subject: Re: [PATCH v2 2/2] accel: Add Arm Ethos-U NPU driver

Hi

Am 11.08.25 um 23:05 schrieb Rob Herring (Arm):
> Add a driver for Arm Ethos-U65/U85 NPUs. The Ethos-U NPU has a
> relatively simple interface with a single command stream to describe
> buffers, operation settings, and network operations. It supports up to 8
> memory regions (though there are no h/w bounds on a region). The Ethos
> NPUs are designed to use an SRAM for scratch memory. Region 2 is
> reserved for SRAM (like the downstream driver stack and compiler).
> Userspace doesn't need access to the SRAM.
>
> The h/w has no MMU nor external IOMMU; it is a DMA engine which can
> read and write anywhere in memory without h/w bounds checks.
> User-submitted command streams must therefore be validated against the
> bounds of the GEM BOs. This is similar to the VC4 design, which
> validates shaders.
>
> The job submit is based on the rocket driver for the Rockchip NPU
> utilizing the GPU scheduler. It is simpler as there's only 1 core rather
> than 3.
>
> Tested on i.MX93 platform (U65) with WIP Mesa Teflon support.
>
> Signed-off-by: Rob Herring (Arm) <robh@...nel.org>

I've looked over this patch and it looks good to me. There's a
dev_info() in ethos_init() which I think should become drm_dbg(); an
untested sketch follows below.
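
Roughly, assuming nothing more than swapping the helper (drm_dbg()
takes the drm_device rather than the struct device; untested):

-	dev_info(ethosdev->base.dev,
-		"Ethos NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
+	drm_dbg(&ethosdev->base,
+		"Ethos NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",

Anyway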

Acked-by: Thomas Zimmermann <tzimmermann@...e.de>

Side note: I noticed that there's buffer-allocation code here that
reinvents dumb buffers. We've occasionally talked about creating a
better dumb-buffers ioctl interface and this driver could be another
use case; see the sketch below.
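
Just to illustrate the overlap: modulo the NO_MMAP flag,
ethos_gem_create_with_handle() is more or less what the existing dumb
helper already does for a linear buffer. A minimal sketch (with the
usual width/height/bpp contortions that make the current interface
awkward for non-scanout buffers):

	struct drm_mode_create_dumb args = {
		.width  = size,	/* a linear blob forced into a 2D shape */
		.height = 1,
		.bpp    = 8,
	};
	int ret = drm_gem_dma_dumb_create(file, ddev, &args);
	/* on success, args.handle is the GEM handle, args.size the size */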

Best regards
Thomas
> ---
> v2:
>   - Rebase on v6.17-rc1 adapting to scheduler changes
>   - scheduler: Drop the reset workqueue. According to the scheduler docs,
>     we don't need it since we have a single h/w queue.
>   - scheduler: Rework the timeout handling to continue running if we are
>     making progress.
>   - Reset the NPU on resume so it's in a known state
>   - Add error handling on clk_get() calls
>   - Fix drm_mm splat on module unload. We were missing a put on the
>     cmdstream BO in the scheduler clean-up.
>   - Fix 0-day report needing explicit bitfield.h include
>
> Open issues/TODO:
>   - U85 support (pending h/w; minor changes and testing needed)
>   - Improve AXI bus config. This needs to be per platform probably and is
>     also different for U85. The downstream driver parameterizes
>     everything into DT properties (and a whole other driver!). There's
>     little reason to parameterize everything for a couple of fixed
>     registers to initialize. All that should be needed is per platform
>     match data. Not really needed until 2 platforms with different
>     settings need to be supported.
>   - SRAM support not working on i.MX93 as some DMA writes to SRAM are
>     corrupted/dropped. Seems to be outside the Ethos block and
>     configuration. (AXI bus config?)
> ---
>   MAINTAINERS                        |   9 +
>   drivers/accel/Kconfig              |   1 +
>   drivers/accel/Makefile             |   1 +
>   drivers/accel/ethos/Kconfig        |  10 +
>   drivers/accel/ethos/Makefile       |   4 +
>   drivers/accel/ethos/ethos_device.h | 181 ++++++++++
>   drivers/accel/ethos/ethos_drv.c    | 418 ++++++++++++++++++++++
>   drivers/accel/ethos/ethos_drv.h    |  15 +
>   drivers/accel/ethos/ethos_gem.c    | 707 +++++++++++++++++++++++++++++++++++++
>   drivers/accel/ethos/ethos_gem.h    |  46 +++
>   drivers/accel/ethos/ethos_job.c    | 514 +++++++++++++++++++++++++++
>   drivers/accel/ethos/ethos_job.h    |  41 +++
>   include/uapi/drm/ethos_accel.h     | 262 ++++++++++++++
>   13 files changed, 2209 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fe168477caa4..2f31678b3c10 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1991,6 +1991,15 @@ F:	arch/arm/include/asm/arch_timer.h
>   F:	arch/arm64/include/asm/arch_timer.h
>   F:	drivers/clocksource/arm_arch_timer.c
>   
> +ARM ETHOS-U NPU DRIVER
> +M:	Rob Herring (Arm) <robh@...nel.org>
> +M:	Tomeu Vizoso <tomeu@...euvizoso.net>
> +L:	dri-devel@...ts.freedesktop.org
> +S:	Supported
> +T:	git https://gitlab.freedesktop.org/drm/misc/kernel.git
> +F:	drivers/accel/ethos/
> +F:	include/uapi/drm/ethos_accel.h
> +
>   ARM GENERIC INTERRUPT CONTROLLER DRIVERS
>   M:	Marc Zyngier <maz@...nel.org>
>   L:	linux-arm-kernel@...ts.infradead.org (moderated for non-subscribers)
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> index 5b9490367a39..6317644b6a5a 100644
> --- a/drivers/accel/Kconfig
> +++ b/drivers/accel/Kconfig
> @@ -25,6 +25,7 @@ menuconfig DRM_ACCEL
>   	  and debugfs).
>   
>   source "drivers/accel/amdxdna/Kconfig"
> +source "drivers/accel/ethos/Kconfig"
>   source "drivers/accel/habanalabs/Kconfig"
>   source "drivers/accel/ivpu/Kconfig"
>   source "drivers/accel/qaic/Kconfig"
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index a301fb6089d4..f18868037484 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -1,6 +1,7 @@
>   # SPDX-License-Identifier: GPL-2.0-only
>   
>   obj-$(CONFIG_DRM_ACCEL_AMDXDNA)		+= amdxdna/
> +obj-$(CONFIG_DRM_ACCEL_ARM_ETHOS)	+= ethos/
>   obj-$(CONFIG_DRM_ACCEL_HABANALABS)	+= habanalabs/
>   obj-$(CONFIG_DRM_ACCEL_IVPU)		+= ivpu/
>   obj-$(CONFIG_DRM_ACCEL_QAIC)		+= qaic/
> diff --git a/drivers/accel/ethos/Kconfig b/drivers/accel/ethos/Kconfig
> new file mode 100644
> index 000000000000..127afbae43f0
> --- /dev/null
> +++ b/drivers/accel/ethos/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config DRM_ACCEL_ARM_ETHOS
> +	tristate "Arm Ethos-U65/U85 NPU"
> +	depends on HAS_IOMEM
> +	depends on DRM_ACCEL
> +	select DRM_GEM_DMA_HELPER
> +	select DRM_SCHED
> +	help
> +	  Enables the driver for Arm Ethos-U65/U85 NPUs.
> diff --git a/drivers/accel/ethos/Makefile b/drivers/accel/ethos/Makefile
> new file mode 100644
> index 000000000000..8b4a8bbf41d4
> --- /dev/null
> +++ b/drivers/accel/ethos/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +obj-$(CONFIG_DRM_ACCEL_ARM_ETHOS) := ethos.o
> +ethos-y += ethos_drv.o ethos_gem.o ethos_job.o
> diff --git a/drivers/accel/ethos/ethos_device.h b/drivers/accel/ethos/ethos_device.h
> new file mode 100644
> index 000000000000..71d652144341
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_device.h
> @@ -0,0 +1,181 @@
> +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
> +/* Copyright 2025 Arm, Ltd. */
> +
> +#ifndef __ETHOS_DEVICE_H__
> +#define __ETHOS_DEVICE_H__
> +
> +#include <linux/bits.h>
> +#include <linux/types.h>
> +
> +#include <drm/drm_device.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <drm/ethos_accel.h>
> +
> +struct clk;
> +struct gen_pool;
> +
> +#define NPU_REG_ID		0x0000
> +#define NPU_REG_STATUS		0x0004
> +#define NPU_REG_CMD		0x0008
> +#define NPU_REG_RESET		0x000C
> +#define NPU_REG_QBASE		0x0010
> +#define NPU_REG_QBASE_HI	0x0014
> +#define NPU_REG_QREAD		0x0018
> +#define NPU_REG_QCONFIG		0x001C
> +#define NPU_REG_QSIZE		0x0020
> +#define NPU_REG_PROT		0x0024
> +#define NPU_REG_CONFIG		0x0028
> +#define NPU_REG_REGIONCFG	0x003C
> +#define NPU_REG_AXILIMIT0	0x0040
> +#define NPU_REG_AXILIMIT1	0x0044
> +#define NPU_REG_AXILIMIT2	0x0048
> +#define NPU_REG_AXILIMIT3	0x004c
> +
> +#define NPU_REG_BASEP(x)	(0x0080 + (x)*8)
> +#define NPU_REG_BASEP_HI(x)	(0x0084 + (x)*8)
> +#define NPU_BASEP_REGION_MAX	8
> +
> +#define ID_ARCH_MAJOR_MASK	GENMASK(31, 28)
> +#define ID_ARCH_MINOR_MASK	GENMASK(27, 20)
> +#define ID_ARCH_PATCH_MASK	GENMASK(19, 16)
> +#define ID_VER_MAJOR_MASK	GENMASK(11, 8)
> +#define ID_VER_MINOR_MASK	GENMASK(7, 4)
> +
> +#define CONFIG_MACS_PER_CC_MASK	GENMASK(3, 0)
> +#define CONFIG_CMD_STREAM_VER_MASK	GENMASK(7, 4)
> +
> +#define STATUS_STATE_RUNNING	BIT(0)
> +#define STATUS_IRQ_RAISED	BIT(1)
> +#define STATUS_BUS_STATUS	BIT(2)
> +#define STATUS_RESET_STATUS	BIT(3)
> +#define STATUS_CMD_PARSE_ERR	BIT(4)
> +#define STATUS_CMD_END_REACHED	BIT(5)
> +
> +#define CMD_CLEAR_IRQ		BIT(1)
> +#define CMD_TRANSITION_TO_RUN	BIT(0)
> +
> +#define RESET_PENDING_CSL	BIT(1)
> +#define RESET_PENDING_CPL	BIT(0)
> +
> +#define PROT_ACTIVE_CSL		BIT(1)
> +
> +enum ethos_cmds {
> +	NPU_OP_CONV = 0x2,
> +	NPU_OP_DEPTHWISE = 0x3,
> +	NPU_OP_POOL = 0x5,
> +	NPU_OP_ELEMENTWISE = 0x6,
> +	NPU_OP_RESIZE = 0x7,	// U85 only
> +	NPU_OP_DMA_START = 0x10,
> +	NPU_SET_IFM_PAD_TOP = 0x100,
> +	NPU_SET_IFM_PAD_LEFT = 0x101,
> +	NPU_SET_IFM_PAD_RIGHT = 0x102,
> +	NPU_SET_IFM_PAD_BOTTOM = 0x103,
> +	NPU_SET_IFM_DEPTH_M1 = 0x104,
> +	NPU_SET_IFM_PRECISION = 0x105,
> +	NPU_SET_IFM_BROADCAST = 0x108,
> +	NPU_SET_IFM_WIDTH0_M1 = 0x10a,
> +	NPU_SET_IFM_HEIGHT0_M1 = 0x10b,
> +	NPU_SET_IFM_HEIGHT1_M1 = 0x10c,
> +	NPU_SET_IFM_REGION = 0x10f,
> +	NPU_SET_OFM_WIDTH_M1 = 0x111,
> +	NPU_SET_OFM_HEIGHT_M1 = 0x112,
> +	NPU_SET_OFM_DEPTH_M1 = 0x113,
> +	NPU_SET_OFM_PRECISION = 0x114,
> +	NPU_SET_OFM_WIDTH0_M1 = 0x11a,
> +	NPU_SET_OFM_HEIGHT0_M1 = 0x11b,
> +	NPU_SET_OFM_HEIGHT1_M1 = 0x11c,
> +	NPU_SET_OFM_REGION = 0x11f,
> +	NPU_SET_KERNEL_WIDTH_M1 = 0x120,
> +	NPU_SET_KERNEL_HEIGHT_M1 = 0x121,
> +	NPU_SET_KERNEL_STRIDE = 0x122,
> +	NPU_SET_WEIGHT_REGION = 0x128,
> +	NPU_SET_SCALE_REGION = 0x129,
> +	NPU_SET_DMA0_SRC_REGION = 0x130,
> +	NPU_SET_DMA0_DST_REGION = 0x131,
> +	NPU_SET_DMA0_SIZE0 = 0x132,
> +	NPU_SET_DMA0_SIZE1 = 0x133,
> +	NPU_SET_IFM2_BROADCAST = 0x180,
> +	NPU_SET_IFM2_PRECISION = 0x185,
> +	NPU_SET_IFM2_WIDTH0_M1 = 0x18a,
> +	NPU_SET_IFM2_HEIGHT0_M1 = 0x18b,
> +	NPU_SET_IFM2_HEIGHT1_M1 = 0x18c,
> +	NPU_SET_IFM2_REGION = 0x18f,
> +	NPU_SET_IFM_BASE0 = 0x4000,
> +	NPU_SET_IFM_BASE1 = 0x4001,
> +	NPU_SET_IFM_BASE2 = 0x4002,
> +	NPU_SET_IFM_BASE3 = 0x4003,
> +	NPU_SET_IFM_STRIDE_X = 0x4004,
> +	NPU_SET_IFM_STRIDE_Y = 0x4005,
> +	NPU_SET_IFM_STRIDE_C = 0x4006,
> +	NPU_SET_OFM_BASE0 = 0x4010,
> +	NPU_SET_OFM_BASE1 = 0x4011,
> +	NPU_SET_OFM_BASE2 = 0x4012,
> +	NPU_SET_OFM_BASE3 = 0x4013,
> +	NPU_SET_OFM_STRIDE_X = 0x4014,
> +	NPU_SET_OFM_STRIDE_Y = 0x4015,
> +	NPU_SET_OFM_STRIDE_C = 0x4016,
> +	NPU_SET_WEIGHT_BASE = 0x4020,
> +	NPU_SET_WEIGHT_LENGTH = 0x4021,
> +	NPU_SET_SCALE_BASE = 0x4022,
> +	NPU_SET_SCALE_LENGTH = 0x4023,
> +	NPU_SET_DMA0_SRC = 0x4030,
> +	NPU_SET_DMA0_DST = 0x4031,
> +	NPU_SET_DMA0_LEN = 0x4032,
> +	NPU_SET_DMA0_SRC_STRIDE0 = 0x4033,
> +	NPU_SET_DMA0_SRC_STRIDE1 = 0x4034,
> +	NPU_SET_DMA0_DST_STRIDE0 = 0x4035,
> +	NPU_SET_DMA0_DST_STRIDE1 = 0x4036,
> +	NPU_SET_IFM2_BASE0 = 0x4080,
> +	NPU_SET_IFM2_BASE1 = 0x4081,
> +	NPU_SET_IFM2_BASE2 = 0x4082,
> +	NPU_SET_IFM2_BASE3 = 0x4083,
> +	NPU_SET_IFM2_STRIDE_X = 0x4084,
> +	NPU_SET_IFM2_STRIDE_Y = 0x4085,
> +	NPU_SET_IFM2_STRIDE_C = 0x4086,
> +	NPU_SET_WEIGHT1_BASE = 0x4090,
> +	NPU_SET_WEIGHT1_LENGTH = 0x4091,
> +	NPU_SET_SCALE1_BASE = 0x4092,
> +	NPU_SET_WEIGHT2_BASE = 0x4092,
> +	NPU_SET_SCALE1_LENGTH = 0x4093,
> +	NPU_SET_WEIGHT2_LENGTH = 0x4093,
> +	NPU_SET_WEIGHT3_BASE = 0x4094,
> +	NPU_SET_WEIGHT3_LENGTH = 0x4095,
> +};
> +
> +#define ETHOS_SRAM_REGION	2	/* Matching Vela compiler */
> +
> +/**
> + * struct ethos_device - Ethos device
> + */
> +struct ethos_device {
> +	/** @base: Base drm_device. */
> +	struct drm_device base;
> +
> +	/** @iomem: CPU mapping of the registers. */
> +	void __iomem *regs;
> +
> +	void __iomem *sram;
> +	struct gen_pool *srampool;
> +	dma_addr_t sramphys;
> +
> +	struct clk *core_clk;
> +	struct clk *apb_clk;
> +
> +	int irq;
> +
> +	bool coherent;
> +
> +	struct drm_ethos_npu_info npu_info;
> +
> +	struct ethos_job *in_flight_job;
> +	struct mutex job_lock;
> +	spinlock_t fence_lock;
> +
> +	struct drm_gpu_scheduler sched;
> +	struct mutex sched_lock;
> +	u64 fence_context;
> +	u64 emit_seqno;
> +};
> +
> +#define to_ethos_device(drm_dev) \
> +	((struct ethos_device *)container_of(drm_dev, struct ethos_device, base))
> +
> +#endif
> diff --git a/drivers/accel/ethos/ethos_drv.c b/drivers/accel/ethos/ethos_drv.c
> new file mode 100644
> index 000000000000..dd9841cc6b73
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_drv.c
> @@ -0,0 +1,418 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +// Copyright (C) 2025 Arm, Ltd.
> +
> +#include <linux/bitfield.h>
> +#include <linux/clk.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/genalloc.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/module.h>
> +#include <linux/mod_devicetable.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_ioctl.h>
> +#include <drm/drm_utils.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_accel.h>
> +#include <drm/ethos_accel.h>
> +
> +#include "ethos_drv.h"
> +#include "ethos_device.h"
> +#include "ethos_gem.h"
> +#include "ethos_job.h"
> +
> +static int ethos_ioctl_dev_query(struct drm_device *ddev, void *data,
> +				 struct drm_file *file)
> +{
> +	struct ethos_device *ethosdev = to_ethos_device(ddev);
> +	struct drm_ethos_dev_query *args = data;
> +
> +	if (!args->pointer) {
> +		switch (args->type) {
> +		case DRM_ETHOS_DEV_QUERY_NPU_INFO:
> +			args->size = sizeof(ethosdev->npu_info);
> +			return 0;
> +		default:
> +			return -EINVAL;
> +		}
> +	}
> +
> +	switch (args->type) {
> +	case DRM_ETHOS_DEV_QUERY_NPU_INFO:
> +		if (args->size < offsetofend(struct drm_ethos_npu_info, sram_size))
> +			return -EINVAL;
> +		return copy_struct_to_user(u64_to_user_ptr(args->pointer),
> +					   args->size,
> +					   &ethosdev->npu_info,
> +					   sizeof(ethosdev->npu_info), NULL);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +#define ETHOS_BO_FLAGS		DRM_ETHOS_BO_NO_MMAP
> +
> +static int ethos_ioctl_bo_create(struct drm_device *ddev, void *data,
> +				 struct drm_file *file)
> +{
> +	struct drm_ethos_bo_create *args = data;
> +	int cookie, ret;
> +
> +	if (!drm_dev_enter(ddev, &cookie))
> +		return -ENODEV;
> +
> +	if (!args->size || (args->flags & ~ETHOS_BO_FLAGS)) {
> +		ret = -EINVAL;
> +		goto out_dev_exit;
> +	}
> +
> +	ret = ethos_gem_create_with_handle(file, ddev, &args->size,
> +					   args->flags, &args->handle);
> +
> +out_dev_exit:
> +	drm_dev_exit(cookie);
> +	return ret;
> +}
> +
> +static int ethos_ioctl_bo_wait(struct drm_device *ddev, void *data,
> +			       struct drm_file *file)
> +{
> +	struct drm_ethos_bo_wait *args = data;
> +	int cookie, ret;
> +	unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
> +
> +	if (args->pad)
> +		return -EINVAL;
> +
> +	if (!drm_dev_enter(ddev, &cookie))
> +		return -ENODEV;
> +
> +	ret = drm_gem_dma_resv_wait(file, args->handle, true, timeout);
> +
> +	drm_dev_exit(cookie);
> +	return ret;
> +}
> +
> +static int ethos_ioctl_bo_mmap_offset(struct drm_device *ddev, void *data,
> +				      struct drm_file *file)
> +{
> +	struct drm_ethos_bo_mmap_offset *args = data;
> +	struct drm_gem_object *obj;
> +
> +	if (args->pad)
> +		return -EINVAL;
> +
> +	obj = drm_gem_object_lookup(file, args->handle);
> +	if (!obj)
> +		return -ENOENT;
> +
> +	args->offset = drm_vma_node_offset_addr(&obj->vma_node);
> +	drm_gem_object_put(obj);
> +	return 0;
> +}
> +
> +static int ethos_ioctl_cmdstream_bo_create(struct drm_device *ddev, void *data,
> +					   struct drm_file *file)
> +{
> +	struct drm_ethos_cmdstream_bo_create *args = data;
> +	int cookie, ret;
> +
> +	if (!drm_dev_enter(ddev, &cookie))
> +		return -ENODEV;
> +
> +	if (!args->size || !args->data || args->pad || args->flags) {
> +		ret = -EINVAL;
> +		goto out_dev_exit;
> +	}
> +
> +	args->flags |= DRM_ETHOS_BO_NO_MMAP;
> +
> +	ret = ethos_gem_cmdstream_create(file, ddev, args->size, args->data,
> +					 args->flags, &args->handle);
> +
> +out_dev_exit:
> +	drm_dev_exit(cookie);
> +	return ret;
> +}
> +
> +static int ethos_open(struct drm_device *ddev, struct drm_file *file)
> +{
> +	int ret = 0;
> +	struct ethos_file_priv *priv;
> +
> +	if (!try_module_get(THIS_MODULE))
> +		return -EINVAL;
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv) {
> +		ret = -ENOMEM;
> +		goto err_put_mod;
> +	}
> +	priv->edev = to_ethos_device(ddev);
> +
> +	ret = ethos_job_open(priv);
> +	if (ret)
> +		goto err_free;
> +
> +	file->driver_priv = priv;
> +	return 0;
> +
> +err_free:
> +	kfree(priv);
> +err_put_mod:
> +	module_put(THIS_MODULE);
> +	return ret;
> +}
> +
> +static void ethos_postclose(struct drm_device *ddev, struct drm_file *file)
> +{
> +	ethos_job_close(file->driver_priv);
> +	kfree(file->driver_priv);
> +	module_put(THIS_MODULE);
> +}
> +
> +static const struct drm_ioctl_desc ethos_drm_driver_ioctls[] = {
> +#define ETHOS_IOCTL(n, func, flags) \
> +	DRM_IOCTL_DEF_DRV(ETHOS_##n, ethos_ioctl_##func, flags)
> +
> +	ETHOS_IOCTL(DEV_QUERY, dev_query, 0),
> +	ETHOS_IOCTL(BO_CREATE, bo_create, 0),
> +	ETHOS_IOCTL(BO_WAIT, bo_wait, 0),
> +	ETHOS_IOCTL(BO_MMAP_OFFSET, bo_mmap_offset, 0),
> +	ETHOS_IOCTL(CMDSTREAM_BO_CREATE, cmdstream_bo_create, 0),
> +	ETHOS_IOCTL(SUBMIT, submit, 0),
> +};
> +
> +DEFINE_DRM_ACCEL_FOPS(ethos_drm_driver_fops);
> +
> +/*
> + * Ethos driver version:
> + * - 1.0 - initial interface
> + */
> +static const struct drm_driver ethos_drm_driver = {
> +	.driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
> +	.open = ethos_open,
> +	.postclose = ethos_postclose,
> +	.ioctls = ethos_drm_driver_ioctls,
> +	.num_ioctls = ARRAY_SIZE(ethos_drm_driver_ioctls),
> +	.fops = &ethos_drm_driver_fops,
> +	.name = "ethos",
> +	.desc = "Arm Ethos Accel driver",
> +	.major = 1,
> +	.minor = 0,
> +
> +	.gem_create_object = ethos_gem_create_object,
> +};
> +
> +static bool ethos_is_u65(const struct ethos_device *ethosdev)
> +{
> +	return FIELD_GET(ID_ARCH_MAJOR_MASK, ethosdev->npu_info.id) == 1;
> +}
> +
> +#define AXI_LIMIT_CFG 0x1f3f0002
> +
> +static int ethos_reset(struct ethos_device *ethosdev)
> +{
> +	int ret;
> +	u32 reg;
> +
> +	writel_relaxed(RESET_PENDING_CSL, ethosdev->regs + NPU_REG_RESET);
> +	ret = readl_poll_timeout(ethosdev->regs + NPU_REG_STATUS, reg,
> +				 !FIELD_GET(STATUS_RESET_STATUS, reg),
> +				 USEC_PER_MSEC, USEC_PER_SEC);
> +	if (ret)
> +		return ret;
> +
> +	if (!FIELD_GET(PROT_ACTIVE_CSL, readl_relaxed(ethosdev->regs + NPU_REG_PROT))) {
> +		dev_warn(ethosdev->base.dev, "Could not reset to non-secure mode (PROT = %x)\n",
> +			 readl_relaxed(ethosdev->regs + NPU_REG_PROT));
> +	}
> +
> +	if (ethos_is_u65(ethosdev)) {
> +		/* Assign region 2 to AXI M0, everything else to AXI M1 */
> +		writel_relaxed(0x0000aa8a, ethosdev->regs + NPU_REG_REGIONCFG);
> +		writel_relaxed(AXI_LIMIT_CFG, ethosdev->regs + NPU_REG_AXILIMIT0);
> +		writel_relaxed(AXI_LIMIT_CFG, ethosdev->regs + NPU_REG_AXILIMIT1);
> +		writel_relaxed(AXI_LIMIT_CFG, ethosdev->regs + NPU_REG_AXILIMIT2);
> +		writel_relaxed(AXI_LIMIT_CFG, ethosdev->regs + NPU_REG_AXILIMIT3);
> +	}
> +
> +	if (ethosdev->sram)
> +		memset_io(ethosdev->sram, 0, ethosdev->npu_info.sram_size);
> +
> +	return 0;
> +}
> +
> +static int ethos_device_resume(struct device *dev)
> +{
> +	struct ethos_device *ethosdev = dev_get_drvdata(dev);
> +	int ret;
> +
> +	ret = clk_prepare_enable(ethosdev->core_clk);
> +	if (ret)
> +		return ret;
> +
> +	ret = clk_prepare_enable(ethosdev->apb_clk);
> +	if (ret)
> +		goto err_disable_core_clk;
> +
> +	ret = ethos_reset(ethosdev);
> +	if (!ret)
> +		return 0;
> +
> +err_disable_core_clk:
> +	clk_disable_unprepare(ethosdev->core_clk);
> +	return ret;
> +}
> +
> +static int ethos_device_suspend(struct device *dev)
> +{
> +	struct ethos_device *ethosdev = dev_get_drvdata(dev);
> +
> +	clk_disable_unprepare(ethosdev->apb_clk);
> +	clk_disable_unprepare(ethosdev->core_clk);
> +	return 0;
> +}
> +
> +static int ethos_sram_init(struct ethos_device *ethosdev)
> +{
> +	ethosdev->npu_info.sram_size = 0;
> +
> +	ethosdev->srampool = of_gen_pool_get(ethosdev->base.dev->of_node, "sram", 0);
> +	if (!ethosdev->srampool)
> +		return 0;
> +
> +	ethosdev->npu_info.sram_size = gen_pool_size(ethosdev->srampool);
> +
> +	ethosdev->sram = (void __iomem *)gen_pool_dma_alloc(ethosdev->srampool,
> +							    ethosdev->npu_info.sram_size,
> +							    &ethosdev->sramphys);
> +	if (!ethosdev->sram) {
> +		dev_err(ethosdev->base.dev, "failed to allocate from SRAM pool\n");
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ethos_init(struct ethos_device *ethosdev)
> +{
> +	int ret;
> +	u32 id, config;
> +
> +	ret = devm_pm_runtime_enable(ethosdev->base.dev);
> +	if (ret)
> +		return ret;
> +
> +	ret = pm_runtime_resume_and_get(ethosdev->base.dev);
> +	if (ret)
> +		return ret;
> +
> +	pm_runtime_set_autosuspend_delay(ethosdev->base.dev, 50);
> +	pm_runtime_use_autosuspend(ethosdev->base.dev);
> +
> +	/* If PM is disabled, we need to call ethos_device_resume() manually. */
> +	if (!IS_ENABLED(CONFIG_PM)) {
> +		ret = ethos_device_resume(ethosdev->base.dev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ethosdev->npu_info.id = id = readl_relaxed(ethosdev->regs + NPU_REG_ID);
> +	ethosdev->npu_info.config = config = readl_relaxed(ethosdev->regs + NPU_REG_CONFIG);
> +
> +	ethos_sram_init(ethosdev);
> +
> +	dev_info(ethosdev->base.dev,
> +		"Ethos NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
> +		FIELD_GET(ID_ARCH_MAJOR_MASK, id),
> +		FIELD_GET(ID_ARCH_MINOR_MASK, id),
> +		FIELD_GET(ID_ARCH_PATCH_MASK, id),
> +		FIELD_GET(ID_VER_MAJOR_MASK, id),
> +		FIELD_GET(ID_VER_MINOR_MASK, id),
> +		FIELD_GET(CONFIG_CMD_STREAM_VER_MASK, config),
> +		1 << FIELD_GET(CONFIG_MACS_PER_CC_MASK, config),
> +		ethosdev->npu_info.sram_size / 1024);
> +
> +	return 0;
> +}
> +
> +static int ethos_probe(struct platform_device *pdev)
> +{
> +	int ret;
> +	struct ethos_device *ethosdev;
> +
> +	ethosdev = devm_drm_dev_alloc(&pdev->dev, &ethos_drm_driver,
> +				      struct ethos_device, base);
> +	if (IS_ERR(ethosdev))
> +		return PTR_ERR(ethosdev);
> +	platform_set_drvdata(pdev, ethosdev);
> +
> +	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
> +	if (ret)
> +		return ret;
> +
> +	ethosdev->regs = devm_platform_ioremap_resource(pdev, 0);
> +	if (IS_ERR(ethosdev->regs))
> +		return PTR_ERR(ethosdev->regs);
> +
> +	ethosdev->core_clk = devm_clk_get(&pdev->dev, "core");
> +	if (IS_ERR(ethosdev->core_clk))
> +		return PTR_ERR(ethosdev->core_clk);
> +
> +	ethosdev->apb_clk = devm_clk_get_optional(&pdev->dev, "apb");
> +	if (IS_ERR(ethosdev->apb_clk))
> +		return PTR_ERR(ethosdev->apb_clk);
> +
> +	ret = ethos_job_init(ethosdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = ethos_init(ethosdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = drm_dev_register(&ethosdev->base, 0);
> +	if (ret)
> +		pm_runtime_dont_use_autosuspend(ethosdev->base.dev);
> +
> +	pm_runtime_put_autosuspend(ethosdev->base.dev);
> +	return ret;
> +}
> +
> +static void ethos_remove(struct platform_device *pdev)
> +{
> +	struct ethos_device *ethosdev = dev_get_drvdata(&pdev->dev);
> +
> +	drm_dev_unregister(&ethosdev->base);
> +	ethos_job_fini(ethosdev);
> +	if (ethosdev->sram)
> +		gen_pool_free(ethosdev->srampool, (unsigned long)ethosdev->sram,
> +			      ethosdev->npu_info.sram_size);
> +}
> +
> +static const struct of_device_id dt_match[] = {
> +	{ .compatible = "arm,ethos-u65" },
> +	{ .compatible = "arm,ethos-u85" },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, dt_match);
> +
> +static DEFINE_RUNTIME_DEV_PM_OPS(ethos_pm_ops,
> +				 ethos_device_suspend,
> +				 ethos_device_resume,
> +				 NULL);
> +
> +static struct platform_driver ethos_driver = {
> +	.probe = ethos_probe,
> +	.remove = ethos_remove,
> +	.driver = {
> +		.name = "ethos",
> +		.pm = pm_ptr(&ethos_pm_ops),
> +		.of_match_table = dt_match,
> +	},
> +};
> +module_platform_driver(ethos_driver);
> +
> +MODULE_AUTHOR("Rob Herring <robh@...nel.org>");
> +MODULE_DESCRIPTION("Arm Ethos Accel Driver");
> +MODULE_LICENSE("Dual MIT/GPL");
> diff --git a/drivers/accel/ethos/ethos_drv.h b/drivers/accel/ethos/ethos_drv.h
> new file mode 100644
> index 000000000000..39a55c1a6bd6
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_drv.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
> +/* Copyright 2025 Arm, Ltd. */
> +#ifndef __ETHOS_DRV_H__
> +#define __ETHOS_DRV_H__
> +
> +#include <drm/gpu_scheduler.h>
> +
> +struct ethos_device;
> +
> +struct ethos_file_priv {
> +	struct ethos_device *edev;
> +	struct drm_sched_entity sched_entity;
> +};
> +
> +#endif
> diff --git a/drivers/accel/ethos/ethos_gem.c b/drivers/accel/ethos/ethos_gem.c
> new file mode 100644
> index 000000000000..7f24661d5655
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_gem.c
> @@ -0,0 +1,707 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +/* Copyright 2025 Arm, Ltd. */
> +
> +#include <linux/err.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +
> +#include <drm/ethos_accel.h>
> +
> +#include "ethos_device.h"
> +#include "ethos_gem.h"
> +
> +static void ethos_gem_free_object(struct drm_gem_object *obj)
> +{
> +	struct ethos_gem_object *bo = to_ethos_bo(obj);
> +
> +	kfree(bo->info);
> +	drm_gem_free_mmap_offset(&bo->base.base);
> +	drm_gem_dma_free(&bo->base);
> +}
> +
> +static int ethos_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
> +{
> +	struct ethos_gem_object *bo = to_ethos_bo(obj);
> +
> +	/* Don't allow mmap on objects that have the NO_MMAP flag set. */
> +	if (bo->flags & DRM_ETHOS_BO_NO_MMAP)
> +		return -EINVAL;
> +
> +	return drm_gem_dma_object_mmap(obj, vma);
> +}
> +
> +static const struct drm_gem_object_funcs ethos_gem_funcs = {
> +	.free = ethos_gem_free_object,
> +	.print_info = drm_gem_dma_object_print_info,
> +	.get_sg_table = drm_gem_dma_object_get_sg_table,
> +	.vmap = drm_gem_dma_object_vmap,
> +	.mmap = ethos_gem_mmap,
> +	.vm_ops = &drm_gem_dma_vm_ops,
> +};
> +
> +/**
> + * ethos_gem_create_object - Implementation of driver->gem_create_object.
> + * @ddev: DRM device
> + * @size: Size in bytes of the memory the object will reference
> + *
> + * This lets the GEM helpers allocate object structs for us, and keep
> + * our BO stats correct.
> + */
> +struct drm_gem_object *ethos_gem_create_object(struct drm_device *ddev, size_t size)
> +{
> +	struct ethos_gem_object *obj;
> +
> +	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> +	if (!obj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	obj->base.base.funcs = &ethos_gem_funcs;
> +	return &obj->base.base;
> +}
> +
> +/**
> + * ethos_gem_create_with_handle() - Create a GEM object and attach it to a handle.
> + * @file: DRM file.
> + * @ddev: DRM device.
> + * @size: Size of the GEM object to allocate.
> + * @flags: Combination of drm_ethos_bo_flags flags.
> + * @handle: Pointer holding the handle pointing to the new GEM object.
> + *
> + * Return: Zero on success
> + */
> +int ethos_gem_create_with_handle(struct drm_file *file,
> +				 struct drm_device *ddev,
> +				 u64 *size, u32 flags, u32 *handle)
> +{
> +	int ret;
> +	struct drm_gem_dma_object *mem;
> +	struct ethos_gem_object *bo;
> +
> +	mem = drm_gem_dma_create(ddev, *size);
> +	if (IS_ERR(mem))
> +		return PTR_ERR(mem);
> +
> +	bo = to_ethos_bo(&mem->base);
> +	bo->flags = flags;
> +
> +	/*
> +	 * Allocate an ID in the IDR table where the object is registered;
> +	 * the returned handle is the ID userspace uses to refer to the BO.
> +	 */
> +	ret = drm_gem_handle_create(file, &mem->base, handle);
> +	if (!ret)
> +		*size = bo->base.base.size;
> +
> +	/* drop reference from allocate - handle holds it now. */
> +	drm_gem_object_put(&mem->base);
> +
> +	return ret;
> +}
> +
> +struct dma {
> +	s8 region;
> +	u64 len;
> +	u64 offset;
> +	s64 stride[2];
> +};
> +
> +struct dma_state {
> +	u16 size0;
> +	u16 size1;
> +	s8 mode;
> +	struct dma src;
> +	struct dma dst;
> +};
> +
> +struct buffer {
> +	u64 base;
> +	u32 length;
> +	s8 region;
> +};
> +
> +struct feat_matrix {
> +	u64 base[4];
> +	s64 stride_x;
> +	s64 stride_y;
> +	s64 stride_c;
> +	s8 region;
> +	u8 broadcast;
> +	u16 stride_kernel;
> +	u16 precision;
> +	u16 depth;
> +	u16 width;
> +	u16 width0;
> +	u16 height[3];
> +	u8 pad_top;
> +	u8 pad_left;
> +	u8 pad_bottom;
> +	u8 pad_right;
> +};
> +
> +struct cmd_state {
> +	struct dma_state dma;
> +	struct buffer scale[2];
> +	struct buffer weight[4];
> +	struct feat_matrix ofm;
> +	struct feat_matrix ifm;
> +	struct feat_matrix ifm2;
> +};
> +
> +static void cmd_state_init(struct cmd_state *st)
> +{
> +	/* Initialize to all 1s to detect missing setup */
> +	memset(st, 0xff, sizeof(*st));
> +}
> +
> +static u64 cmd_to_addr(u32 *cmd)
> +{
> +	/* Cast before shifting so address bits 32-39 aren't truncated to u32 */
> +	return ((u64)(cmd[0] & 0xff0000) << 16) | cmd[1];
> +}
> +
> +static u64 dma_length(struct ethos_validated_cmdstream_info *info,
> +		      struct dma_state *dma_st, struct dma *dma)
> +{
> +	s8 mode = dma_st->mode;
> +	u64 len = dma->len;
> +
> +	if (mode >= 1) {
> +		len += dma->stride[0];
> +		len *= dma_st->size0;
> +	}
> +	if (mode == 2) {
> +		len += dma->stride[1];
> +		len *= dma_st->size1;
> +	}
> +	if (dma->region >= 0)
> +		info->region_size[dma->region] = max(info->region_size[dma->region],
> +						     len + dma->offset);
> +
> +	return len;
> +}
> +
> +static u64 feat_matrix_length(struct ethos_validated_cmdstream_info *info,
> +			      struct feat_matrix *fm,
> +			      u32 x, u32 y, u32 c)
> +{
> +	int storage = fm->precision >> 14;
> +	int tile = 0;
> +
> +	if (fm->region < 0)
> +		return U64_MAX;
> +
> +	switch (storage) {
> +	case 0:
> +		if (x >= fm->width0 + 1) {
> +			x -= fm->width0 + 1;
> +			tile += 1;
> +		}
> +		if (y >= fm->height[tile] + 1) {
> +			y -= fm->height[tile] + 1;
> +			tile += 2;
> +		}
> +		break;
> +	case 1:
> +		if (y >= fm->height[1] + 1) {
> +			y -= fm->height[1] + 1;
> +			tile = 2;
> +		} else if (y >= fm->height[0] + 1) {
> +			y -= fm->height[0] + 1;
> +			tile = 1;
> +		}
> +		break;
> +	}
> +	if (fm->base[tile] == U64_MAX)
> +		return U64_MAX;
> +
> +	u64 addr = fm->base[tile] + y * fm->stride_y;
> +
> +	switch ((fm->precision >> 6) & 0x3) { // format
> +	case 0: //nhwc:
> +		addr += x * fm->stride_x + c;
> +		break;
> +	case 1: //nhcwb16:
> +		u32 element_size = BIT((fm->precision >> 1) & 0x3);
> +
> +		addr += (c / 16) * fm->stride_c + (16 * x + (c & 0xf)) * element_size;
> +		break;
> +	}
> +
> +	info->region_size[fm->region] = max(info->region_size[fm->region], addr + 1);
> +
> +	return addr;
> +}
> +
> +static int calc_sizes(struct drm_device *ddev,
> +		      struct ethos_validated_cmdstream_info *info,
> +		      u16 op, struct cmd_state *st,
> +		      bool ifm, bool ifm2, bool weight, bool scale)
> +{
> +	u64 len;
> +
> +	if (ifm) {
> +		if (st->ifm.stride_kernel == U16_MAX)
> +			return -EINVAL;
> +		u32 stride_y = ((st->ifm.stride_kernel >> 8) & 0x2) +
> +			((st->ifm.stride_kernel >> 1) & 0x1) + 1;
> +		u32 stride_x = ((st->ifm.stride_kernel >> 5) & 0x2) +
> +			(st->ifm.stride_kernel & 0x1) + 1;
> +		u32 ifm_height = st->ofm.height[2] * stride_y +
> +			st->ifm.height[2] - (st->ifm.pad_top + st->ifm.pad_bottom);
> +		u32 ifm_width  = st->ofm.width * stride_x +
> +			st->ifm.width - (st->ifm.pad_left + st->ifm.pad_right);
> +
> +		len = feat_matrix_length(info, &st->ifm, ifm_width,
> +					 ifm_height, st->ifm.depth);
> +		dev_dbg(ddev->dev, "op %d: IFM:%d:0x%llx-0x%llx\n",
> +			op, st->ifm.region, st->ifm.base[0], len);
> +		if (len == U64_MAX)
> +			return -EINVAL;
> +	}
> +
> +	if (ifm2) {
> +		len = feat_matrix_length(info, &st->ifm2, st->ifm.depth,
> +					 0, st->ofm.depth);
> +		dev_dbg(ddev->dev, "op %d: IFM2:%d:0x%llx-0x%llx\n",
> +			op, st->ifm2.region, st->ifm2.base[0], len);
> +		if (len == U64_MAX)
> +			return -EINVAL;
> +	}
> +
> +	if (weight) {
> +		dev_dbg(ddev->dev, "op %d: W:%d:0x%llx-0x%llx\n",
> +			op, st->weight[0].region, st->weight[0].base,
> +			st->weight[0].base + st->weight[0].length - 1);
> +		if (st->weight[0].region < 0 || st->weight[0].base == U64_MAX ||
> +		    st->weight[0].length == U32_MAX)
> +			return -EINVAL;
> +		info->region_size[st->weight[0].region] =
> +			max(info->region_size[st->weight[0].region],
> +			    st->weight[0].base + st->weight[0].length);
> +	}
> +
> +	if (scale) {
> +		dev_dbg(ddev->dev, "op %d: S:%d:0x%llx-0x%llx\n",
> +			op, st->scale[0].region, st->scale[0].base,
> +			st->scale[0].base + st->scale[0].length - 1);
> +		if (st->scale[0].region < 0 || st->scale[0].base == U64_MAX ||
> +		    st->scale[0].length == U32_MAX)
> +			return -EINVAL;
> +		info->region_size[st->scale[0].region] =
> +			max(info->region_size[st->scale[0].region],
> +			    st->scale[0].base + st->scale[0].length);
> +	}
> +
> +	len = feat_matrix_length(info, &st->ofm, st->ofm.width,
> +				 st->ofm.height[2], st->ofm.depth);
> +	dev_dbg(ddev->dev, "op %d: OFM:%d:0x%llx-0x%llx\n",
> +		op, st->ofm.region, st->ofm.base[0], len);
> +	if (len == U64_MAX)
> +		return -EINVAL;
> +	info->output_region[st->ofm.region] = true;
> +
> +	return 0;
> +}
> +
> +static int calc_sizes_elemwise(struct drm_device *ddev,
> +			       struct ethos_validated_cmdstream_info *info,
> +			       u16 op, struct cmd_state *st,
> +			       bool ifm, bool ifm2)
> +{
> +	u32 height, width, depth;
> +	u64 len;
> +
> +	if (ifm) {
> +		height = st->ifm.broadcast & 0x1 ? 0 : st->ofm.height[2];
> +		width = st->ifm.broadcast & 0x2 ? 0 : st->ofm.width;
> +		depth = st->ifm.broadcast & 0x4 ? 0 : st->ofm.depth;
> +
> +		len = feat_matrix_length(info, &st->ifm, width,
> +					 height, depth);
> +		dev_dbg(ddev->dev, "op %d: IFM:%d:0x%llx-0x%llx\n",
> +			op, st->ifm.region, st->ifm.base[0], len);
> +		if (len == U64_MAX)
> +			return -EINVAL;
> +	}
> +
> +	if (ifm2) {
> +		height = st->ifm2.broadcast & 0x1 ? 0 : st->ofm.height[2];
> +		width = st->ifm2.broadcast & 0x2 ? 0 : st->ofm.width;
> +		depth = st->ifm2.broadcast & 0x4 ? 0 : st->ofm.depth;
> +
> +		len = feat_matrix_length(info, &st->ifm2, width,
> +					 height, depth);
> +		dev_dbg(ddev->dev, "op %d: IFM2:%d:0x%llx-0x%llx\n",
> +			op, st->ifm2.region, st->ifm2.base[0], len);
> +		if (len == U64_MAX)
> +			return -EINVAL;
> +	}
> +
> +	len = feat_matrix_length(info, &st->ofm, st->ofm.width,
> +				 st->ofm.height[2], st->ofm.depth);
> +	dev_dbg(ddev->dev, "op %d: OFM:%d:0x%llx-0x%llx\n",
> +		op, st->ofm.region, st->ofm.base[0], len);
> +	if (len == U64_MAX)
> +		return -EINVAL;
> +	info->output_region[st->ofm.region] = true;
> +
> +	return 0;
> +}
> +
> +static int ethos_gem_cmdstream_copy_and_validate(struct drm_device *ddev,
> +						 u32 __user *ucmds,
> +						 struct ethos_gem_object *bo,
> +						 u32 size)
> +{
> +	struct ethos_validated_cmdstream_info *info;
> +	u32 *bocmds = bo->base.vaddr;
> +	struct cmd_state st;
> +	int i, ret;
> +
> +	cmd_state_init(&st);
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return -ENOMEM;
> +	info->cmd_size = size;
> +
> +	for (i = 0; i < size/4; i++) {
> +		bool use_ifm, use_ifm2, use_scale;
> +		u16 cmd, param;
> +		u32 cmds[2];
> +		u64 addr;
> +
> +		if (get_user(cmds[0], ucmds++)) {
> +			ret = -EFAULT;
> +			goto fault;
> +		}
> +		bocmds[i] = cmds[0];
> +
> +		cmd = cmds[0];
> +		param = cmds[0] >> 16;
> +
> +		if (cmd & 0x4000) {
> +			if (get_user(cmds[1], ucmds++)) {
> +				ret = -EFAULT;
> +				goto fault;
> +			}
> +			i++;
> +			bocmds[i] = cmds[1];
> +			addr = cmd_to_addr(cmds);
> +		}
> +
> +		switch (cmd) {
> +		case NPU_OP_DMA_START:
> +			u64 srclen = dma_length(info, &st.dma, &st.dma.src);
> +			u64 dstlen = dma_length(info, &st.dma, &st.dma.dst);
> +
> +			if (st.dma.dst.region >= 0)
> +				info->output_region[st.dma.dst.region] = true;
> +			dev_dbg(ddev->dev, "cmd: DMA SRC:%d:0x%llx+0x%llx DST:%d:0x%llx+0x%llx\n",
> +				st.dma.src.region, st.dma.src.offset, srclen,
> +				st.dma.dst.region, st.dma.dst.offset, dstlen);
> +			break;
> +		case NPU_OP_CONV:
> +		case NPU_OP_DEPTHWISE:
> +			use_ifm2 = param & 0x1;  // weights_ifm2
> +			use_scale = !(st.ofm.precision & 0x100);
> +			ret = calc_sizes(ddev, info, cmd, &st, true, use_ifm2,
> +					 !use_ifm2, use_scale);
> +			if (ret)
> +				goto fault;
> +			break;
> +		case NPU_OP_POOL:
> +			use_ifm = param != 0x4;  // pooling mode
> +			use_scale = !(st.ofm.precision & 0x100);
> +			ret = calc_sizes(ddev, info, cmd, &st, use_ifm, false,
> +					 false, use_scale);
> +			if (ret)
> +				goto fault;
> +			break;
> +		case NPU_OP_ELEMENTWISE:
> +			use_ifm2 = !((st.ifm2.broadcast == 8) || (param == 5) ||
> +				(param == 6) || (param == 7) || (param == 0x24));
> +			use_ifm = st.ifm.broadcast != 8;
> +			ret = calc_sizes_elemwise(ddev, info, cmd, &st, use_ifm, use_ifm2);
> +			if (ret)
> +				goto fault;
> +			break;
> +		case NPU_OP_RESIZE: // U85 only
> +			WARN_ON(1); // TODO
> +			break;
> +		case NPU_SET_KERNEL_WIDTH_M1:
> +			st.ifm.width = param;
> +			break;
> +		case NPU_SET_KERNEL_HEIGHT_M1:
> +			st.ifm.height[2] = param;
> +			break;
> +		case NPU_SET_KERNEL_STRIDE:
> +			st.ifm.stride_kernel = param;
> +			break;
> +		case NPU_SET_IFM_PAD_TOP:
> +			st.ifm.pad_top = param & 0x7f;
> +			break;
> +		case NPU_SET_IFM_PAD_LEFT:
> +			st.ifm.pad_left = param & 0x7f;
> +			break;
> +		case NPU_SET_IFM_PAD_RIGHT:
> +			st.ifm.pad_right = param & 0xff;
> +			break;
> +		case NPU_SET_IFM_PAD_BOTTOM:
> +			st.ifm.pad_bottom = param & 0xff;
> +			break;
> +		case NPU_SET_IFM_DEPTH_M1:
> +			st.ifm.depth = param;
> +			break;
> +		case NPU_SET_IFM_PRECISION:
> +			st.ifm.precision = param;
> +			break;
> +		case NPU_SET_IFM_BROADCAST:
> +			st.ifm.broadcast = param;
> +			break;
> +		case NPU_SET_IFM_REGION:
> +			st.ifm.region = param & 0x7f;
> +			break;
> +		case NPU_SET_IFM_WIDTH0_M1:
> +			st.ifm.width0 = param;
> +			break;
> +		case NPU_SET_IFM_HEIGHT0_M1:
> +			st.ifm.height[0] = param;
> +			break;
> +		case NPU_SET_IFM_HEIGHT1_M1:
> +			st.ifm.height[1] = param;
> +			break;
> +		case NPU_SET_IFM_BASE0:
> +		case NPU_SET_IFM_BASE1:
> +		case NPU_SET_IFM_BASE2:
> +		case NPU_SET_IFM_BASE3:
> +			st.ifm.base[cmd & 0x3] = addr;
> +			break;
> +		case NPU_SET_IFM_STRIDE_X:
> +			st.ifm.stride_x = addr;
> +			break;
> +		case NPU_SET_IFM_STRIDE_Y:
> +			st.ifm.stride_y = addr;
> +			break;
> +		case NPU_SET_IFM_STRIDE_C:
> +			st.ifm.stride_c = addr;
> +			break;
> +
> +		case NPU_SET_OFM_WIDTH_M1:
> +			st.ofm.width = param;
> +			break;
> +		case NPU_SET_OFM_HEIGHT_M1:
> +			st.ofm.height[2] = param;
> +			break;
> +		case NPU_SET_OFM_DEPTH_M1:
> +			st.ofm.depth = param;
> +			break;
> +		case NPU_SET_OFM_PRECISION:
> +			st.ofm.precision = param;
> +			break;
> +		case NPU_SET_OFM_REGION:
> +			st.ofm.region = param & 0x7;
> +			break;
> +		case NPU_SET_OFM_WIDTH0_M1:
> +			st.ofm.width0 = param;
> +			break;
> +		case NPU_SET_OFM_HEIGHT0_M1:
> +			st.ofm.height[0] = param;
> +			break;
> +		case NPU_SET_OFM_HEIGHT1_M1:
> +			st.ofm.height[1] = param;
> +			break;
> +		case NPU_SET_OFM_BASE0:
> +		case NPU_SET_OFM_BASE1:
> +		case NPU_SET_OFM_BASE2:
> +		case NPU_SET_OFM_BASE3:
> +			st.ofm.base[cmd & 0x3] = addr;
> +			break;
> +		case NPU_SET_OFM_STRIDE_X:
> +			st.ofm.stride_x = addr;
> +			break;
> +		case NPU_SET_OFM_STRIDE_Y:
> +			st.ofm.stride_y = addr;
> +			break;
> +		case NPU_SET_OFM_STRIDE_C:
> +			st.ofm.stride_c = addr;
> +			break;
> +
> +		case NPU_SET_IFM2_BROADCAST:
> +			st.ifm2.broadcast = param;
> +			break;
> +		case NPU_SET_IFM2_PRECISION:
> +			st.ifm2.precision = param;
> +			break;
> +		case NPU_SET_IFM2_REGION:
> +			st.ifm2.region = param & 0x7;
> +			break;
> +		case NPU_SET_IFM2_WIDTH0_M1:
> +			st.ifm2.width0 = param;
> +			break;
> +		case NPU_SET_IFM2_HEIGHT0_M1:
> +			st.ifm2.height[0] = param;
> +			break;
> +		case NPU_SET_IFM2_HEIGHT1_M1:
> +			st.ifm2.height[1] = param;
> +			break;
> +		case NPU_SET_IFM2_BASE0:
> +		case NPU_SET_IFM2_BASE1:
> +		case NPU_SET_IFM2_BASE2:
> +		case NPU_SET_IFM2_BASE3:
> +			st.ifm2.base[cmd & 0x3] = addr;
> +			break;
> +		case NPU_SET_IFM2_STRIDE_X:
> +			st.ifm2.stride_x = addr;
> +			break;
> +		case NPU_SET_IFM2_STRIDE_Y:
> +			st.ifm2.stride_y = addr;
> +			break;
> +		case NPU_SET_IFM2_STRIDE_C:
> +			st.ifm2.stride_c = addr;
> +			break;
> +
> +		case NPU_SET_WEIGHT_REGION:
> +			st.weight[0].region = param & 0x7;
> +			break;
> +		case NPU_SET_SCALE_REGION:
> +			st.scale[0].region = param & 0x7;
> +			break;
> +		case NPU_SET_WEIGHT_BASE:
> +			st.weight[0].base = addr;
> +			break;
> +		case NPU_SET_WEIGHT_LENGTH:
> +			st.weight[0].length = cmds[1];
> +			break;
> +		case NPU_SET_SCALE_BASE:
> +			st.scale[0].base = addr;
> +			break;
> +		case NPU_SET_SCALE_LENGTH:
> +			st.scale[0].length = cmds[1];
> +			break;
> +		case NPU_SET_WEIGHT1_BASE:
> +			st.weight[1].base = addr;
> +			break;
> +		case NPU_SET_WEIGHT1_LENGTH:
> +			st.weight[1].length = cmds[1];
> +			break;
> +		case NPU_SET_SCALE1_BASE: // NPU_SET_WEIGHT2_BASE (U85)
> +			if (0 /*U85*/)
> +				st.weight[2].base = addr;
> +			else
> +				st.scale[1].base = addr;
> +			break;
> +		case NPU_SET_SCALE1_LENGTH: // NPU_SET_WEIGHT2_LENGTH (U85)
> +			if (0 /*U85*/)
> +				st.weight[1].length = cmds[1];
> +			else
> +				st.scale[1].length = cmds[1];
> +			break;
> +		case NPU_SET_WEIGHT3_BASE:
> +			st.weight[3].base = addr;
> +			break;
> +		case NPU_SET_WEIGHT3_LENGTH:
> +			st.weight[3].length = cmds[1];
> +			break;
> +
> +		case NPU_SET_DMA0_SRC_REGION:
> +			if (param & 0x100)
> +				st.dma.src.region = -1;
> +			else
> +				st.dma.src.region = param & 0x7;
> +			st.dma.mode = (param >> 9) & 0x3;
> +			break;
> +		case NPU_SET_DMA0_DST_REGION:
> +			if (param & 0x100)
> +				st.dma.dst.region = -1;
> +			else
> +				st.dma.dst.region = param & 0x7;
> +			break;
> +		case NPU_SET_DMA0_SIZE0:
> +			st.dma.size0 = param;
> +			break;
> +		case NPU_SET_DMA0_SIZE1:
> +			st.dma.size1 = param;
> +			break;
> +		case NPU_SET_DMA0_SRC_STRIDE0:
> +			st.dma.src.stride[0] = ((s64)addr << 24) >> 24;
> +			break;
> +		case NPU_SET_DMA0_SRC_STRIDE1:
> +			st.dma.src.stride[1] = ((s64)addr << 24) >> 24;
> +			break;
> +		case NPU_SET_DMA0_DST_STRIDE0:
> +			st.dma.dst.stride[0] = ((s64)addr << 24) >> 24;
> +			break;
> +		case NPU_SET_DMA0_DST_STRIDE1:
> +			st.dma.dst.stride[1] = ((s64)addr << 24) >> 24;
> +			break;
> +		case NPU_SET_DMA0_SRC:
> +			st.dma.src.offset = addr;
> +			break;
> +		case NPU_SET_DMA0_DST:
> +			st.dma.dst.offset = addr;
> +			break;
> +		case NPU_SET_DMA0_LEN:
> +			st.dma.src.len = st.dma.dst.len = addr;
> +			break;
> +		default:
> +			break;
> +		}
> +
> +	}
> +
> +	for (i = 0; i < NPU_BASEP_REGION_MAX; i++) {
> +		if (!info->region_size[i])
> +			continue;
> +		dev_dbg(ddev->dev, "region %d max size: 0x%llx\n",
> +			i, info->region_size[i]);
> +	}
> +
> +	bo->info = info;
> +	return 0;
> +
> +fault:
> +	kfree(info);
> +	return ret;
> +}
> +
> +/**
> + * ethos_gem_cmdstream_create() - Create a cmdstream GEM object and attach it to a handle.
> + * @file: DRM file.
> + * @ddev: DRM device.
> + * @size: Size of the GEM object to allocate.
> + * @data: User pointer to the command stream to copy and validate.
> + * @flags: Combination of drm_ethos_bo_flags flags.
> + * @handle: Pointer holding the handle pointing to the new GEM object.
> + *
> + * Return: Zero on success
> + */
> +int ethos_gem_cmdstream_create(struct drm_file *file,
> +			       struct drm_device *ddev,
> +			       u32 size, u64 data, u32 flags, u32 *handle)
> +{
> +	int ret;
> +	struct drm_gem_dma_object *mem;
> +	struct ethos_gem_object *bo;
> +
> +	mem = drm_gem_dma_create(ddev, size);
> +	if (IS_ERR(mem))
> +		return PTR_ERR(mem);
> +
> +	bo = to_ethos_bo(&mem->base);
> +	bo->flags = flags;
> +
> +	ret = ethos_gem_cmdstream_copy_and_validate(ddev, (void __user *)(uintptr_t)data, bo, size);
> +	if (ret)
> +		goto fail;
> +
> +	/*
> +	 * Allocate an ID in the IDR table where the object is registered;
> +	 * the returned handle is the ID userspace uses to refer to the BO.
> +	 */
> +	ret = drm_gem_handle_create(file, &mem->base, handle);
> +
> +fail:
> +	/* drop reference from allocate - handle holds it now. */
> +	drm_gem_object_put(&mem->base);
> +
> +	return ret;
> +}
> diff --git a/drivers/accel/ethos/ethos_gem.h b/drivers/accel/ethos/ethos_gem.h
> new file mode 100644
> index 000000000000..2bcca7524af5
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_gem.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +/* Copyright 2025 Arm, Ltd. */
> +
> +#ifndef __ETHOS_GEM_H__
> +#define __ETHOS_GEM_H__
> +
> +#include "ethos_device.h"
> +#include <drm/drm_gem_dma_helper.h>
> +
> +struct ethos_validated_cmdstream_info {
> +	u32 cmd_size;
> +	u64 region_size[NPU_BASEP_REGION_MAX];
> +	bool output_region[NPU_BASEP_REGION_MAX];
> +};
> +
> +/**
> + * struct ethos_gem_object - Driver specific GEM object.
> + */
> +struct ethos_gem_object {
> +	/** @base: Inherit from drm_gem_dma_object. */
> +	struct drm_gem_dma_object base;
> +
> +	struct ethos_validated_cmdstream_info *info;
> +
> +	/** @flags: Combination of drm_ethos_bo_flags flags. */
> +	u32 flags;
> +};
> +
> +static inline
> +struct ethos_gem_object *to_ethos_bo(struct drm_gem_object *obj)
> +{
> +	return container_of(to_drm_gem_dma_obj(obj), struct ethos_gem_object, base);
> +}
> +
> +struct drm_gem_object *ethos_gem_create_object(struct drm_device *ddev,
> +					       size_t size);
> +
> +int ethos_gem_create_with_handle(struct drm_file *file,
> +				 struct drm_device *ddev,
> +				 u64 *size, u32 flags, uint32_t *handle);
> +
> +int ethos_gem_cmdstream_create(struct drm_file *file,
> +			       struct drm_device *ddev,
> +			       u32 size, u64 data, u32 flags, u32 *handle);
> +
> +#endif /* __ETHOS_GEM_H__ */
> diff --git a/drivers/accel/ethos/ethos_job.c b/drivers/accel/ethos/ethos_job.c
> new file mode 100644
> index 000000000000..2d44e7d2a333
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_job.c
> @@ -0,0 +1,514 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@...euvizoso.net> */
> +/* Copyright 2025 Arm, Ltd. */
> +
> +#include <linux/bitfield.h>
> +#include <linux/genalloc.h>
> +#include <linux/interrupt.h>
> +#include <linux/iopoll.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +
> +#include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_gem_dma_helper.h>
> +#include <drm/ethos_accel.h>
> +
> +#include "ethos_device.h"
> +#include "ethos_drv.h"
> +#include "ethos_gem.h"
> +#include "ethos_job.h"
> +
> +#define JOB_TIMEOUT_MS 500
> +
> +static struct ethos_job *to_ethos_job(struct drm_sched_job *sched_job)
> +{
> +	return container_of(sched_job, struct ethos_job, base);
> +}
> +
> +static const char *ethos_fence_get_driver_name(struct dma_fence *fence)
> +{
> +	return "ethos";
> +}
> +
> +static const char *ethos_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +	return "ethos-npu";
> +}
> +
> +static const struct dma_fence_ops ethos_fence_ops = {
> +	.get_driver_name = ethos_fence_get_driver_name,
> +	.get_timeline_name = ethos_fence_get_timeline_name,
> +};
> +
> +static struct dma_fence *ethos_fence_create(struct ethos_device *dev)
> +{
> +	struct dma_fence *fence;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (!fence)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dma_fence_init(fence, &ethos_fence_ops, &dev->fence_lock,
> +		       dev->fence_context, ++dev->emit_seqno);
> +
> +	return fence;
> +}
> +
> +static void ethos_job_hw_submit(struct ethos_device *dev, struct ethos_job *job)
> +{
> +	struct drm_gem_dma_object *cmd_bo = to_drm_gem_dma_obj(job->cmd_bo);
> +	struct ethos_validated_cmdstream_info *cmd_info = to_ethos_bo(job->cmd_bo)->info;
> +
> +	for (int i = 0; i < job->region_cnt; i++) {
> +		struct drm_gem_dma_object *bo;
> +		int region = job->region_bo_num[i];
> +
> +		bo = to_drm_gem_dma_obj(job->region_bo[i]);
> +		writel_relaxed(lower_32_bits(bo->dma_addr), dev->regs + NPU_REG_BASEP(region));
> +		writel_relaxed(upper_32_bits(bo->dma_addr), dev->regs + NPU_REG_BASEP_HI(region));
> +		dev_dbg(dev->base.dev, "Region %d base addr = %pad\n", region, &bo->dma_addr);
> +	}
> +
> +	if (job->sram_size) {
> +		writel_relaxed(lower_32_bits(dev->sramphys),
> +			       dev->regs + NPU_REG_BASEP(ETHOS_SRAM_REGION));
> +		writel_relaxed(upper_32_bits(dev->sramphys),
> +			       dev->regs + NPU_REG_BASEP_HI(ETHOS_SRAM_REGION));
> +		dev_dbg(dev->base.dev, "Region %d base addr = %pad (SRAM)\n",
> +			 ETHOS_SRAM_REGION, &dev->sramphys);
> +	}
> +
> +	writel_relaxed(lower_32_bits(cmd_bo->dma_addr), dev->regs + NPU_REG_QBASE);
> +	writel_relaxed(upper_32_bits(cmd_bo->dma_addr), dev->regs + NPU_REG_QBASE_HI);
> +	writel_relaxed(cmd_info->cmd_size, dev->regs + NPU_REG_QSIZE);
> +
> +	writel(CMD_TRANSITION_TO_RUN, dev->regs + NPU_REG_CMD);
> +
> +	dev_dbg(dev->base.dev,
> +		"Submitted cmd at %pad to core\n", &cmd_bo->dma_addr);
> +}
> +
> +static int ethos_acquire_object_fences(struct ethos_job *job)
> +{
> +	int i, ret;
> +	struct drm_gem_object **bos = job->region_bo;
> +	struct ethos_validated_cmdstream_info *info = to_ethos_bo(job->cmd_bo)->info;
> +
> +	for (i = 0; i < job->region_cnt; i++) {
> +		bool is_write;
> +
> +		if (!bos[i])
> +			break;
> +
> +		ret = dma_resv_reserve_fences(bos[i]->resv, 1);
> +		if (ret)
> +			return ret;
> +
> +		is_write = info->output_region[job->region_bo_num[i]];
> +		ret = drm_sched_job_add_implicit_dependencies(&job->base, bos[i],
> +							      is_write);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void ethos_attach_object_fences(struct ethos_job *job)
> +{
> +	int i;
> +	struct dma_fence *fence = job->inference_done_fence;
> +	struct drm_gem_object **bos = job->region_bo;
> +	struct ethos_validated_cmdstream_info *info = to_ethos_bo(job->cmd_bo)->info;
> +
> +	for (i = 0; i < job->region_cnt; i++)
> +		if (info->output_region[job->region_bo_num[i]])
> +			dma_resv_add_fence(bos[i]->resv, fence, DMA_RESV_USAGE_WRITE);
> +}
> +
> +static int ethos_job_do_push(struct ethos_job *job)
> +{
> +	struct ethos_device *dev = job->dev;
> +	int ret;
> +
> +	guard(mutex)(&dev->sched_lock);
> +
> +	drm_sched_job_arm(&job->base);
> +
> +	job->inference_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +	ret = ethos_acquire_object_fences(job);
> +	if (ret)
> +		return ret;
> +
> +	kref_get(&job->refcount); /* put by scheduler job completion */
> +
> +	drm_sched_entity_push_job(&job->base);
> +
> +	return 0;
> +}
> +
> +static int ethos_job_push(struct ethos_job *job)
> +{
> +	struct ww_acquire_ctx acquire_ctx;
> +	int ret;
> +
> +	ret = drm_gem_lock_reservations(job->region_bo, job->region_cnt, &acquire_ctx);
> +	if (ret)
> +		return ret;
> +
> +	ret = ethos_job_do_push(job);
> +	if (!ret)
> +		ethos_attach_object_fences(job);
> +
> +	drm_gem_unlock_reservations(job->region_bo, job->region_cnt, &acquire_ctx);
> +	return ret;
> +}
> +
> +static void ethos_job_cleanup(struct kref *ref)
> +{
> +	struct ethos_job *job = container_of(ref, struct ethos_job,
> +						refcount);
> +	unsigned int i;
> +
> +	dma_fence_put(job->done_fence);
> +	dma_fence_put(job->inference_done_fence);
> +
> +	for (i = 0; i < job->region_cnt; i++)
> +		drm_gem_object_put(job->region_bo[i]);
> +
> +	drm_gem_object_put(job->cmd_bo);
> +
> +	kfree(job);
> +}
> +
> +static void ethos_job_put(struct ethos_job *job)
> +{
> +	kref_put(&job->refcount, ethos_job_cleanup);
> +}
> +
> +static void ethos_job_free(struct drm_sched_job *sched_job)
> +{
> +	struct ethos_job *job = to_ethos_job(sched_job);
> +
> +	drm_sched_job_cleanup(sched_job);
> +	ethos_job_put(job);
> +}
> +
> +static struct dma_fence *ethos_job_run(struct drm_sched_job *sched_job)
> +{
> +	struct ethos_job *job = to_ethos_job(sched_job);
> +	struct ethos_device *dev = job->dev;
> +	struct dma_fence *fence = NULL;
> +	int ret;
> +
> +	if (unlikely(job->base.s_fence->finished.error))
> +		return NULL;
> +
> +	fence = ethos_fence_create(dev);
> +	if (IS_ERR(fence))
> +		return fence;
> +
> +	if (job->done_fence)
> +		dma_fence_put(job->done_fence);
> +	job->done_fence = dma_fence_get(fence);
> +
> +	ret = pm_runtime_get_sync(dev->base.dev);
> +	if (ret < 0)
> +		return fence;
> +
> +	scoped_guard(mutex, &dev->job_lock) {
> +		dev->in_flight_job = job;
> +		ethos_job_hw_submit(dev, job);
> +	}
> +
> +	return fence;
> +}
> +
> +static void ethos_job_handle_irq(struct ethos_device *dev)
> +{
> +	u32 status;
> +
> +	pm_runtime_mark_last_busy(dev->base.dev);
> +
> +	status = readl_relaxed(dev->regs + NPU_REG_STATUS);
> +
> +	if (status & (STATUS_BUS_STATUS | STATUS_CMD_PARSE_ERR)) {
> +		dev_err(dev->base.dev, "Error IRQ - %x\n", status);
> +		drm_sched_fault(&dev->sched);
> +		return;
> +	}
> +
> +	scoped_guard(mutex, &dev->job_lock) {
> +		if (dev->in_flight_job) {
> +			dma_fence_signal(dev->in_flight_job->done_fence);
> +			pm_runtime_put_autosuspend(dev->base.dev);
> +			dev->in_flight_job = NULL;
> +		}
> +	}
> +}
> +
> +static irqreturn_t ethos_job_irq_handler_thread(int irq, void *data)
> +{
> +	struct ethos_device *dev = data;
> +
> +	ethos_job_handle_irq(dev);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t ethos_job_irq_handler(int irq, void *data)
> +{
> +	struct ethos_device *dev = data;
> +	u32 status = readl_relaxed(dev->regs + NPU_REG_STATUS);
> +
> +	if (!(status & STATUS_IRQ_RAISED))
> +		return IRQ_NONE;
> +
> +	writel_relaxed(CMD_CLEAR_IRQ, dev->regs + NPU_REG_CMD);
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static enum drm_gpu_sched_stat ethos_job_timedout(struct drm_sched_job *bad)
> +{
> +	struct ethos_device *dev = to_ethos_job(bad)->dev;
> +	bool running;
> +
> +	running = FIELD_GET(STATUS_STATE_RUNNING, readl_relaxed(dev->regs + NPU_REG_STATUS));
> +	if (running) {
> +		int ret;
> +		u32 reg, cmdaddr = readl_relaxed(dev->regs + NPU_REG_QREAD);
> +
> +		ret = readl_relaxed_poll_timeout(dev->regs + NPU_REG_QREAD,
> +						 reg,
> +						 reg != cmdaddr,
> +						 USEC_PER_MSEC, 100 * USEC_PER_MSEC);
> +
> +		/* If still running and progress is being made, just return */
> +		if (!ret)
> +			return DRM_GPU_SCHED_STAT_NO_HANG;
> +	}
> +
> +	dev_err(dev->base.dev, "NPU sched timed out: NPU %s, cmdstream offset: 0x%x\n",
> +		running ? "running" : "stopped",
> +		readl_relaxed(dev->regs + NPU_REG_QREAD));
> +
> +	drm_sched_stop(&dev->sched, bad);
> +
> +	/*
> +	 * Remaining interrupts have been handled, but we might still have
> +	 * stuck jobs. Let's make sure the PM counters stay balanced by
> +	 * manually calling pm_runtime_put_noidle().
> +	 */
> +	scoped_guard(mutex, &dev->job_lock) {
> +		if (dev->in_flight_job)
> +			pm_runtime_put_noidle(dev->base.dev);
> +
> +		dev->in_flight_job = NULL;
> +	}
> +
> +	/* Proceed with reset now. */
> +	pm_runtime_force_suspend(dev->base.dev);
> +	pm_runtime_force_resume(dev->base.dev);
> +
> +	/* Restart the scheduler */
> +	drm_sched_start(&dev->sched, 0);
> +
> +	return DRM_GPU_SCHED_STAT_RESET;
> +}
> +
> +static const struct drm_sched_backend_ops ethos_sched_ops = {
> +	.run_job = ethos_job_run,
> +	.timedout_job = ethos_job_timedout,
> +	.free_job = ethos_job_free
> +};
> +
> +int ethos_job_init(struct ethos_device *dev)
> +{
> +	struct drm_sched_init_args args = {
> +		.ops = &ethos_sched_ops,
> +		.num_rqs = DRM_SCHED_PRIORITY_COUNT,
> +		.credit_limit = 1,
> +		.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
> +		.name = dev_name(dev->base.dev),
> +		.dev = dev->base.dev,
> +	};
> +	int ret;
> +
> +	spin_lock_init(&dev->fence_lock);
> +	mutex_init(&dev->job_lock);
> +
> +	dev->irq = platform_get_irq(to_platform_device(dev->base.dev), 0);
> +	if (dev->irq < 0)
> +		return dev->irq;
> +
> +	ret = devm_request_threaded_irq(dev->base.dev, dev->irq,
> +					ethos_job_irq_handler,
> +					ethos_job_irq_handler_thread,
> +					IRQF_SHARED, KBUILD_MODNAME,
> +					dev);
> +	if (ret) {
> +		dev_err(dev->base.dev, "failed to request irq\n");
> +		return ret;
> +	}
> +
> +	dev->fence_context = dma_fence_context_alloc(1);
> +
> +	ret = drm_sched_init(&dev->sched, &args);
> +	if (ret) {
> +		dev_err(dev->base.dev, "Failed to create scheduler: %d.\n", ret);
> +		/* drm_sched_init() unwinds itself on failure; no fini needed */
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +void ethos_job_fini(struct ethos_device *dev)
> +{
> +	drm_sched_fini(&dev->sched);
> +}
> +
> +int ethos_job_open(struct ethos_file_priv *ethos_priv)
> +{
> +	struct ethos_device *dev = ethos_priv->edev;
> +	struct drm_gpu_scheduler *sched = &dev->sched;
> +	int ret;
> +
> +	ret = drm_sched_entity_init(&ethos_priv->sched_entity,
> +				    DRM_SCHED_PRIORITY_NORMAL,
> +				    &sched, 1, NULL);
> +	WARN_ON(ret);
> +
> +	return ret;
> +}
> +
> +void ethos_job_close(struct ethos_file_priv *ethos_priv)
> +{
> +	struct drm_sched_entity *entity = &ethos_priv->sched_entity;
> +
> +	drm_sched_entity_destroy(entity);
> +}
> +
> +int ethos_job_is_idle(struct ethos_device *dev)
> +{
> +	/* If there are any jobs in this HW queue, we're not idle */
> +	if (atomic_read(&dev->sched.credit_count))
> +		return false;
> +
> +	return true;
> +}
> +
> +static int ethos_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
> +				   struct drm_ethos_job *job)
> +{
> +	struct ethos_device *edev = to_ethos_device(dev);
> +	struct ethos_file_priv *file_priv = file->driver_priv;
> +	struct ethos_job *ejob = NULL;
> +	struct ethos_validated_cmdstream_info *cmd_info;
> +	int ret = 0;
> +
> +	/* Region 2 is reserved for SRAM, so it can't also carry a user BO */
> +	if (job->region_bo_handles[ETHOS_SRAM_REGION] && job->sram_size)
> +		return -EINVAL;
> +
> +	if (edev->npu_info.sram_size < job->sram_size)
> +		return -EINVAL;
> +
> +	ejob = kzalloc(sizeof(*ejob), GFP_KERNEL);
> +	if (!ejob)
> +		return -ENOMEM;
> +
> +	kref_init(&ejob->refcount);
> +
> +	ejob->dev = edev;
> +	ejob->sram_size = job->sram_size;
> +
> +	ret = drm_sched_job_init(&ejob->base,
> +				 &file_priv->sched_entity,
> +				 1, NULL, file->client_id);
> +	if (ret)
> +		goto out_put_job;
> +
> +	ejob->cmd_bo = drm_gem_object_lookup(file, job->cmd_bo);
> +	if (!ejob->cmd_bo) {
> +		ret = -ENOENT;
> +		goto out_cleanup_job;
> +	}
> +
> +	/* Only validated cmdstream BOs have info attached */
> +	cmd_info = to_ethos_bo(ejob->cmd_bo)->info;
> +	if (!cmd_info) {
> +		ret = -EINVAL;
> +		goto out_cleanup_job;
> +	}
> +
> +	for (int i = 0; i < NPU_BASEP_REGION_MAX; i++) {
> +		struct drm_gem_object *gem;
> +
> +		if (job->region_bo_handles[i] == 0)
> +			continue;
> +
> +		/* Don't allow a region to point to the cmd BO */
> +		if (job->region_bo_handles[i] == job->cmd_bo) {
> +			ret = -EINVAL;
> +			goto out_cleanup_job;
> +		}
> +
> +		gem = drm_gem_object_lookup(file, job->region_bo_handles[i]);
> +		if (!gem) {
> +			ret = -ENOENT;
> +			goto out_cleanup_job;
> +		}
> +
> +		/* Verify the command stream doesn't have accesses outside the BO */
> +		if (cmd_info->region_size[i] > gem->size) {
> +			dev_err(dev->dev,
> +				"cmd stream region %d size greater than BO size (%llu > %zu)\n",
> +				i, cmd_info->region_size[i], gem->size);
> +			ret = -EOVERFLOW;
> +			goto out_cleanup_job;
> +		}
> +
> +		ejob->region_bo[ejob->region_cnt] = gem;
> +		ejob->region_bo_num[ejob->region_cnt] = i;
> +		ejob->region_cnt++;
> +	}
> +	ret = ethos_job_push(ejob);
> +
> +out_cleanup_job:
> +	if (ret)
> +		drm_sched_job_cleanup(&ejob->base);
> +out_put_job:
> +	ethos_job_put(ejob);
> +
> +	return ret;
> +}
> +
> +int ethos_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> +	struct drm_ethos_submit *args = data;
> +	struct drm_ethos_job *jobs;
> +	int ret = 0;
> +	unsigned int i;
> +
> +	if (args->pad) {
> +		drm_dbg(dev, "Reserved field in drm_ethos_submit struct should be 0.\n");
> +		return -EINVAL;
> +	}
> +
> +	jobs = kvmalloc_array(args->job_count, sizeof(*jobs), GFP_KERNEL);
> +	if (!jobs)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(jobs,
> +			   u64_to_user_ptr(args->jobs),
> +			   args->job_count * sizeof(*jobs))) {
> +		ret = -EFAULT;
> +		drm_dbg(dev, "Failed to copy incoming job array\n");
> +		goto exit;
> +	}
> +
> +	for (i = 0; i < args->job_count; i++) {
> +		ret = ethos_ioctl_submit_job(dev, file, &jobs[i]);
> +		if (ret)
> +			break;
> +	}
> +
> +exit:
> +	kvfree(jobs);
> +
> +	return ret;
> +}
> diff --git a/drivers/accel/ethos/ethos_job.h b/drivers/accel/ethos/ethos_job.h
> new file mode 100644
> index 000000000000..c6bfe7361b4f
> --- /dev/null
> +++ b/drivers/accel/ethos/ethos_job.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@...euvizoso.net> */
> +/* Copyright 2025 Arm, Ltd. */
> +
> +#ifndef __ETHOS_JOB_H__
> +#define __ETHOS_JOB_H__
> +
> +#include <linux/kref.h>
> +#include <drm/gpu_scheduler.h>
> +
> +struct ethos_device;
> +struct ethos_file_priv;
> +
> +struct ethos_job {
> +	struct drm_sched_job base;
> +	struct ethos_device *dev;
> +
> +	struct drm_gem_object *cmd_bo;
> +	struct drm_gem_object *region_bo[NPU_BASEP_REGION_MAX];
> +	u8 region_bo_num[NPU_BASEP_REGION_MAX];
> +	u8 region_cnt;
> +	u32 sram_size;
> +
> +	/* Fence to be signaled by drm-sched once it's done with the job. */
> +	struct dma_fence *inference_done_fence;
> +
> +	/* Fence to be signaled by IRQ handler when the job is complete. */
> +	struct dma_fence *done_fence;
> +
> +	struct kref refcount;
> +};
> +
> +int ethos_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file);
> +
> +int ethos_job_init(struct ethos_device *dev);
> +void ethos_job_fini(struct ethos_device *dev);
> +int ethos_job_open(struct ethos_file_priv *ethos_priv);
> +void ethos_job_close(struct ethos_file_priv *ethos_priv);
> +int ethos_job_is_idle(struct ethos_device *dev);
> +
> +#endif
> diff --git a/include/uapi/drm/ethos_accel.h b/include/uapi/drm/ethos_accel.h
> new file mode 100644
> index 000000000000..43f8d86c70b5
> --- /dev/null
> +++ b/include/uapi/drm/ethos_accel.h
> @@ -0,0 +1,262 @@
> +/* SPDX-License-Identifier: MIT */
> +/* Copyright (C) 2025 Arm, Ltd. */
> +#ifndef _ETHOS_DRM_H_
> +#define _ETHOS_DRM_H_
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +/**
> + * DOC: IOCTL IDs
> + *
> + * enum drm_ethos_ioctl_id - IOCTL IDs
> + *
> + * Place new ioctls at the end, don't re-order, don't replace or remove entries.
> + *
> + * These IDs are not meant to be used directly. Use the DRM_IOCTL_ETHOS_xxx
> + * definitions instead.
> + */
> +enum drm_ethos_ioctl_id {
> +	/** @DRM_ETHOS_DEV_QUERY: Query device information. */
> +	DRM_ETHOS_DEV_QUERY = 0,
> +
> +	/** @DRM_ETHOS_BO_CREATE: Create a buffer object. */
> +	DRM_ETHOS_BO_CREATE,
> +
> +	/** @DRM_ETHOS_BO_WAIT: Wait on a buffer object's fence. */
> +	DRM_ETHOS_BO_WAIT,
> +
> +	/**
> +	 * @DRM_ETHOS_BO_MMAP_OFFSET: Get the file offset to pass to
> +	 * mmap to map a GEM object.
> +	 */
> +	DRM_ETHOS_BO_MMAP_OFFSET,
> +
> +	/**
> +	 * @DRM_ETHOS_CMDSTREAM_BO_CREATE: Create a command stream buffer
> +	 * object.
> +	 */
> +	DRM_ETHOS_CMDSTREAM_BO_CREATE,
> +
> +	/** @DRM_ETHOS_SUBMIT: Submit a job and BOs to run. */
> +	DRM_ETHOS_SUBMIT,
> +};
> +
> +/**
> + * DOC: IOCTL arguments
> + */
> +
> +/**
> + * enum drm_ethos_dev_query_type - Query type
> + *
> + * Place new types at the end, don't re-order, don't remove or replace.
> + */
> +enum drm_ethos_dev_query_type {
> +	/** @DRM_ETHOS_DEV_QUERY_NPU_INFO: Query NPU information. */
> +	DRM_ETHOS_DEV_QUERY_NPU_INFO = 0,
> +};
> +
> +/**
> + * struct drm_ethos_npu_info - NPU information
> + *
> + * Structure grouping all queryable information relating to the NPU.
> + */
> +struct drm_ethos_npu_info {
> +	/** @id: NPU ID. */
> +	__u32 id;
> +#define DRM_ETHOS_ARCH_MAJOR(x)			((x) >> 28)
> +#define DRM_ETHOS_ARCH_MINOR(x)			(((x) >> 20) & 0xff)
> +#define DRM_ETHOS_ARCH_PATCH(x)			(((x) >> 16) & 0xf)
> +#define DRM_ETHOS_PRODUCT_MAJOR(x)		(((x) >> 12) & 0xf)
> +#define DRM_ETHOS_VERSION_MAJOR(x)		(((x) >> 8) & 0xf)
> +#define DRM_ETHOS_VERSION_MINOR(x)		(((x) >> 4) & 0xff)
> +#define DRM_ETHOS_VERSION_STATUS(x)		((x) & 0xf)
> +
> +	/** @config: NPU configuration. */
> +	__u32 config;
> +
> +	/** @sram_size: Available SRAM size in bytes. */
> +	__u32 sram_size;
> +};
> +
> +/**
> + * struct drm_ethos_dev_query - Arguments passed to DRM_ETHOS_IOCTL_DEV_QUERY
> + */
> +struct drm_ethos_dev_query {
> +	/** @type: the query type (see drm_ethos_dev_query_type). */
> +	__u32 type;
> +
> +	/**
> +	 * @size: size of the type being queried.
> +	 *
> +	 * If pointer is NULL, size is updated by the driver to provide the
> +	 * output structure size. If pointer is not NULL, the driver will
> +	 * only copy min(size, actual_structure_size) bytes to the pointer,
> +	 * and update the size accordingly. This allows us to extend query
> +	 * types without breaking userspace.
> +	 */
> +	__u32 size;
> +
> +	/**
> +	 * @pointer: user pointer to a query type struct.
> +	 *
> +	 * Pointer can be NULL, in which case, nothing is copied, but the
> +	 * actual structure size is returned. If not NULL, it must point to
> +	 * a location that's large enough to hold size bytes.
> +	 */
> +	__u64 pointer;
> +};
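
Side note for userspace folks: this size/pointer handshake is easy to
get wrong. The intended pattern, as I read it, is two calls: one with a
NULL pointer to discover the structure size, then one to fetch the
data. A minimal sketch (fd is a placeholder, and I'm assuming the
header gets installed as <drm/ethos_accel.h>):

  #include <stdio.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/ethos_accel.h>

  static void print_npu_info(int fd)
  {
          struct drm_ethos_npu_info info = {0};
          struct drm_ethos_dev_query query = {
                  .type = DRM_ETHOS_DEV_QUERY_NPU_INFO,
          };

          /* First call: pointer is NULL, the kernel reports the size. */
          ioctl(fd, DRM_IOCTL_ETHOS_DEV_QUERY, &query);

          /* Second call: the kernel copies min(size, actual) bytes. */
          query.size = sizeof(info);
          query.pointer = (__u64)(uintptr_t)&info;
          ioctl(fd, DRM_IOCTL_ETHOS_DEV_QUERY, &query);

          printf("arch %u.%u.%u, %u bytes of SRAM\n",
                 DRM_ETHOS_ARCH_MAJOR(info.id),
                 DRM_ETHOS_ARCH_MINOR(info.id),
                 DRM_ETHOS_ARCH_PATCH(info.id),
                 info.sram_size);
  }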
> +
> +/**
> + * enum drm_ethos_bo_flags - Buffer object flags, passed at creation time.
> + */
> +enum drm_ethos_bo_flags {
> +	/**
> +	 * @DRM_ETHOS_BO_NO_MMAP: The buffer object will never be CPU-mapped
> +	 * in userspace.
> +	 */
> +	DRM_ETHOS_BO_NO_MMAP = (1 << 0),
> +};
> +
> +/**
> + * struct drm_ethos_bo_create - Arguments passed to DRM_IOCTL_ETHOS_BO_CREATE.
> + */
> +struct drm_ethos_bo_create {
> +	/**
> +	 * @size: Requested size for the object
> +	 *
> +	 * The (page-aligned) allocated size for the object will be returned.
> +	 */
> +	__u64 size;
> +
> +	/**
> +	 * @flags: Flags. Must be a combination of drm_ethos_bo_flags flags.
> +	 */
> +	__u32 flags;
> +
> +	/**
> +	 * @handle: Returned handle for the object.
> +	 *
> +	 * Object handles are nonzero.
> +	 */
> +	__u32 handle;
> +};
> +
> +/**
> + * struct drm_ethos_bo_mmap_offset - Arguments passed to DRM_IOCTL_ETHOS_BO_MMAP_OFFSET.
> + */
> +struct drm_ethos_bo_mmap_offset {
> +	/** @handle: Handle of the object we want an mmap offset for. */
> +	__u32 handle;
> +
> +	/** @pad: MBZ. */
> +	__u32 pad;
> +
> +	/** @offset: The fake offset to use for subsequent mmap calls. */
> +	__u64 offset;
> +};
> +
> +/**
> + * struct drm_ethos_bo_wait - ioctl argument for waiting for
> + * completion of the last DRM_ETHOS_SUBMIT on a BO.
> + *
> + * This is useful for cases where multiple processes might be
> + * working on a BO and you want to wait for all of that work to
> + * complete.
> + */
> +struct drm_ethos_bo_wait {
> +	/** @handle: Handle of the BO to wait on. */
> +	__u32 handle;
> +
> +	/** @pad: MBZ. */
> +	__u32 pad;
> +
> +	/** @timeout_ns: Absolute timeout, in nanoseconds. */
> +	__s64 timeout_ns;
> +};
> +
> +/**
> + * struct drm_ethos_cmdstream_bo_create - Arguments passed to
> + * DRM_IOCTL_ETHOS_CMDSTREAM_BO_CREATE.
> + */
> +struct drm_ethos_cmdstream_bo_create {
> +	/** @size: Size of the data argument. */
> +	__u32 size;
> +
> +	/** @flags: Flags, currently must be 0. */
> +	__u32 flags;
> +
> +	/** @data: Pointer to the data. */
> +	__u64 data;
> +
> +	/** @handle: Returned GEM handle for the BO. */
> +	__u32 handle;
> +
> +	/** @pad: Pad, must be 0. */
> +	__u32 pad;
> +};
> +
> +/**
> + * struct drm_ethos_job - A job to be run on the NPU
> + *
> + * The kernel will schedule the execution of this job taking into account its
> + * dependencies with other jobs. All tasks in the same job will be executed
> + * sequentially on the same core, to benefit from memory residency in SRAM.
> + */
> +struct drm_ethos_job {
> +	/** @cmd_bo: Input: BO handle for the command stream. */
> +	__u32 cmd_bo;
> +
> +	/** @sram_size: Input: Amount of SRAM to use, in bytes. */
> +	__u32 sram_size;
> +
> +#define ETHOS_MAX_REGIONS	8
> +	/** @region_bo_handles: Input: Array of BO handles, one per region. */
> +	__u32 region_bo_handles[ETHOS_MAX_REGIONS];
> +};
> +
> +/**
> + * struct drm_ethos_submit - ioctl argument for submitting commands to the NPU.
> + *
> + * The kernel will schedule the execution of these jobs in dependency order.
> + */
> +struct drm_ethos_submit {
> +	/** @jobs: Input: Pointer to an array of struct drm_ethos_job. */
> +	__u64 jobs;
> +
> +	/** @job_count: Input: Number of jobs passed in. */
> +	__u32 job_count;
> +
> +	/** @pad: Reserved, must be zero. */
> +	__u32 pad;
> +};
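
And to tie the two structures together, a hedged sketch of a single-job
submit (the handles are placeholders created through the BO ioctls
above; region 2 is left empty since it's reserved for SRAM):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/ethos_accel.h>

  static int submit_one(int fd, __u32 cmd_handle,
                        __u32 in_handle, __u32 out_handle)
  {
          struct drm_ethos_job job = {
                  .cmd_bo = cmd_handle,   /* from CMDSTREAM_BO_CREATE */
                  .region_bo_handles = {
                          [0] = in_handle,   /* indices match the regions */
                          [1] = out_handle,  /* used by the cmdstream */
                  },
          };
          struct drm_ethos_submit submit = {
                  .jobs = (__u64)(uintptr_t)&job,
                  .job_count = 1,
          };

          return ioctl(fd, DRM_IOCTL_ETHOS_SUBMIT, &submit);
  }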
> +
> +/**
> + * DRM_IOCTL_ETHOS() - Build an Ethos IOCTL number
> + * @__access: Access type. Must be R, W or RW.
> + * @__id: One of the DRM_ETHOS_xxx ids.
> + * @__type: Suffix of the type being passed to the IOCTL.
> + *
> + * Don't use this macro directly, use the DRM_IOCTL_ETHOS_xxx
> + * values instead.
> + *
> + * Return: An IOCTL number to be passed to ioctl() from userspace.
> + */
> +#define DRM_IOCTL_ETHOS(__access, __id, __type) \
> +	DRM_IO ## __access(DRM_COMMAND_BASE + DRM_ETHOS_ ## __id, \
> +			   struct drm_ethos_ ## __type)
> +
> +enum {
> +	DRM_IOCTL_ETHOS_DEV_QUERY =
> +		DRM_IOCTL_ETHOS(WR, DEV_QUERY, dev_query),
> +	DRM_IOCTL_ETHOS_BO_CREATE =
> +		DRM_IOCTL_ETHOS(WR, BO_CREATE, bo_create),
> +	DRM_IOCTL_ETHOS_BO_WAIT =
> +		DRM_IOCTL_ETHOS(WR, BO_WAIT, bo_wait),
> +	DRM_IOCTL_ETHOS_BO_MMAP_OFFSET =
> +		DRM_IOCTL_ETHOS(WR, BO_MMAP_OFFSET, bo_mmap_offset),
> +	DRM_IOCTL_ETHOS_CMDSTREAM_BO_CREATE =
> +		DRM_IOCTL_ETHOS(WR, CMDSTREAM_BO_CREATE, cmdstream_bo_create),
> +	DRM_IOCTL_ETHOS_SUBMIT =
> +		DRM_IOCTL_ETHOS(WR, SUBMIT, submit),
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif /* _ETHOS_DRM_H_ */
>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


