linux-kernel - Re: [PATCH v3] drm: Add UAPI for the Asahi driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-B6uc7EEAdBPXmt@blossom>
Date: Sun, 23 Mar 2025 17:18:49 -0400
From: Alyssa Rosenzweig <alyssa@...enzweig.io>
To: Faith Ekstrand <faith.ekstrand@...labora.com>
Cc: David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>,
	Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
	"Björn Roy Baron" <bjorn3_gh@...tonmail.com>,
	Benno Lossin <benno.lossin@...ton.me>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	Janne Grunau <j@...nau.net>, Sven Peter <sven@...npeter.dev>,
	Jonathan Corbet <corbet@....net>,
	Sergio Lopez Pascual <slp@...rega.org>,
	Ryan Houdek <sonicadvance1@...il.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	dri-devel <dri-devel@...ts.freedesktop.org>,
	rust-for-linux <rust-for-linux@...r.kernel.org>,
	asahi <asahi@...ts.linux.dev>,
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
	linux-doc <linux-doc@...r.kernel.org>,
	Asahi Lina <lina@...hilina.net>
Subject: Re: [PATCH v3] drm: Add UAPI for the Asahi driver

>  >  > +/** 
>  >  > + * enum drm_asahi_bind_op - Bind operation 
>  >  > + */ 
>  >  > +enum drm_asahi_bind_op { 
>  >  > +    /** @DRM_ASAHI_BIND_OP_BIND: Bind a BO to a GPU VMA range */ 
>  >  > +    DRM_ASAHI_BIND_OP_BIND = 0, 
>  >  > + 
>  >  > +    /** @DRM_ASAHI_BIND_OP_UNBIND: Unbind a GPU VMA range */ 
>  >  > +    DRM_ASAHI_BIND_OP_UNBIND = 1, 
>  >  > + 
>  >  > +    /** @DRM_ASAHI_BIND_OP_UNBIND_ALL: Unbind all mappings of a given BO */ 
>  >  > +    DRM_ASAHI_BIND_OP_UNBIND_ALL = 2, 
> 
> Do you use this? We don't have it in nouveau and NVK gets by fine. Or does the asahi kernel do something where it expects you to unbind everything before the buffer is really destroyed? I think I remember talking to Lina about this a while ago but I don't remember the details.

We do not use it and I don't know why it's here either. In fact,
drm/asahi does an unbind_all equivalently when closing a GEM
handle, so should be definitely ok without.

Dropped in v4, thanks.

>  >  > +    /** 
>  >  > +     * @DRM_ASAHI_BIND_SINGLE_PAGE: Map a single page of the BO repeatedly 
>  >  > +     * across the VA range. 
>  >  > +     * 
>  >  > +     * This is useful to fill a VA range with scratch pages or zero pages. 
>  >  > +     * It is intended as a mechanism to accelerate sparse. 
>  >  > +     */ 
>  >  > +    DRM_ASAHI_BIND_SINGLE_PAGE = (1L << 2), 
> 
> Does this require the BO to be a single page? If so, does it require offset==0? Or does it just take whatever page is at the specified offset?

I believe the intention is that it takes whatever page is at the
specified offset and just maps that a bunch of times. HK doesn't use
this yet though it probably should (this was added to help reduce
overhead when emulating sparse with scratch/zero pages, which is still
very new functionality in hk).

Accelerating this properly involves GPUVM patches - although even without
that, moving the loop into the kernel so it's only a single ioctl
(user-kernel roundtrip) seems worth keeping the flag for.

Added comments in v4.

>  >  > +    /** @object_handle: Object handle (out for BIND, in for UNBIND) */ 
>  >  > +    __u32 object_handle; 
> 
> How is this different from the GEM handle? I mean, I know it's different, but What is this handle for? Just a thing we can pass in later?

Yes, this is just a handle that's passed with the submit, see the
comment in drm_asahi_timestamp.

>  >  > +    /** @priority: Queue priority, 0-3 */ 
>  >  > +    __u32 priority; 
> 
> Is one of these priorities REALTIME and only usable by privileged apps? If so, maybe document that and/or have an enum?

Added an enum, thanks.

I haven't actually implemented the priority check because that means
even more rust bindings, and I don't think it's actually a uAPI
regression to tighten the permissions later now that I've documented
that we may do so.

>  >  > +    /** 
>  >  > +     * @usc_exec_base: GPU base address for all USC binaries (shaders) on 
>  >  > +     * this queue. USC addresses are 32-bit relative to this 64-bit base. 
>  >  > +     * 
>  >  > +     * This sets the following registers on all queue commands: 
>  >  > +     * 
>  >  > +     *    USC_EXEC_BASE_TA  (vertex) 
>  >  > +     *    USC_EXEC_BASE_ISP (fragment) 
>  >  > +     *    USC_EXEC_BASE_CP  (compute) 
>  >  > +     * 
>  >  > +     * While the hardware lets us configure these independently per command, 
>  >  > +     * we do not have a use case for this. Instead, we expect userspace to 
>  >  > +     * fix a 4GiB VA carveout for USC memory and pass its base address here. 
>  >  > +     */ 
>  >  > +    __u64 usc_exec_base; 
> 
> I mean, you could have a command for this or or something but meh. That can be an extension on top of the current UAPI later if it's ever needed.

Yep, and I really cannot fathom a use case for doing this at
finer-than-queue granularity.

>  >  > +    /** 
>  >  > +     * @barriers: Array of command indices per subqueue to wait on. 
>  >  > +     * 
>  >  > +     * Barriers are indices relative to the beginning of a given submit. A 
>  >  > +     * barrier of 0 waits on commands submitted to the subqueue in previous 
>  >  > +     * submit ioctls. A barrier of N waits on N previous commands on the 
>  >  > +     * subqueue within the current submit ioctl. As a special case, passing 
>  >  > +     * @DRM_ASAHI_BARRIER_NONE avoids waiting on any commands in the 
>  >  > +     * subqueue. 
>  >  > +     * 
>  >  > +     * Examples: 
>  >  > +     * 
>  >  > +     *   (0, 0): This waits on all previous work. 
>  >  > +     * 
>  >  > +     *   (NONE, 0): This waits on previously submitted compute commands but 
>  >  > +     *   does not wait on any render commands. 
>  >  > +     * 
>  >  > +     *   (1, NONE): This waits on the first render command in the submit. 
>  >  > +     *   This only makes sense if there are multiple render commands in the 
>  >  > +     *   same submit. 
>  >  > +     * 
>  >  > +     * Barriers only make sense for hardware commands. Synthetic software 
>  >  > +     * commands to set attachments must pass (NONE, NONE) here. 
>  >  > +     */ 
>  >  > +    __u16 barriers[DRM_ASAHI_SUBQUEUE_COUNT]; 
> 
> I'm not sure how good of an idea this is. You said in the comment above that SUBQUEUE_COUNT must be a power of 2. However, once you use it to size an array in the command header, it can never change ever. I'm not sure what to do about that. The command header being 8B is kinda nice... But also, will we ever need more than 2? I'd hate to have to change the size of the header.
> 
> Another option would be to potentially have a barrier command which would then be extensible but that sounds kinda annoying.

I think the mistake here is making this an array instead of just
`vdm_barrier`, `cdm_barrier` fields. It will never be not-2, at least
not without such large hardware changes that we'd be due for a refresh
of this uAPI anyway.

I don't love the idea of the extra command, adds a lot more
complexity/overhead for a hard-to-fathom theoretical future hw issue
(that we could address with a drm_asahi_cmd_header_m7 if we
need that.)

Addressed in v4.

>  >  > +/** 
>  >  > + * struct drm_asahi_timestamp - Describe a timestamp write. 
>  >  > + * 
>  >  > + * The firmware can optionally write the GPU timestamp at render pass 
>  >  > + * granularities, but it needs to be mapped specially via 
>  >  > + * DRM_IOCTL_ASAHI_GEM_BIND_OBJECT. This structure therefore describes where to 
>  >  > + * write as a handle-offset pair, rather than a GPU address like normal. 
> 
> Given that this struct is embedded in other structs, it might be worth a comment saying it can never be extended without breaking those structs.

Done (for each of the places mentioned).

>  >  > +struct drm_asahi_helper_program { 
>  >  > +    /** 
>  >  > +     * @binary: USC address to the helper program binary. This is a tagged 
>  >  > +     * pointer with configuration in the bottom bits. 
>  >  > +     */ 
>  >  > +    __u32 binary; 
>  >  > + 
>  >  > +    /** @cfg: Configuration bits for the helper program. */ 
>  >  > +    __u32 cfg; 
> 
> There's configuratin bits here and in the binary pointer abov?

Yes. Not sure what the different bits mean exactly. My guess is that the
binary pointer is tagged as "enable the helper prog?".

The one known @cfg bit is when the helper program is needed within a
preamble shader, i.e. if the preamble spills.

> Woo! I made it to the end. I think that's all for now. I mostly asked a lot of questions.

Hooray! Thank you so much for reviewing!