Message-ID: <20260107163250.672bcd35@fedora>
Date: Wed, 7 Jan 2026 16:32:50 +0100
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Lukas Zapolskas <lukas.zapolskas@....com>
Cc: Liviu Dudau <liviu.dudau@....com>, Maarten Lankhorst
 <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
 Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Adrián Larumbe
 <adrian.larumbe@...labora.com>, nd@....com,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org, Mihail
 Atanassov <mihail.atanassov@....com>
Subject: Re: [PATCH v6 1/7] drm/panthor: Add performance counter uAPI

Hi Lukas,

On Wed, 7 Jan 2026 15:13:38 +0000
Lukas Zapolskas <lukas.zapolskas@....com> wrote:

> Hello Boris,
> 
> On 17/12/2025 14:37, Boris Brezillon wrote:
> > Hi Lukas,
> > 
> > On Mon, 15 Dec 2025 17:14:47 +0000
> > Lukas Zapolskas <lukas.zapolskas@....com> wrote:
> >   
> >> This patch extends the DEV_QUERY ioctl to return information about the
> >> performance counter setup for userspace, and introduces the new
> >> ioctl DRM_PANTHOR_PERF_CONTROL in order to allow for the sampling of
> >> performance counters.
> >>
> >> The new design is inspired by the perf aux ringbuffer [0], with the
> >> insert and extract indices being mapped to userspace, allowing
> >> multiple samples to be exposed at any given time. To avoid pointer
> >> chasing, the sample metadata and block metadata are inline with
> >> the elements they describe.
> >>
> >> Userspace is responsible for passing in resources for samples to be
> >> exposed, including the event file descriptor for notification of new
> >> sample availability, the ringbuffer BO to store samples, and the
> >> control BO along with the offset for mapping the insert and extract
> >> indices. Though these indices are only a total of 8 bytes, userspace
> >> can then reuse the same physical page for tracking the state of
> >> multiple buffers by giving different offsets from the BO start to
> >> map them.
> >>
> >> [0]: https://docs.kernel.org/userspace-api/perf_ring_buffer.html
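
(Editor's sketch of the handshake described above, from the userspace side. The struct and field names here are illustrative only, not the actual uAPI; the assumption is that the mapped control page exposes two 64-bit words, with the kernel advancing the insert index and userspace advancing the extract index.)

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical view of the mapped control words: the kernel bumps
 * `insert` when a new sample lands in the ring buffer, and userspace
 * bumps `extract` once it has consumed a sample. */
struct perf_control {
	_Atomic uint64_t extract;
	_Atomic uint64_t insert;
};

/* Consume every sample currently available in the ring buffer.
 * `ring` holds `nr_samples` slots of `sample_size` bytes each, and
 * `handle` is the caller's per-sample callback. Returns the number
 * of samples consumed. */
static unsigned int drain_samples(struct perf_control *ctl, const uint8_t *ring,
				  uint64_t nr_samples, uint64_t sample_size,
				  void (*handle)(const uint8_t *sample))
{
	/* Acquire pairs with the producer's release store of `insert`,
	 * so the sample payload is visible before we read it. */
	uint64_t insert = atomic_load_explicit(&ctl->insert, memory_order_acquire);
	uint64_t extract = atomic_load_explicit(&ctl->extract, memory_order_relaxed);
	unsigned int consumed = 0;

	while (extract != insert) {
		handle(&ring[(extract % nr_samples) * sample_size]);
		extract++;
		consumed++;
	}

	/* Release so the producer sees the slots as free before reusing them. */
	atomic_store_explicit(&ctl->extract, extract, memory_order_release);
	return consumed;
}
```

Since the two indices occupy only 8 bytes, several such `perf_control` views can share one physical page at different offsets, as the commit message notes.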
> >>
> >> Co-developed-by: Mihail Atanassov <mihail.atanassov@....com>
> >> Signed-off-by: Mihail Atanassov <mihail.atanassov@....com>
> >> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@....com>
> >> Reviewed-by: Adrián Larumbe <adrian.larumbe@...labora.com>
> >> ---
> >>  include/uapi/drm/panthor_drm.h | 565 +++++++++++++++++++++++++++++++++
> >>  1 file changed, 565 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> >> index e238c6264fa1..d1a92172e878 100644
> >> --- a/include/uapi/drm/panthor_drm.h
> >> +++ b/include/uapi/drm/panthor_drm.h
> >> @@ -154,6 +154,9 @@ enum drm_panthor_ioctl_id {
> >>  	 * This is useful for imported BOs.
> >>  	 */
> >>  	DRM_PANTHOR_BO_QUERY_INFO,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_CONTROL: Control a performance counter session. */
> >> +	DRM_PANTHOR_PERF_CONTROL,
> >>  };
> >>  
> >>  /**
> >> @@ -253,6 +256,9 @@ enum drm_panthor_dev_query_type {
> >>  	 * @DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.
> >>  	 */
> >>  	DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
> >> +
> >> +	/** @DRM_PANTHOR_DEV_QUERY_PERF_INFO: Query performance counter interface information. */
> >> +	DRM_PANTHOR_DEV_QUERY_PERF_INFO,
> >>  };
> >>  
> >>  /**
> >> @@ -445,6 +451,135 @@ struct drm_panthor_group_priorities_info {
> >>  	__u8 pad[3];
> >>  };
> >>  
> >> +/**
> >> + * enum drm_panthor_perf_feat_flags - Performance counter configuration feature flags.
> >> + */
> >> +enum drm_panthor_perf_feat_flags {
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT: Coarse-grained block states are supported. */
> >> +	DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT = 1 << 0,
> >> +};
> >> +
> >> +/**
> >> + * enum drm_panthor_perf_block_type - Performance counter supported block types.
> >> + */
> >> +enum drm_panthor_perf_block_type {
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_METADATA: Internal use only. */
> >> +	DRM_PANTHOR_PERF_BLOCK_METADATA = 0,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_FW: The FW counter block. */
> >> +	DRM_PANTHOR_PERF_BLOCK_FW,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_CSHW: The CSHW counter block. */
> >> +	DRM_PANTHOR_PERF_BLOCK_CSHW,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_TILER: The tiler counter block. */
> >> +	DRM_PANTHOR_PERF_BLOCK_TILER,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_MEMSYS: A memsys counter block. */
> >> +	DRM_PANTHOR_PERF_BLOCK_MEMSYS,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_SHADER: A shader core counter block. */
> >> +	DRM_PANTHOR_PERF_BLOCK_SHADER,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_FIRST: Internal use only. */
> >> +	DRM_PANTHOR_PERF_BLOCK_FIRST = DRM_PANTHOR_PERF_BLOCK_FW,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_LAST: Internal use only. */
> >> +	DRM_PANTHOR_PERF_BLOCK_LAST = DRM_PANTHOR_PERF_BLOCK_SHADER,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_BLOCK_MAX: Internal use only. */
> >> +	DRM_PANTHOR_PERF_BLOCK_MAX = DRM_PANTHOR_PERF_BLOCK_LAST + 1,
> >> +};  
> > 
> > I'd really prefer if we were not exposing block types as uAPI if those
> > are not truly needed for the UMD/KMD to agree on things. The counter
> > block knowledge exists in userspace (because it has to if we want to
> > attach meaning to counters), and I don't really see the need to
> > standardize it here. In my experience, any definition that's not
> > absolutely required might become a liability at some point. In that
> > case, I can already imagine new GPUs shuffling the block IDs, getting
> > rid of some, adding new ones, ... If we have to accommodate the enum
> > for those changes it will become a mess. On the other hand, if we make
> > the block ID an opaque u8, it just becomes HW knowledge that the
> > UMD/perfcnt lib already has (GPU_ID, plus other PERFCNT-specific dev
> > queries if some things are implementation-defined).
> >   
> 
> These IDs are not provided by the HW, but rather attached to the segment 
> by the kernel. Identifying the blocks in userspace was much easier in JM, since 
> the layout was more or less fixed. In CSF, on the other hand, the layout
> can be a lot more dynamic, and it is not always obvious from the buffer alone 
> whether a particular block type is available or not. It would require exposing 
> more of the FW values directly to the user.

Can you expand a bit here? What kind of FW values? I was assuming
counters would always be HW counters and depend on the GPU arch
major/minor plus maybe some optional features. Is the FW inserting some
SW-based counters? If that's the case, I'd still prefer to expose FW
interface versions and let the userside lib figure out where each block
is.

> 
> >> +
> >> +/**
> >> + * enum drm_panthor_perf_clock - Identifier of the clock used to produce the cycle count values
> >> + * in a given block.
> >> + *
> >> + * Since the integrator may use one or more clocks, it can be unclear
> >> + * which blocks are counted by which clock unless this information is explicitly
> >> + * provided as part of every block sample. Not all of the clocks listed here are
> >> + * necessarily in use: in the simplest case, all cycle counts are associated with
> >> + * the top-level clock.
> >> + */
> >> +enum drm_panthor_perf_clock {
> >> +	/** @DRM_PANTHOR_PERF_CLOCK_TOPLEVEL: Top-level CSF clock. */
> >> +	DRM_PANTHOR_PERF_CLOCK_TOPLEVEL,
> >> +
> >> +	/**
> >> +	 * @DRM_PANTHOR_PERF_CLOCK_COREGROUP: Core group clock, responsible for the MMU, L2
> >> +	 * caches and the tiler.
> >> +	 */
> >> +	DRM_PANTHOR_PERF_CLOCK_COREGROUP,
> >> +
> >> +	/** @DRM_PANTHOR_PERF_CLOCK_SHADER: Clock for the shader cores. */
> >> +	DRM_PANTHOR_PERF_CLOCK_SHADER,
> >> +};
> >> +
> >> +/**
> >> + * struct drm_panthor_perf_info - Performance counter interface information
> >> + *
> >> + * Structure grouping all queryable information relating to the performance counter
> >> + * interfaces.
> >> + */
> >> +struct drm_panthor_perf_info {
> >> +	/**
> >> +	 * @counters_per_block: The number of 8-byte counters available in a block.
> >> +	 */
> >> +	__u32 counters_per_block;
> >> +
> >> +	/**
> >> +	 * @sample_header_size: The size of the header struct available at the beginning
> >> +	 * of every sample.
> >> +	 */
> >> +	__u32 sample_header_size;
> >> +
> >> +	/**
> >> +	 * @block_header_size: The size of the header struct inline with the counters for a
> >> +	 * single block.
> >> +	 */
> >> +	__u32 block_header_size;  
> > 
> > Are those things not directly deducible from the arch major/minor? If
> > those things are implementation-defined, I guess that's fine to expose
> > them, but otherwise I'd rely on the knowledge that exists in the UMD.
> >   
> 
> They are implementation-defined, so the sizes may be the same for several
> different arch major/minor revisions and then change for all of them.

Can you be more specific? Which implementation are we talking about?
The FW implementation, or some fixed HW function that might or might
not be present? Actually, let's take a step back, can you explain where
those counters come from? I was assuming those were HW counters that
were simply forwarded by the FW. Are you saying the FW is more than
just a dummy proxy?

> 
> >> +
> >> +	/**
> >> +	 * @sample_size: The size of a fully annotated sample, starting with a sample header
> >> +	 *               of size @sample_header_size bytes, and all available blocks for the current
> >> +	 *               configuration, each comprised of @counters_per_block 64-bit counters and
> >> +	 *               a block header of @block_header_size bytes.  
> > 
> > Let's keep the kernel doc formatting consistent and drop the alignment
> > on the field name (IIRC, it also generates weird indentation in the
> > final htmldoc if we do that).
> >   
> 
> Will do! 
> 
> >> +	 *
> >> +	 *               The user must use this field to allocate size for the ring buffer. In
> >> +	 *               the case of new blocks being added, an old userspace can always use
> >> +	 *               this field and ignore any blocks it does not know about.
> >> +	 */
> >> +	__u32 sample_size;  
> > 
> > Same thing for the sample_size, it looks like something the UMD should
> > know already, given a specific config.
> >   
> 
> Not necessarily. One of the use-cases we have is libGPUCounters[0], which gets
> embedded in applications at a particular version with the expectation that it
> functions in a forward-compatible fashion, i.e. that an application carrying an
> old version of the library keeps working against a new KMD. In that case, the
> UMD cannot infer the size of the sample purely from the fields previously
> exposed to it.
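
(Editor's sketch of the forward-compatible consumption being described: an old userspace strides by the queried sizes and skips block types it does not recognize. The assumption that the block type sits in the first byte of each block header is purely illustrative; nothing in this patch specifies the header layout.)

```c
#include <stdint.h>
#include <stddef.h>

/* Subset of the queried perf info relevant to walking one sample;
 * field names mirror struct drm_panthor_perf_info. */
struct perf_layout {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t sample_size;
};

/* Walk every block in one sample and count the ones whose type this
 * build understands. Forward compatibility comes from trusting the
 * queried sizes for the stride rather than a compiled-in layout:
 * unknown trailing blocks are simply skipped. */
static unsigned int count_known_blocks(const struct perf_layout *l,
				       const uint8_t *sample,
				       uint8_t max_known_type)
{
	size_t block_stride = l->block_header_size +
			      l->counters_per_block * sizeof(uint64_t);
	unsigned int known = 0;

	for (size_t off = l->sample_header_size;
	     off + block_stride <= l->sample_size; off += block_stride) {
		uint8_t type = sample[off]; /* hypothetical header layout */

		if (type <= max_known_type)
			known++;
	}
	return known;
}
```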

That's where I'm lost. Why would the HW counter layout change based on
a KMD version? Feels like you're treating HW counters as a SW concept,
which is somewhat confusing to me. Maybe the answer to my previous
question will answer that.

Regards,

Boris
