Message-Id: <cover.1753449448.git.lukas.zapolskas@arm.com>
Date: Fri, 25 Jul 2025 15:57:51 +0100
From: Lukas Zapolskas <lukas.zapolskas@....com>
To: dri-devel@...ts.freedesktop.org
Cc: nd@....com,
	Adrián Larumbe <adrian.larumbe@...labora.com>,
	Boris Brezillon <boris.brezillon@...labora.com>,
	Steven Price <steven.price@....com>,
	Liviu Dudau <liviu.dudau@....com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>,
	Simona Vetter <simona@...ll.ch>,
	linux-kernel@...r.kernel.org,
	Lukas Zapolskas <lukas.zapolskas@....com>
Subject: [PATCH v5 0/7] Performance counter implementation with single manual client support

Hello,

This patch set implements initial support for performance counter
sampling in Panthor, as a follow-up to Adrián Larumbe's patch
set [1]. This version of the series fixes a number of issues,
including FW ring buffer wrapping and handling of the performance
counter IRQs. The sample size is also added to the uAPI, so that
the PERF_INFO DEV_QUERY alone is sufficient to handle backward and
forward compatibility of the interface.
The Mesa implementation is also now available [2].

Existing performance counter workflows, such as those in game
engines and in user-space power model/governor implementations,
require that several clients be able to obtain counter data
simultaneously. The hardware and firmware interfaces support only
a single global configuration, so the kernel must do the
multiplexing. The kernel is also in the best position to supplement
the counter data with contextual information: the elapsed sampling
period, the power state transitions undergone during that period,
and the cycles elapsed on specific clocks chosen by the integrator.

Each userspace client creates a session, providing an enable
mask of counter values it requires, a BO for a ring buffer,
and a separate BO for the insert and extract indices, along with
an eventfd to signal counter capture, all of which are kept fixed
for the lifetime of the session. When emitting a sample for a
session, counters that were not requested are stripped out,
and non-counter information needed to interpret counter values
is added to either the sample header, or the block header,
which are stored in-line with the counter values in the sample.
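
As a rough illustration of the stripping step described above, the
sketch below copies one counter block into a session's sample and
keeps only the counters the session requested. The function name,
the 64-counter block size, and the choice to zero unrequested slots
rather than compact them are assumptions made for illustration; they
are not the sample layout defined by the proposed uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size, chosen so one 64-bit mask covers a block. */
#define SKETCH_COUNTERS_PER_BLOCK 64

/*
 * Copy one counter block into a session's sample, keeping only the
 * counters whose bit is set in the session's enable mask. Counters
 * that were not requested are written as zero (an assumption of this
 * sketch). Returns the number of counters kept.
 */
static size_t sketch_emit_block(const uint32_t *hw, uint32_t *out,
				uint64_t enable_mask)
{
	size_t kept = 0;

	assert(hw && out);

	for (size_t i = 0; i < SKETCH_COUNTERS_PER_BLOCK; i++) {
		if (enable_mask & (UINT64_C(1) << i)) {
			out[i] = hw[i];
			kept++;
		} else {
			out[i] = 0;
		}
	}

	return kept;
}
```

With an enable mask of 0x5, only counters 0 and 2 survive in the
session's copy of the block.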

The proposed uAPI specifies two major sources of supplemental
information:
- coarse-grained block state transitions are provided on newer
  FW versions which support the metadata block, a FW-provided
  counter block which indicates the reason a sample was taken
  when entering or exiting a non-counting region, or when a
  shader core has powered down.
- clock cycles elapsed over the sampling period and
  clocks associated with a particular block. This is
  because the clock assignments depend on the system
  integration, and are needed to normalize counters
  representing clock values.

All of the sessions are then aggregated by the sampler, which
handles the programming of the FW interface and subsequent
handling of the samples coming from FW.

v5:
- Started re-using perf info size fields instead of
  recomputing the size where needed
- Removed panthor_file pointer to the drm_file
- Fixed ordering of subsystem unplug on init failure
- Using the kernel struct size to allocate memory for user-passed
  uAPI struct.
- Inlined panthor_perf_sampler_{suspend,resume} into
  panthor_perf_{suspend,resume}
- Inlined
- Updated all callers of CIRC_SPACE_TO_END to use CIRC_SPACE
  for correct ring buffer wraparound.
- Free the session and sampler enable maps on termination
- Drop the return values from panthor_perf_{suspend,resume}
- Update userdata and end timestamp on accumulation
- Removed the ptdev checks on suspend and resume paths
- Link to v4: https://lore.kernel.org/dri-devel/cover.1747148172.git.lukas.zapolskas@arm.com/
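
The CIRC_SPACE change above is easier to see with numbers. The
sketch below mirrors the arithmetic of CIRC_SPACE and
CIRC_SPACE_TO_END from include/linux/circ_buf.h as plain userspace
functions. The sketch_ names are illustrative only, and the example
does not reproduce the actual call sites in panthor_perf.c.

```c
#include <assert.h>

/*
 * Userspace mirror of CIRC_SPACE(): total number of free slots.
 * One slot is always left unused so that full and empty differ.
 */
static unsigned int sketch_circ_space(unsigned int head, unsigned int tail,
				      unsigned int size)
{
	assert(size && (size & (size - 1)) == 0); /* power of two */
	return (tail - (head + 1)) & (size - 1);
}

/*
 * Userspace mirror of CIRC_SPACE_TO_END(): free slots that are
 * contiguous from head up to the end of the buffer, without wrapping.
 */
static unsigned int sketch_circ_space_to_end(unsigned int head,
					     unsigned int tail,
					     unsigned int size)
{
	unsigned int end, n;

	assert(size && (size & (size - 1)) == 0 && head < size);
	end = size - 1 - head;
	n = (end + tail) & (size - 1);
	return n <= end ? n : end + 1;
}
```

With size 8, head 6 and tail 3, the first helper reports 4 free
slots while the second reports only the 2 slots before the end of
the buffer. A check based on the second value therefore under-counts
the space available once the writer is allowed to wrap.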

v4:
- Added sample size to the uAPI.
- Clarified the bit-to-counter mapping for enable masks.
- Fixed IRQ handling: the PERFCNT_THRESHOLD and PERFCNT_OVERFLOW
  interrupts can be handled by checking the difference between the
  REQ and ACK bits, whereas PERFCNT_SAMPLE needs external data to
  validate.
- FW ring buffer indices are now only wrapped when reading the buffer
  and are otherwise left in their pre-wrapped form.
- Accumulation index is now bumped after the first copy.
- All insert and extract index reads now use the proper, full-width
  type.
- L2 slices are now computed via a macro to extract the relevant
  bits from the MEM_FEATURES register. This macro was moved from
  the uAPI due to changes in the register making it unstable.
- Consistently take the sampler lock to check if a sample has been
  requested.
- Link to v3: https://lore.kernel.org/dri-devel/cover.1743517880.git.lukas.zapolskas@arm.com/
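
The "wrap only when reading" approach to the FW ring buffer indices
can be sketched as follows. The indices run freely and are reduced
to a slot number only at the point of access, so the number of
pending samples is a plain unsigned subtraction even across an
integer overflow. The 32-bit index width, the 8-slot ring and the
sketch_ names are assumptions for illustration, not values taken
from the FW interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size; the masking below needs a power of two. */
#define SKETCH_RING_SLOTS 8u

static_assert((SKETCH_RING_SLOTS & (SKETCH_RING_SLOTS - 1u)) == 0,
	      "ring size must be a power of two");

/* Unread samples between two free-running indices (modulo 2^32). */
static uint32_t sketch_pending(uint32_t insert, uint32_t extract)
{
	return insert - extract;
}

/* Map a free-running index to a slot; wrapping happens only here. */
static uint32_t sketch_slot(uint32_t idx)
{
	return idx & (SKETCH_RING_SLOTS - 1u);
}
```

For example, an extract index of 0xFFFFFFFE and an insert index of 2
still yield 4 pending samples, and the extract index maps to slot 6.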

v3:
- Fixed offset issues into FW ring buffer
- Fixed sparse shader core handling
- Added pre- and post- reset handlers
- Added module param to control size of FW ring buffer
- Clarified naming on sampler functions
- Added error logging for PERF_SETUP
- Link to v2: https://lore.kernel.org/dri-devel/20241211165024.490748-1-lukas.zapolskas@arm.com/

RFC v2:
- Link to v1: https://lore.kernel.org/lkml/20240305165820.585245-1-adrian.larumbe@collabora.com/T/#m67d1f89614fe35dc0560e8304d6731eb1a6942b6

[1]: https://lore.kernel.org/lkml/20240305165820.585245-1-adrian.larumbe@collabora.com/T/#m67d1f89614fe35dc0560e8304d6731eb1a6942b6
[2]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35024

base commit: e48123c607a0


Adrián Larumbe (1):
  drm/panthor: Implement the counter sampler and sample handling

Lukas Zapolskas (6):
  drm/panthor: Add performance counter uAPI
  drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
  drm/panthor: Add panthor perf initialization and termination
  drm/panthor: Introduce sampling sessions to handle userspace clients
  drm/panthor: Add suspend, resume and reset handling
  drm/panthor: Expose the panthor perf ioctls

 drivers/gpu/drm/panthor/Makefile         |    1 +
 drivers/gpu/drm/panthor/panthor_device.c |   16 +-
 drivers/gpu/drm/panthor/panthor_device.h |    8 +-
 drivers/gpu/drm/panthor/panthor_drv.c    |  150 +-
 drivers/gpu/drm/panthor/panthor_fw.c     |    6 +
 drivers/gpu/drm/panthor/panthor_fw.h     |    9 +-
 drivers/gpu/drm/panthor/panthor_perf.c   | 1969 ++++++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_perf.h   |   40 +
 drivers/gpu/drm/panthor/panthor_regs.h   |    1 +
 include/uapi/drm/panthor_drm.h           |  565 +++++++
 10 files changed, 2760 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
 create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h


base-commit: e48123c607a0db8b9ad02f83c8c3d39918dbda06
--
2.33.0.dirty

