Message-ID: <20251029162118.40604-1-tony.luck@intel.com>
Date: Wed, 29 Oct 2025 09:20:43 -0700
From: Tony Luck <tony.luck@...el.com>
To: Fenghua Yu <fenghuay@...dia.com>,
	Reinette Chatre <reinette.chatre@...el.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
	Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>,
	Babu Moger <babu.moger@....com>,
	Drew Fustini <dfustini@...libre.com>,
	Dave Martin <Dave.Martin@....com>,
	Chen Yu <yu.c.chen@...el.com>
Cc: x86@...nel.org,
	linux-kernel@...r.kernel.org,
	patches@...ts.linux.dev,
	Tony Luck <tony.luck@...el.com>
Subject: [PATCH v13 00/32] x86,fs/resctrl telemetry monitoring
Patches based on v6.18-rc3
Series available here:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt-aet-v13
Changes since v12 was posted here:
Link: https://lore.kernel.org/all/20251013223348.103390-1-tony.luck@intel.com/
My usual convention: patches from v12 are named "XX/31"; patches in this
series are named "00YY" when they have a different number.
--- v12 00/31 x86,fs/resctrl telemetry monitoring ---
No changes to "Background" section of cover letter.
--- v12 01/31 x86,fs/resctrl: Improve domain type checking ---
Added Reinette RB tag
--- v12 02/31 x86/resctrl: Move L3 initialization into new helper ---
No change
--- v12 03/31 x86/resctrl: Refactor domain_remove_cpu_mon() ready ---
Added Reinette RB tag
--- v12 04/31 x86/resctrl: Clean up domain_remove_cpu_ctrl() ---
No change
--- v12 05/31 x86,fs/resctrl: Refactor domain create/remove using ---
Move (keep) lockdep_assert_cpus_held() at start of mon_event_read()
Defer SNC change (summing cache-sharing domains) in rdtgroup_mondata_show()
to later patch.
--- v12 06/31 x86,fs/resctrl: Use struct rdt_domain_hdr when ---
New 0006 (pulled in __mon_event_count() refactor from old 18/31)
Set rr->err when returning an error from __mon_event_count() for bad r->rid.
0007 (was 06/31)
Rewrite commit message including "telemetry events" -> "monitoring events"
Drop change to resctrl_arch_cntr_read()
Set rr->err when returning an error from __mon_event_count() for invalid
domain header.
--- v12 07/31 x86,fs/resctrl: Rename struct rdt_mon_domain and ---
0008	Drop RB tag because of reverse fir tree order issues introduced in v11
	Fix reverse fir tree order issues.
--- v12 08/31 x86,fs/resctrl: Rename some L3 specific functions ---
0009	Update kerneldoc "Return" fixes to match existing style.
	Use imperative in commit message note about kerneldoc changes.
--- v12 09/31 fs/resctrl: Make event details accessible to ---
0010	No change
--- v12 10/31 x86,fs/resctrl: Handle events that can be read from ---
0011	Dropped cpu_on_correct_domain(). The refactor of __mon_event_count()
	in patch 0006 means it is no longer needed. The cpu checks in
	__l3_mon_event_count() continue to work without change.
--- v12 11/31 x86,fs/resctrl: Support binary fixed point event ---
0012	No change
--- v12 12/31 x86,fs/resctrl: Add an architectural hook called ---
0013	No change
--- v12 13/31 x86,fs/resctrl: Add and initialize rdt_resource for ---
Split
0014	Parts of 13/31 related to new resource. Dropped Reinette RB tag.
0015	SNC L3 checks and kerneldoc changes.
--- v12 14/31 x86/resctrl: Discover hardware telemetry events ---
0016	Major rewrite of the commit message.
	Simplified the code to keep just one array of known event groups.
	Don't "return false" for failed intel_pmt_get_regions_by_feature();
	just continue to check other event_groups.
--- v12 15/31 x86,fs/resctrl: Fill in details of events for guid ---
0017	Clarify pmt_feature_group and event_group in commit message.
--- v12 16/31 x86,fs/resctrl: Add architectural event pointer ---
0018	Rebased and added Reinette RB tag.
--- v12 17/31 x86/resctrl: Find and enable usable telemetry ---
0019	Re-write commit to match current design.
	Move open coded check for usable regions to
	group_has_usable_regions() helper.
--- v12 18/31 fs/resctrl: Split L3 dependent parts out of ---
Pulled forward in series. Is now patch 0006.
--- v12 19/31 x86/resctrl: Read telemetry events ---
0020	Declare "tval" inside "case RDT_RESOURCE_PERF_PKG".
	Make intel_aet_read_event() sum to a local variable and
	assign to *val on success.
	Delete irrelevant paragraphs from commit comment.
	Update to say that when none of the aggregators have valid data
	the user will see "Unavailable".
--- v12 20/31 fs/resctrl: Refactor mkdir_mondata_subdir() ---
0021	Drop do_sum argument from mon_add_all_files(). Can infer from
	whether hdr argument is NULL.
	Rename mon_add_all_files() to _mkdir_mondata_subdir() and have
	it make the directory and fix user/group id to avoid code
	duplication at each call site.
	Pass explicit NULL for hdr argument to _mkdir_mondata_subdir()
	when creating files in top-level mon_L3_XX directory.
	Add check for r->rid == RDT_RESOURCE_L3 to mkdir_mondata_subdir()
	Fix commit comment to avoid code reference and just describe the
	problem.
--- v12 21/31 fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() ---
0022:	Move and reword comment about SNC sum directories to the new
	refactored function.
	Add check for r->rid == RDT_RESOURCE_L3 to rmdir_mondata_subdir_allrdtgrp()
	Fix commit comment as suggested.
--- v12 22/31 x86,fs/resctrl: Handle domain creation/deletion for ---
0023:	Added Reinette RB tag
--- v12 23/31 x86/resctrl: Add energy/perf choices to rdt boot ---
0024:	Added Reinette RB tag
--- v12 24/31 x86/resctrl: Handle number of RMIDs supported by ---
0025:	Replace "Limit an event group's number ..." paragraph in commit
	message with suggested alternative.
	Drop description of code ("Print r->num_rmid ...")
	New all_regions_have_sufficient_rmid() that builds on
	assumption that groups with no regions were weeded out in
	changes to patch 17/31 (now 0019).
--- v12 25/31 fs/resctrl: Move allocation/free of ---
0026:	Drop rdtgroup_mutex before error return in closid_num_dirty_rmid_alloc()
--- v12 26/31 x86,fs/resctrl: Compute number of RMIDs as minimum ---
0027:	No change
--- v12 27/31 fs/resctrl: Move RMID initialization to first mount ---
0028:	"may likely need" -> "needs"
	Added Reinette RB tag.
--- v12 28/31 x86/resctrl: Enable RDT_RESOURCE_PERF_PKG ---
0029:	Add to commit message that a console log is added.
--- v12 29/31 fs/resctrl: Provide interface to create ---
0030:	Added Reinette RB tag.
--- v12 30/31 x86/resctrl: Add debugfs files to show telemetry ---
0031:	Dropped spurious blank line from intel_aet_add_debugfs()
	Update to use new for_each_event_group() macro.
	Added Reinette RB tag.
--- v12 31/31 x86,fs/resctrl: Update documentation for telemetry ---
0032:	"if the number of RMIDs supported is lower than the number of RMIDs
	supported by the system" -> "if the number of RMIDs supported for that
	type is lower than the number of RMIDs supported by hardware for L3
	monitoring events"
	"this will reduce" -> "this may reduce"
	"the current RMID" -> "for the current monitoring group"
Background
----------
On Intel systems that support per-RMID telemetry monitoring, each logical
processor keeps a local count for various events. When the
MSR_IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
two millisecond counter expires) these event counts are transmitted to
an event aggregator on the same package as the processor together with
the current RMID value. The event counters are reset to zero to begin
counting again.
Each aggregator takes the incoming event counts and adds them to
cumulative counts for each event for each RMID. Note that there can be
multiple aggregators on each package with no architectural association
between logical processors and an aggregator.
All of these aggregated counters can be read by an operating system from
the MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on a system. Any counter can be read from any logical processor.
Intel publishes details for each processor generation showing which
events are counted by each logical processor and the offsets for each
accumulated counter value within the MMIO space in XML files here:
https://github.com/intel/Intel-PMT.
For example there are two energy related telemetry events for the
Clearwater Forest family of processors and the MMIO space looks like this:
Offset  RMID    Event
------  ----    -----
0x0000  0       core_energy
0x0008  0       activity
0x0010  1       core_energy
0x0018  1       activity
...
0x23F0  575     core_energy
0x23F8  575     activity
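The layout in the table above can be expressed as a simple offset
calculation. This is only a sketch for illustration, not code from the
series: counter_offset() and the EVT_* names are hypothetical, and it
assumes that every counter is a 64-bit value and that the events for each
RMID are stored back to back, as the table suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event indices, in the order shown in the table. */
enum { EVT_CORE_ENERGY = 0, EVT_ACTIVITY = 1, NUM_EVENTS = 2 };

/*
 * Byte offset of one counter within an aggregator's MMIO space.
 * Assumes one 8-byte counter per (RMID, event) pair, with all
 * events for an RMID stored consecutively.
 */
static size_t counter_offset(unsigned int rmid, unsigned int event_idx)
{
	assert(event_idx < NUM_EVENTS);
	return ((size_t)rmid * NUM_EVENTS + event_idx) * sizeof(uint64_t);
}
```

With these assumptions counter_offset(575, EVT_ACTIVITY) gives 0x23F8,
matching the last row of the table.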
In addition the XML file provides the units (Joules for core_energy,
Farads for activity) and the type of data (fixed-point binary with
bit 63 used to indicate the data is valid, and the low 18 bits as a
binary fraction).
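To illustrate the data format just described, here is a minimal user-space
sketch of decoding one raw counter. It is not the code from the patches:
decode_counter(), DATA_VALID and FRAC_BITS are hypothetical names, and the
sketch assumes that bits 62:18 hold the integer part of the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DATA_VALID	(1ULL << 63)	/* bit 63: data is valid */
#define FRAC_BITS	18		/* low 18 bits: binary fraction */

/*
 * Decode a raw counter into whole units and millionths of a unit.
 * Assumes bits 62:18 hold the integer part.
 * Returns false if the valid bit is clear.
 */
static bool decode_counter(uint64_t raw, uint64_t *whole, uint32_t *micro)
{
	uint64_t frac;

	assert(whole && micro);
	if (!(raw & DATA_VALID))
		return false;

	raw &= ~DATA_VALID;
	frac = raw & ((1ULL << FRAC_BITS) - 1);
	*whole = raw >> FRAC_BITS;
	/* frac < 2^18, so frac * 1000000 cannot overflow 64 bits */
	*micro = (uint32_t)((frac * 1000000ULL) >> FRAC_BITS);
	return true;
}
```

Integer arithmetic is used here rather than floating point so that the
same approach would also be usable in contexts where floating point is
not available.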
Finally, each XML file provides a 32-bit unique id (or guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.
The INTEL_PMT_TELEMETRY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances (also referred to as "telemetry
regions" in this series) on a platform. It provides:
1) guid  - so resctrl can determine which events are supported
2) MMIO base address of counters
3) package id
Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.
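The per-package accumulation can be sketched roughly as follows. This is
an illustration under stated assumptions, not the implementation in the
series: struct region and read_pkg_event() are hypothetical, an ordinary
array stands in for each aggregator's MMIO space, and the sketch assumes
that aggregators without valid data are skipped and that the result is
unavailable only when no aggregator on the package has valid data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_VALID	(1ULL << 63)	/* bit 63: data is valid */

/* Hypothetical stand-in for one aggregator ("telemetry region"). */
struct region {
	int pkg_id;			/* package this aggregator is on */
	const uint64_t *counters;	/* stands in for the MMIO base */
};

/*
 * Sum counter 'idx' across all aggregators on package 'pkg'.
 * The sum is built in a local variable and only stored to *val
 * on success. Returns false if no aggregator had valid data.
 */
static bool read_pkg_event(const struct region *regions, size_t nregions,
			   int pkg, size_t idx, uint64_t *val)
{
	uint64_t sum = 0;
	bool valid = false;

	assert(regions && val);
	for (size_t i = 0; i < nregions; i++) {
		uint64_t raw;

		if (regions[i].pkg_id != pkg)
			continue;
		raw = regions[i].counters[idx];
		if (!(raw & DATA_VALID))
			continue;
		sum += raw & ~DATA_VALID;
		valid = true;
	}

	if (valid)
		*val = sum;
	return valid;
}
```

A false return here corresponds to the case where the user would see
"Unavailable" for the event.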
Directory structure for the telemetry events looks like this:
$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
mon_data
├── mon_PERF_PKG_00
│   ├── activity
│   └── core_energy
└── mon_PERF_PKG_01
    ├── activity
    └── core_energy
Reading the "core_energy" file from some resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units,
L1/L2 caches, etc.). It does not include energy used in the "uncore"
(L3 cache, on package devices, etc.), or used by memory or I/O devices.
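For illustration only, a user-space reader for one of these files might
look like the sketch below. The helper names are hypothetical, and the
sketch assumes the file contains either a single decimal number or
non-numeric text such as "Unavailable" when no valid data exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of an event file. Returns false for non-numeric
 * content such as "Unavailable".
 */
static bool parse_event_text(const char *text, double *val)
{
	char *end;
	double v;

	assert(text && val);
	v = strtod(text, &end);
	if (end == text)
		return false;
	*val = v;
	return true;
}

/* Read and parse the first line of an event file. */
static bool read_event_file(const char *path, double *val)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = fgets(buf, sizeof(buf), f) && parse_event_text(buf, val);
	fclose(f);
	return ok;
}
```

For example, read_event_file() could be called with
"/sys/fs/resctrl/mon_data/mon_PERF_PKG_00/core_energy" to get the
cumulative energy for the default group on package 0.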
Signed-off-by: Tony Luck <tony.luck@...el.com>
Tony Luck (32):
  x86,fs/resctrl: Improve domain type checking
  x86/resctrl: Move L3 initialization into new helper function
  x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
    types
  x86/resctrl: Clean up domain_remove_cpu_ctrl()
  x86,fs/resctrl: Refactor domain create/remove using struct
    rdt_domain_hdr
  fs/resctrl: Split L3 dependent parts out of __mon_event_count()
  x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters
  x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  x86,fs/resctrl: Rename some L3 specific functions
  fs/resctrl: Make event details accessible to functions when reading
    events
  x86,fs/resctrl: Handle events that can be read from any CPU
  x86,fs/resctrl: Support binary fixed point event counters
  x86,fs/resctrl: Add an architectural hook called for each mount
  x86,fs/resctrl: Add and initialize rdt_resource for package scope
    monitor
  fs/resctrl: Cleanup as L3 is no longer the only monitor resource
  x86/resctrl: Discover hardware telemetry events
  x86,fs/resctrl: Fill in details of events for guid 0x26696143 and
    0x26557651
  x86,fs/resctrl: Add architectural event pointer
  x86/resctrl: Find and enable usable telemetry events
  x86/resctrl: Read telemetry events
  fs/resctrl: Refactor mkdir_mondata_subdir()
  fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
  x86,fs/resctrl: Handle domain creation/deletion for
    RDT_RESOURCE_PERF_PKG
  x86/resctrl: Add energy/perf choices to rdt boot option
  x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
  fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
  x86,fs/resctrl: Compute number of RMIDs as minimum across resources
  fs/resctrl: Move RMID initialization to first mount
  x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  fs/resctrl: Provide interface to create architecture specific debugfs
    area
  x86/resctrl: Add debugfs files to show telemetry aggregator status
  x86,fs/resctrl: Update documentation for telemetry events
 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         | 102 ++++-
 include/linux/resctrl.h                       |  67 ++-
 include/linux/resctrl_types.h                 |  11 +
 arch/x86/kernel/cpu/resctrl/internal.h        |  52 ++-
 fs/resctrl/internal.h                         |  68 ++-
 arch/x86/kernel/cpu/resctrl/core.c            | 275 ++++++++----
 arch/x86/kernel/cpu/resctrl/intel_aet.c       | 417 ++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  50 ++-
 fs/resctrl/ctrlmondata.c                      | 123 +++++-
 fs/resctrl/monitor.c                          | 321 +++++++++-----
 fs/resctrl/rdtgroup.c                         | 293 ++++++++----
 arch/x86/Kconfig                              |  13 +
 arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
 14 files changed, 1409 insertions(+), 386 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
-- 
2.51.0