Message-ID: <2a0ca6bd-374f-4738-8684-202920de78dc@intel.com>
Date: Thu, 13 Nov 2025 14:56:58 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v13 32/32] x86,fs/resctrl: Update documentation for
telemetry events
Hi Tony,
On 10/29/25 9:21 AM, Tony Luck wrote:
> @@ -400,6 +399,24 @@ with the following files:
> bytes) at which a previously used LLC_occupancy
> counter can be considered for re-use.
>
> +If telemetry monitoring is available there will be an "PERF_PKG_MON" directory
"an" -> "a"?
> +with the following files:
> +
> +"num_rmids":
> + The number of RMIDs for telemetry monitoring events. By default,
> + resctrl will not enable telemetry events of a particular type
> + ("perf" or "energy") if the number of RMIDs supported for that
> + type is lower than the number of RMIDs supported by hardware
> + for L3 monitoring events. The user can force-enable each type
It is not clear to me how the number of L3 monitoring events is relevant here. This
is addressed later with the "The upper bound for how many "CTRL_MON" + "MON" can be
created ...", no?
How about something like: "if the number of RMIDs that can be tracked concurrently
for that type is lower than the total number of RMIDs supported by that type."?
(I am sure it can be improved)
> + of telemetry events with the "rdt=" boot command line option,
> + but this may reduce the number of "MON" groups that can be created.
Since this includes "CTRL_MON" and "MON" groups it may be simpler to just say "monitoring
groups".
> +
> +"mon_features":
> + Lists the telemetry monitoring events that are enabled on this system.
> +
> +The upper bound for how many "CTRL_MON" + "MON" can be created
> +is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
> +
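For illustration, the upper bound described above could be computed from user space roughly as follows. This is only a sketch: the helper name max_monitor_groups() is made up, the default mount point /sys/fs/resctrl is an assumption, and treating a missing "num_rmids" file as "no limit from that resource" is my reading rather than something the patch states.

```python
from pathlib import Path


def max_monitor_groups(info_root="/sys/fs/resctrl/info"):
    """Return the smallest "num_rmids" value found under info_root.

    Hypothetical helper: looks at the L3_MON and PERF_PKG_MON directories
    named in the documentation and skips any that are not present
    (assumption: an absent directory imposes no limit).
    """
    limits = []
    for name in ("L3_MON", "PERF_PKG_MON"):
        path = Path(info_root) / name / "num_rmids"
        if path.is_file():
            limits.append(int(path.read_text().strip()))
    if not limits:
        raise FileNotFoundError(f"no num_rmids files under {info_root}")
    return min(limits)
```

Passing the info directory as a parameter keeps the sketch usable against a copy of the files when resctrl is not mounted.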
> Finally, in the top level of the "info" directory there is a file
> named "last_cmd_status". This is reset with every "command" issued
> via the file system (making new directories or writing to any of the
...
> +Debugfs
> +=======
> +In addition to the use of debugfs for tracing of pseudo-locking performance,
> +architecture code may create debugfs directories associated with monitoring
> +features for a specific resource.
> +
> +The full pathname for these is in the form:
> +
> + /sys/kernel/debug/resctrl/info/{resource_name}_MON/{arch}/
> +
> +The presence, names, and format of these files may vary between architectures
> +even if the same resource is present.
> +
> +PERF_PKG_MON/x86_64
> +-------------------
> +Three files are present per telemetry aggregator instance that show status.
> +The prefix of each file name describes the type ("energy" or "perf") which
> +processor package it belongs to, and the instance number of the aggregator.
> +For example: "energy_pkg1_agg2".
> +
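As a sketch of how the prefix described above decomposes, the following splits a name such as "energy_pkg1_agg2" into its three parts. The function name parse_aggregator_prefix() is hypothetical, and the pattern is inferred only from the single example in the text; anything following the prefix is ignored because the separator before the suffix is not spelled out here.

```python
import re

# Pattern inferred from the one documented example, "energy_pkg1_agg2".
_PREFIX_RE = re.compile(r"^(energy|perf)_pkg(\d+)_agg(\d+)")


def parse_aggregator_prefix(name):
    """Return (event_type, package, aggregator) parsed from a file name.

    Hypothetical helper: only the prefix is examined, the rest of the
    name (the suffix) is left untouched.
    """
    match = _PREFIX_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized aggregator file name: {name!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))
```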
> +The suffix describes which data is reported in the file and
> +is one of:
(nit: unnecessary line break)
> +
> +data_loss_count:
> + This counts the number of times that this aggregator
> + failed to accumulate a counter value supplied by a CPU.
> +
> +data_loss_timestamp:
> + This is a "timestamp" from a free running 25MHz uncore
> + timer indicating when the most recent data loss occurred.
> +
> +last_update_timestamp:
> + Another 25MHz timestamp indicating when the
> + most recent counter update was successfully applied.
> +
> +
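Since both timestamps are described as counts of a free running 25MHz timer, converting them to seconds is a plain division. A minimal sketch, assuming the values have already been read as integers, that both timestamps come from the same timer, and that the timer has not wrapped between them (none of which is stated above); the names below are illustrative only.

```python
# 25MHz, as stated in the file descriptions above.
UNCORE_TIMER_HZ = 25_000_000


def ticks_to_seconds(ticks):
    """Convert a raw 25MHz timer value to seconds."""
    return ticks / UNCORE_TIMER_HZ


def seconds_between(earlier_ticks, later_ticks):
    """Elapsed seconds between two timer values.

    Assumption: both values come from the same timer and no wraparound
    occurred between them.
    """
    return (later_ticks - earlier_ticks) / UNCORE_TIMER_HZ
```

For example, seconds_between(data_loss_timestamp, last_update_timestamp) would give how long after the most recent data loss the last successful update was applied, under the assumptions listed.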
> Examples for RDT Monitoring along with allocation usage
> =======================================================
> Reading monitored data
Reinette