Message-ID: <2a0ca6bd-374f-4738-8684-202920de78dc@intel.com>
Date: Thu, 13 Nov 2025 14:56:58 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
 Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v13 32/32] x86,fs/resctrl: Update documentation for
 telemetry events

Hi Tony,

On 10/29/25 9:21 AM, Tony Luck wrote:
> @@ -400,6 +399,24 @@ with the following files:
>  		bytes) at which a previously used LLC_occupancy
>  		counter can be considered for re-use.
>  
> +If telemetry monitoring is available there will be an "PERF_PKG_MON" directory

"an" -> "a"?

> +with the following files:
> +
> +"num_rmids":
> +		The number of RMIDs for telemetry monitoring events. By default,
> +		resctrl will not enable telemetry events of a particular type
> +		("perf" or "energy") if the number of RMIDs supported for that
> +		type is lower than the number of RMIDs supported by hardware
> +		for L3 monitoring events. The user can force-enable each type

It is not clear to me how the number of RMIDs for L3 monitoring events is relevant
here. Is this not addressed later by "The upper bound for how many "CTRL_MON" + "MON"
can be created ..."?
How about something like: "if the number of RMIDs that can be tracked concurrently
for that type is lower than the total number of RMIDs supported by that type."?
(I am sure it can be improved)


> +		of telemetry events with the "rdt=" boot command line option,
> +		but this may reduce the number of "MON" groups that can be created.

Since this includes "CTRL_MON" and "MON" groups it may be simpler to just say "monitoring
groups".
> +
> +"mon_features":
> +		Lists the telemetry monitoring events that are enabled on this system.
> +
> +The upper bound for how many "CTRL_MON" + "MON" can be created
> +is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
> +
>  Finally, in the top level of the "info" directory there is a file
>  named "last_cmd_status". This is reset with every "command" issued
>  via the file system (making new directories or writing to any of the

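To illustrate the upper bound described in the quoted text above, here is a minimal
sketch that reads both "num_rmids" files and takes the smaller value. It assumes
resctrl is mounted at /sys/fs/resctrl; the helper names are hypothetical, and falling
back to a single value when only one directory exists is my assumption, not something
the patch states.

```python
import os


def read_num_rmids(info_dir, resource):
    """Return the integer in <info_dir>/<resource>/num_rmids, or None if absent."""
    path = os.path.join(info_dir, resource, "num_rmids")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def max_monitor_groups(info_dir="/sys/fs/resctrl/info"):
    """Smaller of the L3_MON and PERF_PKG_MON num_rmids values.

    Assumption: if only one of the two directories is present, its value
    alone is used. Returns None when neither is present.
    """
    values = [read_num_rmids(info_dir, r) for r in ("L3_MON", "PERF_PKG_MON")]
    present = [v for v in values if v is not None]
    return min(present) if present else None


if __name__ == "__main__":
    print(max_monitor_groups())
```

Taking the info directory as a parameter keeps the sketch usable against a copy of the
tree as well as a live mount.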
...

> +Debugfs
> +=======
> +In addition to the use of debugfs for tracing of pseudo-locking performance,
> +architecture code may create debugfs directories associated with monitoring
> +features for a specific resource.
> +
> +The full pathname for these is in the form:
> +
> +    /sys/kernel/debug/resctrl/info/{resource_name}_MON/{arch}/
> +
> +The presence, names, and format of these files may vary between architectures
> +even if the same resource is present.
> +
> +PERF_PKG_MON/x86_64
> +-------------------
> +Three files are present per telemetry aggregator instance that show status.
> +The prefix of each file name describes the type ("energy" or "perf") which
> +processor package it belongs to, and the instance number of the aggregator.
> +For example: "energy_pkg1_agg2".
> +
> +The suffix describes which data is reported in the file and
> +is one of:

(nit: unnecessary line break)

> +
> +data_loss_count:
> +	This counts the number of times that this aggregator
> +	failed to accumulate a counter value supplied by a CPU.
> +
> +data_loss_timestamp:
> +	This is a "timestamp" from a free running 25MHz uncore
> +	timer indicating when the most recent data loss occurred.
> +
> +last_update_timestamp:
> +	Another 25MHz timestamp indicating when the
> +	most recent counter update was successfully applied.
> +
> +
>  Examples for RDT Monitoring along with allocation usage
>  =======================================================
>  Reading monitored data

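For the debugfs status files described above, a rough sketch of how user space might
split a file name into its parts and convert a 25MHz timestamp to seconds. The quoted
text gives the prefix ("energy_pkg1_agg2") and the suffixes, but not how they are
joined; the underscore separator below is my assumption, and the function names are
hypothetical.

```python
import re

# The quoted documentation describes a free running 25MHz uncore timer.
TIMER_HZ = 25_000_000

# Assumption: prefix and suffix are joined with "_",
# e.g. "energy_pkg1_agg2_data_loss_count".
_NAME_RE = re.compile(
    r"^(energy|perf)_pkg(\d+)_agg(\d+)_"
    r"(data_loss_count|data_loss_timestamp|last_update_timestamp)$"
)


def parse_status_file(name):
    """Split a status file name into type, package, aggregator and field.

    Returns None if the name does not match the assumed layout.
    """
    m = _NAME_RE.match(name)
    if m is None:
        return None
    return {
        "type": m.group(1),
        "package": int(m.group(2)),
        "aggregator": int(m.group(3)),
        "field": m.group(4),
    }


def ticks_to_seconds(ticks):
    """Convert a 25MHz timer value to seconds since the timer started."""
    return ticks / TIMER_HZ
```

The difference between two timestamps from the same timer, divided by TIMER_HZ, gives
the elapsed time between a data loss and the last successful update.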
Reinette
