[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aKzhsOWp2TlFF5mK@agluck-desk3>
Date: Mon, 25 Aug 2025 15:20:32 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: "Chatre, Reinette" <reinette.chatre@...el.com>, Fenghua Yu
<fenghuay@...dia.com>, "Wieczor-Retman, Maciej"
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, "Chen,
Yu C" <yu.c.chen@...el.com>
CC: "x86@...nel.org" <x86@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "patches@...ts.linux.dev"
<patches@...ts.linux.dev>
Subject: Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
On Fri, Aug 15, 2025 at 03:47:17PM +0000, Luck, Tony wrote:
> > I think this series is close to being ready to pass to the x86 maintainers.
> > To that end I focused a lot on the changelogs with the goal to meet the
> > tip requirements that mostly involved switching to imperative tone. I do not
> > expect that I found all the cases though (and I may also have made some mistakes
> > in my suggested text!) so please ensure the changelogs are in imperative tone
> > and uses consistent terms throughout the series.
> >
> > If you disagree with any feedback or if any of the feedback is unclear please
> > let us discuss before you spin a new version. Of course it is not required
> > that you follow all feedback but if you don't I would like to learn why.
> >
> > Please note that I will be offline next week.
>
> Reinette,
>
> I took one fast pass through all your comments. I think they all
> look good (and I believe I understand each one). Thanks for all
> the suggestions.
>
> I'll try to keep (better) notes on what I change as I work through
> each patch so I'll have a summary of any areas that I'm unsure
> about for you to read when you get back before I post v9.
>
> Enjoy your time away.
Reinette,
I've completed a longer, slower, pass through making changes to prepare
for v9. Summary of changes per patch below. I didn't disagree with any
of your suggestions.
The bulk of the changes are small, and localized to each patch. The
exception being removal of pkg_mmio_info[] which dropped patch 18 (which
just counted regions prior to allocation of pkg_mmio_info[]) and patch
19 (which finished filling out the details.
discover_events() is now named enable_events(), since there are almost
no "steps" involved, it doesn't have a header comment. The name now
describes what it does.
Theoretically skip_this_region() might find some regions unusable, while
others in the same pmt_feature_group are OK. To handle this I zero the
telemetry_region::addr so that intel_aet_read_event() can easily skip
"bad" regions.
I've pushed a rdt-aet-v9-wip branch to:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
if you want to take a quick look at the end result before
I patch bomb the list.
-Tony
--- cover ---
INTEL_PMT_DISCOVERY -> INTEL_PMT_TELEMETRY
--- 1 ---
No change
--- 2 ---
No change
--- 3 ---
No change
--- 4 ---
No change
--- 5 ---
"The "type" field provides" -> "rdt_domain_hdr::type provides"
Update final paragraph of commit message as suggested.
--- 6 ---
Code: in domain_add_cpu_mon() change WARN_ON_ONCE(1); in default case to
pr_warn_once("Unknown resource rid=%d\n", r->rid);
--- 7 ---
Code: in domain_remove_cpu_mon() move domain_header_is_valid()to
immediately before container_of() and switch check from r->rid to
hard-coded RDT_RESOURCE_L3.
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/
"separated" -> "separate"
"can skip" -> "skips"
--- 8 ---
Code: in domain_remove_cpu_ctrl() move domain_header_is_valid()
to immediately before container_of().
s/list_del_rcu(&d->hdr.list);/list_del_rcu(&hdr->list);/
(moved here from patch 9)
"refactor" -> "refactor domain_remove_cpu_ctrl()"
--- 9 ---
Change second paragraph of commit comment as suggested.
"so the rmid_read::d field is replaced" -> "so replace the L3 specific domain
pointer rmid_read::d with rmid_read::hdr that points to the generic domain
header"
Use imperative tone for last paragraph "Update kerneldoc for mon_data::sum ..."
--- 10 ---
Change rdt_mon_domain to rdt_l3_mon_domain in comment above logical_rmid_to_physical_rmid()
Replace entire commit comment with improved version.
--- 11 ---
"different resources" -> "a different resource"
"these functions" -> "the L3 resource specific functions"
"Two groups of functions renamed here:" -> "Rename three groups of functions:"
Added rdt_get_mon_l3_config() to list of renamed functions to put the "l3"
before the "mon" for consistency.
When changing names for resctrl_mon_resource_{init,exit} add the "l3_" before
"mon" for consistency with other *l3_mon*" naming.
--- 12 ---
"to lower levels" -> "via the mon_data and rmid_read structures to
the functions finally reading the monitoring data."
Replace 3rd paragraph of commit comment with supplied better version.
--- 13 ---
Code: In cpu_on_correct_domain() s/return -EINVAL;/return false;/
--- 14 ---
Code: Move definition of MAX_BINARY_BITS from <linux/resctrl.h> to
fs/resctrl/internal.h.
Be explicit in commit comment that the file system makes the determination
on which events can be displayed in floating point format.
--- 15 ---
"asyncronnous" -> "asynchronous"
"for these drivers" -> "of these drivers"
"are called" -> "completes"
"But expectations" -> "Expectations"
"The call is made with no locks held." -> "resctrl filesystem calls the
hook with no locks held."
--- 16 ---
s/CPU hotplug/CPU hot plug/
Add Reinette's RB tag.
--- 17 ---
Code:
"OOBMSM" -> "INTEL_PMT_TELEMETRY"
"INTEL_PMT_DISCOVERY" -> "INTEL_PMT_TELEMETRY"
re-wrap comment for get_pmt_feature() to use 80 columns.
"OOBMSM discovery" -> "INTEL_PMT_TELEMETRY"
Add the intel_pmt_put_feature_group() calls to intel_aet_exit()
to match the intel_pmt_get_regions_by_feature() calls in get_pmt_feature()
using a new macro for_each_enabled_event_group().
Rename discover_events() to enable_events()
Add period at end of help text in arch/x86/Kconfig.
"Data for telemetry events is collected by each CPU and sent" ->
"Each CPU collects data for telemetry events that it sends"
"is changed" -> "changes"
"or when two milliseconds have elapsed" -> "or when a two millisecond timer expires"
"mad" -> "made"
"Enumeration of support for telemetry events is done" ->
"The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events."
Drop references to INTEL_PMT_DISCOVERY driver.
Merge last two paragraphs of commit message.
Reformat commit to use more of page width.
Add maintainer note about checkpatch complaints for DEFINE_FREE()
--- 18 ---
Dropped. See new patch 0019
--- 19 ---
Dropped. See new patch 0019
--- 20 ---
Now 0018
Rewrite opening paragraph to avoid "continuation of the subject"
Add note/link on the source of the XML files that describe events.
--- 21 ---
Now 0019
Drop "vague first sentence" of second paragraph.
--- NEW 0020 ---
Replaces most of parts 18/32 and 19/32.
Contains skip_this_region() from patch 18/32, but skips all the
code to count regions and allocat pkg_mmio_info[].
Also includes event enabling code from 22/32
--- 22 ---
Now 0021
Modify intel_aet_read_event() to dig into pmt_event::pfg to
find the MMIO base addresses that v8 patch stored in the pkg_mmio_info[]
structures.
--- 23 ---
Now 0022
Add domain_header_is_valid() check in domain_remove_cpu_mon()
before using container_of().
"There are structures" -> "There are per domain structures ..."
Replace my commit fix description with better version from Reinette.
--- 24 ---
Unparseable last sentence of commit message replaced with details
and examples.
--- 25 ---
RMIDS -> RMIDs
"Adjusted downwards ..." -> "May be adjusted downwards ..."
check_rmid_count() -> all_regions_have_sufficient_rmid()
"Potentially disable" -> "Disable"
Add comment:
/* e->num_rmids only adjusted lower if user forces an unusable region to be usable */
--- 26 ---
Add "during resctrl initialization" and "during resctrl exit" to commit
background statement.
Add "closid_num_dirty_rmid[]" to be specific about what is being allocated/freed.
--- 27 ---
"mon capable" -> "mon_capable" (three times)
"alloc capable" -> "alloc_capable"
"rdt_l3_mon_domain::states[]" -> "rdt_l3_mon_domain::mbm_states[] and
+rdt_l3_mon_domain::rmid_busy_llc"
"the number of RMIDs" -> "the system's number of RMIDs"
--- 28 ---
Add comment to setup_rmid_lru_list() to note that rmid_ptrs[]
is allocated of first mount and is reused on subsequent mounts.
It is freed in resctrl_exit().
Lock/unlock rdtgroup_mutex around body of free_rmid_lru_list()
Rewrite commit based on suggestions with some modifications to explain
why error paths in rdt_get_tree() do not call free_rmid_lru_list().
--- 29 ---
TODO: recheck for use of "CPU hot plug notifiers" may have been called "hooks", "callbacks", and
"handlers" through this series.
--- 30 ---
"a resource" -> "a monitoring resource"
--- 31 ---
"last_update_timestamp" -> "agg_last_update_timestamp" in commit comment
Move creation of debugfs files to the end of enable_events().
Rewrite to work based on event_group::pfg since event_group::pkginfo[] is gone.
--- 32 ---
Describe "num_rmids" file values independently for L3 and telemetry.
Move the note about upper bound on directory creation to its own
paragraph to say it is the smaller of reported "num_rmids" values.
"or of a processor package" -> "another for each processor package"
Change paragraph about contents of subdirectories of mon_data to
gize example file names instead of hard coding specifics.
prescence -> presence
will vary -> may vary
last_update_timestamp -> agg_last_update_timestamp
Simplify commit comment as suggested.
Powered by blists - more mailing lists