Message-ID: <CALPaoCj7FBv_vfDp+4tgqo4p8T7Eov_Ys+CQRoAX6u43a4OTDQ@mail.gmail.com>
Date: Mon, 26 May 2025 15:14:08 +0200
From: Peter Newman <peternewman@...gle.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "Chatre, Reinette" <reinette.chatre@...el.com>, Babu Moger <babu.moger@....com>,
"corbet@....net" <corbet@....net>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "james.morse@....com" <james.morse@....com>,
"dave.martin@....com" <dave.martin@....com>, "fenghuay@...dia.com" <fenghuay@...dia.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"paulmck@...nel.org" <paulmck@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "thuth@...hat.com" <thuth@...hat.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>, "ardb@...nel.org" <ardb@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"daniel.sneddon@...ux.intel.com" <daniel.sneddon@...ux.intel.com>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>,
"alexandre.chartre@...cle.com" <alexandre.chartre@...cle.com>,
"pawan.kumar.gupta@...ux.intel.com" <pawan.kumar.gupta@...ux.intel.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "perry.yuan@....com" <perry.yuan@....com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, "Li, Xin3" <xin3.li@...el.com>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "xin@...or.com" <xin@...or.com>,
"Mehta, Sohil" <sohil.mehta@...el.com>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"mario.limonciello@....com" <mario.limonciello@....com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Wieczor-Retman, Maciej" <maciej.wieczor-retman@...el.com>, "Eranian, Stephane" <eranian@...gle.com>,
"Xiaojian.Du@....com" <Xiaojian.Du@....com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr()
to assign a counter with ABMC

Hi Tony,

On Fri, May 23, 2025 at 11:08 PM Luck, Tony <tony.luck@...el.com> wrote:
>
> On Thu, May 22, 2025 at 10:16:16PM +0000, Luck, Tony wrote:
> > > It looks to me as though there are a couple of changes in the telemetry work
> > > that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@intel.com/
> > > switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> > > need for searching the evt_list that this work does in a couple of places. Also note the handy
> > > new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@intel.com/).
> >
> > Yesterday I ran through the exercise of rebasing my AET patches on top of these
> > ABMC patches in order to check whether the ABMC patches painted resctrl
> > into some corner that would be hard to get back out of.
> >
> > Good news: they don't.
> >
> > There was a bunch of manual patching to make the first four patches fit on top
> > of the ABMC code, but I also noticed a few places where things were simpler
> > after combining the two series.
> >
> > Maybe a good path forward would be to take those first four patches from
> > my AET series and then build ABMC on top of those.
>
> As an encouragement to try this direction, I took my four patches
> on top of tip x86/cache and then applied Babu's ABMC series.

I did the same thing last week, except in the other order, so I
switched to your changes to test.

>
> Changes to Babu's code:
> 1) Adapt where needed for removal of evt_list. Use event array instead.
> 2) Use for_each_mbm_event() [Maybe didn't get all places?]
> 3) Bring the s/evt_val/evt_cfg/ fix into patch 20 from 21
> 4) Fix fir tree declaration for resctrl_process_assign()
>
> I don't have an AMD system to check if the ABMC parts still work. But
> it does pass the resctrl self tests, so legacy isn't broken.
>
> Patches in the "my_mbm_plus_babu_abmc" branch of my kernel.org
> repo: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Thanks for applying my suggestion[1] about the array entry sizes, but
you needed one more dereference:

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1db6a61e27746..0c27e0a5a7b96 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -399,7 +399,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  */
 static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
-	size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
+	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id evt;
 	int idx;

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 098ff002d2232..44ec33cb165f7 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4819,7 +4823,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-	size_t tsize = sizeof(d->mbm_states[0]);
+	size_t tsize = sizeof(*d->mbm_states[0]);
 	enum resctrl_event_id evt;
 	int idx;
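
You should be able to reproduce an array overrun even without ABMC:
without the extra dereference, tsize is the size of a pointer rather
than of one state entry, so the per-event arrays are allocated too
small. A page fault is likely if the system implements a lot of RMIDs.
The AMD EPYC 9B45 I tested on implements 4096 RMIDs.
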
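For anyone following along, here is a minimal userspace sketch of the sizing
problem. It is not the resctrl code: struct demo_state, struct demo_domain,
NUM_EVENTS and the helper functions are hypothetical stand-ins, and the
two-field layout is only an assumption chosen so that the entry is larger
than a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
#define NUM_EVENTS 2

struct demo_state {
	uint64_t chunks;
	uint64_t prev;
};

struct demo_domain {
	/* One dynamically allocated array of states per event. */
	struct demo_state *states[NUM_EVENTS];
};

/* Size of the array slot, i.e. of a pointer: too small for an entry. */
static size_t buggy_entry_size(const struct demo_domain *d)
{
	return sizeof(d->states[0]);
}

/* Size of what the pointer refers to: one full state entry. */
static size_t fixed_entry_size(const struct demo_domain *d)
{
	return sizeof(*d->states[0]);
}

/* Release every per-event array and clear the pointers. */
static void demo_free(struct demo_domain *d)
{
	for (int evt = 0; evt < NUM_EVENTS; evt++) {
		free(d->states[evt]);
		d->states[evt] = NULL;
	}
}

/* Allocate num_entries correctly sized entries for every event. */
static int demo_alloc(struct demo_domain *d, size_t num_entries)
{
	for (int evt = 0; evt < NUM_EVENTS; evt++) {
		d->states[evt] = calloc(num_entries, fixed_entry_size(d));
		if (!d->states[evt]) {
			demo_free(d);
			return -1;
		}
	}
	return 0;
}
```

With the pointer-sized value, each array in this sketch would get only part
of the bytes it needs, so writes to the higher indices would run past the end
of the allocation. The more entries there are, the further past the end those
writes land.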
Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoCj8yfzJ=5CkxTPQXc0-WRWpu0xKRX8v4FAWFGQKtXtMUw@mail.gmail.com/