Message-ID: <CALPaoCjL6FEmVgX-h3_GQEVZNAT3FcH34t9o1PwbUacVzXjZPg@mail.gmail.com>
Date: Fri, 29 Nov 2024 10:59:11 +0100
From: Peter Newman <peternewman@...gle.com>
To: babu.moger@....com
Cc: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, 
	fenghua.yu@...el.com, x86@...nel.org, hpa@...or.com, thuth@...hat.com, 
	paulmck@...nel.org, rostedt@...dmis.org, akpm@...ux-foundation.org, 
	xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com, 
	daniel.sneddon@...ux.intel.com, perry.yuan@....com, sandipan.das@....com, 
	kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com, 
	jithu.joseph@...el.com, brijesh.singh@....com, xin3.li@...el.com, 
	ebiggers@...gle.com, andrew.cooper3@...rix.com, mario.limonciello@....com, 
	james.morse@....com, tan.shaopeng@...itsu.com, tony.luck@...el.com, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	maciej.wieczor-retman@...el.com, eranian@...gle.com, jpoimboe@...nel.org, 
	thomas.lendacky@....com
Subject: Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display
 number of free counters

Hi Babu,

On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@....com> wrote:
>
> Hi Peter,
>
> On 11/28/2024 5:10 AM, Peter Newman wrote:
> > Hi Babu, Reinette,
> >
> > On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
> > <reinette.chatre@...el.com> wrote:
> >>
> >> Hi Babu,
> >>
> >> On 11/27/24 6:57 AM, Moger, Babu wrote:

> >>> 1. Each group needs to remember counter ids in each domain for each event.
> >>>     For example:
> >>>     Resctrl group mon1
> >>>      Total event
> >>>      dom 0 cntr_id 1,
> >>>      dom 1 cntr_id 10
> >>>      dom 2 cntr_id 11
> >>>
> >>>     Local event
> >>>      dom 0 cntr_id 2,
> >>>      dom 1 cntr_id 15
> >>>      dom 2 cntr_id 10
> >>
> >> Indeed. The challenge here is that domains may come and go so it cannot be a simple
> >> static array. As an alternative it can be an xarray indexed by the domain ID with
> >> pointers to a struct like below to contain the counters associated with the monitor
> >> group:
> >>          struct cntr_id {
> >>                  u32     mbm_total;
> >>                  u32     mbm_local;
> >>          };
> >>
> >>
> >> Thinking more about how this array needs to be managed made me wonder how the
> >> current implementation deals with domains that come and go. I do not think
> >> this is currently handled. For example, if a new domain comes online and
> >> monitoring groups had counters dynamically assigned, then these counters are
> >> not configured to the newly online domain.
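
For concreteness, the xarray approach could look roughly like this (a
sketch only, with illustrative names; it assumes the kernel's xarray
API from linux/xarray.h plus kzalloc() from linux/slab.h):

        struct cntr_id {
                u32     mbm_total;
                u32     mbm_local;
        };

        /* hypothetical per-monitor-group member, indexed by domain ID */
        struct xarray cntr_ids;

        /* Record the counters assigned to a group in one domain. */
        static int cntr_id_store(struct xarray *xa, unsigned long dom_id,
                                 u32 total, u32 local)
        {
                struct cntr_id *c = kzalloc(sizeof(*c), GFP_KERNEL);

                if (!c)
                        return -ENOMEM;
                c->mbm_total = total;
                c->mbm_local = local;
                /* xa_store() returns the displaced entry or an encoded
                 * error; a real version would free any displaced entry. */
                return xa_err(xa_store(xa, dom_id, c, GFP_KERNEL));
        }

A domain going offline would xa_erase() and kfree() its entry in every
group; a newly onlined domain simply has no entries yet, which matches
the unhandled case Reinette describes above.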
>
> I am trying to understand the details of your approach here.
> >
> > In my prototype, I allocated a counter id-indexed array to each
> > monitoring domain structure for tracking the counter allocations,
> > because the hardware counters are all domain-scoped. That way the
> > tracking data goes away when the hardware does.
> >
> > I was focused on allowing all pending counter updates to a domain
> > resulting from a single mbm_assign_control write to be batched and
> > processed in a single IPI, so I structured the counter tracker
> > something like this:
>
> Not sure what you meant here. How are you batching two IPIs for two domains?
>
> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> This is still a single write. Two IPIs are sent separately, one for each
> domain.
>
> Are you doing something different?

I said "all pending counter updates to a domain", by which I meant
updates targeting a single domain.

Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.

What is important is that the following write also requires 1 or 2 IPIs:

(assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
for readability)

echo $'//0=t;1=t\n
/g1/0=t;1=t\n
/g2/0=t;1=t\n
/g3/0=t;1=t\n
/g4/0=t;1=t\n
/g5/0=t;1=t\n
/g6/0=t;1=t\n
/g7/0=t;1=t\n
/g8/0=t;1=t\n
/g9/0=t;1=t\n
/g10/0=t;1=t\n
/g11/0=t;1=t\n
/g12/0=t;1=t\n
/g13/0=t;1=t\n
/g14/0=t;1=t\n
/g15/0=t;1=t\n
/g16/0=t;1=t\n
/g17/0=t;1=t\n
/g18/0=t;1=t\n
/g19/0=t;1=t\n
/g20/0=t;1=t\n
/g21/0=t;1=t\n
/g22/0=t;1=t\n
/g23/0=t;1=t\n
/g24/0=t;1=t\n
/g25/0=t;1=t\n
/g26/0=t;1=t\n
/g27/0=t;1=t\n
/g28/0=t;1=t\n
/g29/0=t;1=t\n
/g30/0=t;1=t\n
/g31/0=t;1=t\n'

My ultimate goal is for a thread bound to a particular domain to be
able to unassign and reassign the local domain's 32 counters in a
single write() with no IPIs at all. And when IPIs are required, no
more than one should be needed per domain, regardless of the number of
groups updated.
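
To make the batching concrete, here is a rough sketch (not the actual
prototype code; num_mbm_cntrs, the mon_cfg array from your diff below,
and the mon_domains/hdr.cpu_mask fields are assumptions on my part):

        /* Flush every dirty counter in one domain; runs on some CPU
         * inside that domain. */
        static void mbm_cntr_flush(void *info)
        {
                struct rdt_mon_domain *d = info;
                int i;

                for (i = 0; i < num_mbm_cntrs; i++) {
                        if (!d->mon_cfg[i].dirty)
                                continue;
                        /* ... program hardware counter i from
                         * mon_cfg[i] here ... */
                        d->mon_cfg[i].dirty = false;
                }
        }

        /* After parsing the entire mbm_assign_control write: */
        list_for_each_entry(d, &r->mon_domains, hdr.list)
                smp_call_function_any(&d->hdr.cpu_mask,
                                      mbm_cntr_flush, d, 1);

smp_call_function_any() runs the function locally when the calling CPU
is in the mask, which is what would let a domain-bound thread update
its own domain's counters with no IPI at all.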


>
> >
> > struct resctrl_monitor_cfg {
> >      int closid;
> >      int rmid;
> >      int evtid;
> >      bool dirty;
> > };
> >
> > This mirrors the info needed in whatever register configures the
> > counter, plus a dirty flag to skip over the ones that don't need to be
> > updated.
>
> This is my understanding of your implementation:
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index d94abba1c716..9cebf065cc97 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>          u32                             *mbps_val;
>   };
>
> +struct resctrl_monitor_cfg {
> +    int closid;
> +    int rmid;
> +    int evtid;
> +    bool dirty;
> +};
> +
>   /**
>    * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
> resource
>    * @hdr:               common header for different domain types
> @@ -116,6 +123,8 @@ struct rdt_mon_domain {
>          struct delayed_work             cqm_limbo;
>          int                             mbm_work_cpu;
>          int                             cqm_work_cpu;
> +       /* Allocate num_mbm_cntrs entries in each domain */
> +       struct resctrl_monitor_cfg      *mon_cfg;
>   };
>
>
> When a user requests an assignment of the total event to the default
> group for domain 0, you search rdt_mon_domain (dom 0) for an empty
> mon_cfg entry.
>
> If there is an empty entry, use it for the assignment and update
> closid, rmid, evtid and set dirty = 1. We can get all of this
> information from the default group here.
>
> Does this make sense?

Yes, sounds correct.
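
i.e., something like this (a sketch only; I am assuming evtid == 0 can
mark a free entry, since the MBM event IDs are nonzero):

        static int mbm_cntr_alloc(struct rdt_mon_domain *d, u32 closid,
                                  u32 rmid, enum resctrl_event_id evtid)
        {
                int i;

                for (i = 0; i < num_mbm_cntrs; i++) {
                        struct resctrl_monitor_cfg *cfg = &d->mon_cfg[i];

                        if (cfg->evtid)
                                continue;       /* entry in use */
                        cfg->closid = closid;
                        cfg->rmid = rmid;
                        cfg->evtid = evtid;
                        cfg->dirty = true;
                        return i;       /* the counter ID */
                }
                return -ENOSPC; /* all counters already assigned */
        }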

>
> >
> > For the benefit of displaying mbm_assign_control, I put a pointer
> > back to any allocated counter array entry in the mbm_state struct,
> > only because that structure already exists for every rmid-domain
> > combination.
>
> A pointer in mbm_state may not be required here.
>
> We are going to loop over the resctrl groups. We can search the
> rdt_mon_domain to see whether a specific closid, rmid, evtid is
> already assigned in that domain.

No, not required I guess. High-performance CPUs can probably search a
32-entry array very quickly.
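
e.g., the display-path lookup could just be a linear scan (sketch,
using the same assumed structures as above):

        static int mbm_cntr_find(struct rdt_mon_domain *d, u32 closid,
                                 u32 rmid, enum resctrl_event_id evtid)
        {
                int i;

                for (i = 0; i < num_mbm_cntrs; i++) {
                        struct resctrl_monitor_cfg *cfg = &d->mon_cfg[i];

                        if (cfg->closid == closid &&
                            cfg->rmid == rmid && cfg->evtid == evtid)
                                return i;
                }
                return -ENOENT; /* not assigned in this domain */
        }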

-Peter
