Date: Mon, 22 Apr 2024 11:23:50 -0700
From: Peter Newman <peternewman@...gle.com>
To: Dave Martin <Dave.Martin@....com>
Cc: Babu Moger <babu.moger@....com>, corbet@....net, fenghua.yu@...el.com, 
	reinette.chatre@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	paulmck@...nel.org, rdunlap@...radead.org, tj@...nel.org, 
	peterz@...radead.org, yanjiewtw@...il.com, kim.phillips@....com, 
	lukas.bulwahn@...il.com, seanjc@...gle.com, jmattson@...gle.com, 
	leitao@...ian.org, jpoimboe@...nel.org, rick.p.edgecombe@...el.com, 
	kirill.shutemov@...ux.intel.com, jithu.joseph@...el.com, kai.huang@...el.com, 
	kan.liang@...ux.intel.com, daniel.sneddon@...ux.intel.com, 
	pbonzini@...hat.com, sandipan.das@....com, ilpo.jarvinen@...ux.intel.com, 
	maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable
 Bandwidth Monitoring Counters (ABMC)

Hi Dave,

On Mon, Apr 22, 2024 at 9:33 AM Dave Martin <Dave.Martin@....com> wrote:
>
> Hi Babu,
>
> On Thu, Mar 28, 2024 at 08:06:33PM -0500, Babu Moger wrote:
> >        Assignment flags can be one of the following:
> >
> >         t  MBM total event is assigned
>
> With my MPAM hat on this looks a bit weird, although I suppose it
> follows on from the way "mbm_total_bytes" and "mbm_local_bytes" are
> already exposed in resctrlfs.
>
> From an abstract point of view, "total" and "local" are just event
> selection criteria, additional to those in mbm_cfg_mask.  The different
> way they are treated in the hardware feels like an x86 implementation
> detail.
>
> For MPAM we don't currently distinguish local from non-local traffic, so
> I guess this just reduces to a simple on-off (i.e., "t" or nothing),
> which I guess is tolerable.
>
> This might want more thought if there is an expectation that more
> categories will be added here, though (?)

There should be a path forward whenever we start supporting
user-configured counter classes. I assume the letters a-z will be
enough to cover all the counter classes which could be used at once.
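
To illustrate how single-letter flags keep the list format easy to
consume, here is a minimal Python sketch of a reader for one line of
mbm_assign_control output as quoted further down in this message. The
function name parse_assign_line is hypothetical, and treating an empty
flag string the same as '_' is an assumption based on Dave's
suggestion, not something the patch series defines.

```python
def parse_assign_line(line):
    """Parse 'ctrl_group/mon_group/dom=flags;dom=flags;...'.

    Returns (ctrl_group, mon_group, {domain_id: set_of_flag_letters}).
    Empty group names stand for the default group, as in "//".
    """
    # Group names are directory names, so they cannot contain '/'.
    ctrl_group, mon_group, assignments = line.strip().split("/", 2)
    domains = {}
    for item in assignments.split(";"):
        if not item:
            continue  # the trailing ';' leaves one empty item
        dom, _, flags = item.partition("=")
        # Assumption: '_' and the empty string both mean "nothing assigned".
        domains[int(dom)] = set(flags.replace("_", ""))
    return ctrl_group, mon_group, domains


ctrl, mon, doms = parse_assign_line("/default_mon1/0=tl;1=t;2=_;3=;")
```

Because each flag is one letter, a new counter class would only add a
new member to the per-domain set, without changing the parsing.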

>
> >         l  MBM local event is assigned
> >         tl Both total and local MBM events are assigned
> >         _  None of the MBM events are assigned
>
> This use of '_' seems unusual.  Can we not just have the empty string
> for "nothing assigned"?
>
> Since every assignment is terminated by ';' or end-of-line, I don't
> think that there would be any parsing ambiguity (?)
>
> >
> >       Examples:
> >
> >       # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> >       non_defult_group//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       non_defult_group/non_default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       //0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >       /default_mon1/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;
> >
> >       There are four groups and all the groups have local and total event assigned.
> >
> >       "//" - This is a default CONTROL MON group
> >
> >       "non_defult_group//" - This is non default CONTROL MON group
> >
> >       "/default_mon1/"  - This is child MON group of the default group
> >
> >       "non_defult_group/non_default_mon1/" - This is child MON group of the non default group
> >
> >       =tl means both total and local events are assigned.
> >
> > e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
> >
> >       The write format is similar to the above list format with addition of
> >       op-code for the assignment operation.
>
> With my resctrl newbie hat on:
>
> It feels a bit complex (for the kernel) to have userspace needing to
> write a script into a magic file that we need to parse, specifying
> updates to a bunch of controls already visible as objects in resctrlfs
> in their own right.
>
> What's the expected use case here?

I went over the use case of iterating a small number of monitors over
a much larger number of monitoring groups here:

https://lore.kernel.org/lkml/CALPaoCi=PCWr6U5zYtFPmyaFHU_iqZtZL-LaHC2mYxbETXk3ig@mail.gmail.com/

>
> If userspace really does need to switch lots of events simultaneously
> then I guess the overhead of enumerating and poking lots of individual
> files might be unacceptable though, and we would still need some global
> interfaces for operations such as "unassign everything"...

My main goal is for the number of parallel IPI batches to all the
domains (or write syscalls) to be O(num_rmids / num_monitors) rather
than O(num_rmids * num_monitors) as I need to know how frequently we
can afford to sample the current memory bandwidth of the maximum
number of monitoring groups supported.
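
As a rough sketch of the batched case only: if each rotation step
moves all num_monitors counters to the next set of groups with a
single write, the number of writes per full pass is
ceil(num_rmids / num_monitors). The helper name rotation_batches and
the group names below are hypothetical, and one write per step is an
assumption about how userspace would use the interface.

```python
import math


def rotation_batches(groups, num_monitors):
    """Yield successive slices of at most num_monitors groups.

    Each slice is the set of groups that would hold counters during
    one rotation step (assumed to cost one batched write).
    """
    for start in range(0, len(groups), num_monitors):
        yield groups[start:start + num_monitors]


# Hypothetical numbers: 10 monitoring groups sharing 4 counters.
groups = [f"grp{i}" for i in range(10)]
batches = list(rotation_batches(groups, 4))
steps_per_pass = len(batches)  # equals math.ceil(10 / 4)
```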

Thanks!
-Peter
