Message-ID: <ZOlRNTq3lE7VNHjU@agluck-desk3>
Date: Fri, 25 Aug 2023 18:11:17 -0700
From: Tony Luck <tony.luck@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Amit Singh Tomar <amitsinght@...vell.com>,
"Yu, Fenghua" <fenghua.yu@...el.com>,
"james.morse@....com" <james.morse@....com>,
George Cherian <gcherian@...vell.com>,
"robh@...nel.org" <robh@...nel.org>,
"peternewman@...gle.com" <peternewman@...gle.com>,
Drew Fustini <dfustini@...libre.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: resctrl2 - status
On Fri, Aug 25, 2023 at 04:08:21PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 8/25/2023 1:54 PM, Tony Luck wrote:
> > On Fri, Aug 25, 2023 at 01:20:22PM -0700, Reinette Chatre wrote:
> >> On 8/25/2023 12:44 PM, Luck, Tony wrote:
> >>>>>> Alternatively, can user space just take a "load all resctrl modules
> >>>>>> and see what sticks" (even modules of different architectures since
> >>>>>> a user space may want to be generic) approach?
> >>>>>
> >>>>> This mostly works. Except for the cases where different modules access
> >>>>> the same underlying hardware, so can't be loaded together.
> >>>>>
> >>>>> Examples:
> >>>>>
> >>>>> rdt_l3_cat vs. rdt_l3_cdp - user needs to decide whether they want CDP or not.
> >>>>> But this is already true ... they have to decide whether to pass the "-o cdp" option
> >>>>> to mount.
> >>>>>
> >>>>> rdt_l3_mba vs. rdt_l3_mba_MBps - does the user want to control memory bandwidth
> >>>>> with percentages, or with MB/sec values. Again the user already has to make this
> >>>>> decision when choosing mount options.
> >>>>>
> >>>>>
> >>>>> Maybe the "What resctrl options does this machine support?" question would be
> >>>>> best answered with a small utility?
> >>>>
> >>>> A user space utility or a kernel provided utility? If it is a user space utility
> >>>> I think it would end up needing to duplicate what the kernel is required to do
> >>>> to know if a particular feature is supported. It seems appropriate that this
> >>>> could be a kernel utility that can share this existing information with user
> >>>> space. resctrl already supports the interface for this via /sys/fs/resctrl/info.
> >>>
> >>> I was imagining a user space utility. Even though /proc/cpuinfo doesn't show
> >>> all features, a utility has access to all the CPUID leaves that contain the
> >>> details of each feature enumeration.
> >>
> >> For x86 that may work (in some scenarios, see later) for now but as I understand
> >> Arm would need a different solution where I believe the information is obtained
> >> via ACPI. I think it is unnecessary to require user space to have parsers for
> >> CPUID and ACPI if that same information needs to be parsed by the kernel and
> >> there already exists an interface with which the information is communicated
> > >> from kernel to user space. Also, just because CPUID shows a feature
> > >> is supported by the hardware does not mean that the kernel has support for that
> > >> feature. This could be because of a feature mismatch between user space and
> > >> kernel, or even because some features are disabled via a kernel parameter such
> > >> as "rdt=!l3cat".
> >
> > Agreed this is complex, and my initial resctrl2 proposal lacks
> > functionality in this area.
>
> Why is there a need to reinvent these parts?
Perhaps there isn't ... see below.
>
> >>>> fyi ... as with previous attempts to discuss this work I find it difficult
> >>>> to discuss this work when you are selective about what you want to discuss/answer
> >>>> and just wipe the rest. Through this I understand that I am not your target
> >>>> audience.
> >>>
> >>> Not my intent. I value your input highly. I'm maybe too avid a follower of the
> >>> "trim your replies" school of e-mail etiquette. I thought I'd covered the gist
> >>> of your message.
> >>>
> >>> I'll try to be more thorough in responding in the future.
> >>
> >> Two items from my previous email remain open:
> >>
> >> First, why does making the code modular require everything to be loadable
> >> modules?
> >> I think that it is great that the code is modular. Ideally it will help to
> >> support the other architectures. As you explain this modular design also
> >> has the benefit that "modules" can be loaded and unloaded after resctrl mount.
> >> Considering your example of MBA and MBA_MBps support ... if I understand
> >> correctly with code being modular it enables changes from one to the other
> >> after resctrl mount. User can start with MBA and then switch to MBA_MBps
> >> without needing to unmount resctrl. What I do not understand is why does
> >> the code being modular require everything to be modules? Why, for example,
> >> could a user not interact with a resctrl file that enables the user to make
> >> this switch from, for example, MBA to MBA_MBps? With this the existing
> > >> interfaces continue to be respected (the existing mount parameters need
> > >> to remain anyway), while enabling future "more modular" usages.
> >
> > Lots of advantages to modules:
> > 1) Only load what you need.
> > - saves memory
> > - reduces potential attack surface
> > - may avoid periodic timers (e.g. for MBM overflow and
> > for LLC occupancy "limbo" mode).
> > 2) If there is a security fix, can be deployed without a reboot.
> > 3) Isolation between different features.
> > - Makes development and testing simpler
> >
>
> From what I understand (1) and (3) are accomplished through things
> being modular. To transition smoothly it may be required for all
> currently supported features to be loaded by default, with the
> option to unload afterwards by user space that understands new
> modular interfaces.
>
> (2) does not need a module for each resource and feature supported
> by resctrl. A single resctrl module would accomplish this and I
> would expect it to be something everybody would like. James also
> mentioned it being on his significant to-do list.
>
> > Sure some things like switching MBA to MBA_MBps mode by writing to
> > a control file are theoretically possible. But they would be far more
> > complex implementations with many possible opportunities for bugs.
> > I think Vikas made a good choice to make this a mount option rather
> > than selectable at run time.
> >
> >> Second, copied from my previous email, what is the plan to deal with current
> >> users that just mount resctrl and expect to learn from it what features are
> >> supported?
> >
> > Do such users exist? Resctrl is a sophisticated system management tool.
> > I'd expect system administrators deploying it are well aware of the
> > capabilities of the different types of systems in their data center.
> >
> > But if I'm wrong, then I have to go back to figure out a way to
> > expose this information in a better way than randomly running "modprobe"
> > to see what sticks.
>
> I always start with intel-cmt-cat but I believe that the burden would be
> on you to convince all that existing user space would not be impacted
> by this change. If I understand correctly this implementation would
> result in mounting resctrl to have an empty schemata and no resources
> in the info directory. Without knowledge about how to enable resources
> the user space could be expected to interpret that as no resources enabled
> on the system.
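For reference, the user space CPUID check debated in the quoted thread might look roughly like the sketch below. It only decodes hardware enumeration, using the bit positions the Intel SDM documents for CPUID leaves 0x7, 0xF and 0x10 (AMD enumerates some of these features through other leaves). As Reinette points out, a positive result says nothing about whether the running kernel supports or has enabled the feature (e.g. after "rdt=!l3cat"). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hardware RDT capabilities as enumerated by CPUID (Intel SDM bit positions). */
struct rdt_hw_features {
	bool l3_cat;       /* CPUID.(EAX=10H,ECX=0):EBX[1] */
	bool l2_cat;       /* CPUID.(EAX=10H,ECX=0):EBX[2] */
	bool mba;          /* CPUID.(EAX=10H,ECX=0):EBX[3] */
	bool l3_occupancy; /* CPUID.(EAX=0FH,ECX=1):EDX[0] */
	bool mbm_total;    /* CPUID.(EAX=0FH,ECX=1):EDX[1] */
	bool mbm_local;    /* CPUID.(EAX=0FH,ECX=1):EDX[2] */
};

/* Pure decoding step, kept apart from the CPUID instruction so it is testable anywhere. */
struct rdt_hw_features decode_rdt(uint32_t leaf7_ebx, uint32_t leaf10_ebx,
				  uint32_t leaf0f_edx, uint32_t leaf0f1_edx)
{
	struct rdt_hw_features f = {0};
	bool rdt_m = leaf7_ebx & (1u << 12);  /* any monitoring support */
	bool rdt_a = leaf7_ebx & (1u << 15);  /* any allocation support */
	bool l3_mon = leaf0f_edx & (1u << 1); /* L3 is a monitored resource */

	if (rdt_a) {
		f.l3_cat = leaf10_ebx & (1u << 1);
		f.l2_cat = leaf10_ebx & (1u << 2);
		f.mba = leaf10_ebx & (1u << 3);
	}
	if (rdt_m && l3_mon) {
		f.l3_occupancy = leaf0f1_edx & (1u << 0);
		f.mbm_total = leaf0f1_edx & (1u << 1);
		f.mbm_local = leaf0f1_edx & (1u << 2);
	}
	return f;
}

/* Returns false when not running on x86 or when CPUID leaf 7 is unavailable. */
bool read_rdt_features(struct rdt_hw_features *out)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int a, b, c, d;
	uint32_t l7b, l10b = 0, l0fd = 0, l0f1d = 0;

	/* __get_cpuid_count() returns 0 if the leaf exceeds the maximum supported leaf. */
	if (!__get_cpuid_count(0x7, 0, &a, &b, &c, &d))
		return false;
	l7b = b;
	if (__get_cpuid_count(0x10, 0, &a, &b, &c, &d))
		l10b = b;
	if (__get_cpuid_count(0xf, 0, &a, &b, &c, &d))
		l0fd = d;
	if (__get_cpuid_count(0xf, 1, &a, &b, &c, &d))
		l0f1d = d;
	*out = decode_rdt(l7b, l10b, l0fd, l0f1d);
	return true;
#else
	(void)out;
	return false;
#endif
}
```

Keeping the bit decoding separate from the CPUID call is a deliberate choice: it is the part an Arm/ACPI equivalent could not reuse anyway, which is part of Reinette's argument for leaving enumeration to the kernel.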
Reinette,
The basic issue is that my module-based system has become less user
friendly: it requires extra steps to get basic things working.
Luckily there is a simple solution: make the modules for the basic
functions autoload. E.g. for MBA:
+static const struct x86_cpu_id mba_feature[] = {
+ X86_MATCH_FEATURE(X86_FEATURE_MBA, 0),
+ { }
+};
+MODULE_DEVICE_TABLE(x86cpu, mba_feature);
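Autoloading works because MODULE_DEVICE_TABLE(x86cpu, ...) turns each table entry into a module alias, and udev/modprobe match those aliases against the CPU modalias (exported at /sys/devices/system/cpu/modalias) when the CPU devices are announced, typically at boot. The sketch below imitates that match with POSIX fnmatch(). The alias string format is based on a reading of scripts/mod/file2alias.c, and the helper names are hypothetical; check modules.alias on a real system before relying on the exact format.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Build the alias for an x86cpu entry that matches any vendor/family/model
 * but requires one feature (the X86_MATCH_FEATURE() case). Assumed format:
 * "cpu:type:x86,ven*fam*mod*:feature:*XXXX*". Returns false on truncation.
 */
bool x86cpu_feature_alias(unsigned int feature, char *buf, size_t len)
{
	int n = snprintf(buf, len, "cpu:type:x86,ven*fam*mod*:feature:*%04X*", feature);

	return n >= 0 && (size_t)n < len;
}

/* modprobe-style glob match of a module alias against the CPU modalias. */
bool alias_matches(const char *pattern, const char *cpu_modalias)
{
	return fnmatch(pattern, cpu_modalias, 0) == 0;
}
```

The CPU modalias lists every feature bit as a comma-separated 4-digit hex number, so the "*XXXX*" glob succeeds exactly when the required feature is present.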
Then, immediately after booting, the system looks like this:
$ lsmod | grep rdt
rdt_l3_mba 16384 0
rdt_mbm_local_bytes 12288 0
rdt_mbm_total_bytes 12288 0
rdt_llc_occupancy 12288 0
rdt_l3_cat 16384 0
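The same columns (name, size, reference count, followed by dependencies, state and load address) are available from /proc/modules. A minimal sketch of the "which rdt_ modules are loaded?" check in C, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct loaded_module {
	char name[64];
	unsigned long size;
	int refcnt;
};

/*
 * Parse one /proc/modules line: "name size refcount deps state address".
 * Lines whose refcount is "-" (no module unload support) are rejected.
 */
bool parse_proc_modules_line(const char *line, struct loaded_module *m)
{
	return sscanf(line, "%63s %lu %d", m->name, &m->size, &m->refcnt) == 3;
}

/* Print modules whose name starts with prefix; returns the count, or -1 on error. */
int list_modules_with_prefix(const char *path, const char *prefix)
{
	FILE *fp = fopen(path, "r");
	char line[512];
	int count = 0;

	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp)) {
		struct loaded_module m;

		if (parse_proc_modules_line(line, &m) &&
		    strncmp(m.name, prefix, strlen(prefix)) == 0) {
			printf("%-24s %8lu %d\n", m.name, m.size, m.refcnt);
			count++;
		}
	}
	fclose(fp);
	return count;
}
```

Calling list_modules_with_prefix("/proc/modules", "rdt_") would print the same rows as the lsmod | grep rdt output.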
And mounting resctrl:
$ sudo mount -t resctrl resctrl /sys/fs/resctrl
$ tree /sys/fs/resctrl/info
/sys/fs/resctrl/info
├── L3
│   ├── bit_usage
│   ├── cbm_mask
│   ├── min_cbm_bits
│   ├── num_closids
│   └── shareable_bits
├── L3_MON
│   ├── max_threshold_occupancy
│   ├── mbm_poll_threshold
│   ├── mon_features
│   └── num_rmids
├── last_cmd_status
└── MB
    ├── bandwidth_gran
    ├── delay_linear
    ├── min_bandwidth
    └── num_closids
3 directories, 14 files
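Reinette's concern was existing user space that infers the available resources from this info directory. A minimal sketch of that kind of discovery, assuming resctrl is mounted at /sys/fs/resctrl: each subdirectory of info (L3, L3_MON and MB above) names a resource, while plain files such as last_cmd_status are skipped. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Print and count the resource subdirectories of a resctrl "info" directory.
 * Returns -1 if the directory cannot be opened.
 */
int list_info_resources(const char *info_dir)
{
	DIR *dir = opendir(info_dir);
	struct dirent *de;
	int count = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char path[4096];
		struct stat st;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		/* stat() rather than d_type, which some filesystems leave as DT_UNKNOWN */
		snprintf(path, sizeof(path), "%s/%s", info_dir, de->d_name);
		if (stat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
			printf("%s\n", de->d_name);
			count++;
		}
	}
	closedir(dir);
	return count;
}
```

With the autoloaded modules, list_info_resources("/sys/fs/resctrl/info") would report three resources; without them it would report none, which is the case Reinette expects existing tools to misread as "no RDT support".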
$ cat /sys/fs/resctrl/schemata
MB: 0=0;1=0
L3: 0=fff;1=fff
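Each schemata line has the form RESOURCE: domain=value;domain=value. A minimal parsing sketch follows. It assumes cache resources (L3 here) use hexadecimal capacity bitmasks, so fff sets 12 bits, and that MB values are decimal; choosing the base from the resource name is an assumption of this sketch, and the names are hypothetical. Whitespace around the resource name is tolerated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct schema_dom {
	unsigned int id;   /* cache/memory domain id */
	unsigned long val; /* bitmask (L3) or bandwidth value (MB) */
};

/* Parse one schemata line; returns the number of domains, or -1 on error. */
int parse_schemata_line(const char *line, char *name, size_t name_len,
			struct schema_dom *doms, int max_doms)
{
	const char *colon = strchr(line, ':');
	const char *start = line, *p;
	size_t n;
	int base, count = 0;

	if (!colon)
		return -1;
	while (start < colon && isspace((unsigned char)*start))
		start++;
	n = (size_t)(colon - start);
	while (n > 0 && isspace((unsigned char)start[n - 1]))
		n--;
	if (n == 0 || n >= name_len)
		return -1;
	memcpy(name, start, n);
	name[n] = '\0';

	/* Assumption: MB* resources are decimal, cache resources are hex masks. */
	base = strncmp(name, "MB", 2) == 0 ? 10 : 16;

	for (p = colon + 1; *p; ) {
		char *end;
		unsigned long id, val;

		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		id = strtoul(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		p = end + 1;
		val = strtoul(p, &end, base);
		if (end == p || count >= max_doms)
			return -1;
		doms[count].id = (unsigned int)id;
		doms[count].val = val;
		count++;
		p = end;
		while (isspace((unsigned char)*p))
			p++;
		if (*p == ';')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return count;
}
```

For the output above, both lines parse to two domains (0 and 1), with 0xfff for L3 and 0 for MB.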
Thanks for pushing me to search for this solution to make things
more compatible.
-Tony