Message-ID: <ZUqo+MsEQi2Xc/pO@agluck-desk3>
Date:   Tue, 7 Nov 2023 13:15:36 -0800
From:   Tony Luck <tony.luck@...el.com>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Peter Newman <peternewman@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        Shuah Khan <skhan@...uxfoundation.org>, x86@...nel.org,
        Shaopeng Tan <tan.shaopeng@...itsu.com>,
        James Morse <james.morse@....com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Babu Moger <babu.moger@....com>,
        Randy Dunlap <rdunlap@...radead.org>,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        patches@...ts.linux.dev
Subject: Re: [PATCH v3] x86/resctrl: mba_MBps: Fall back to total b/w if
 local b/w unavailable

On Fri, Nov 03, 2023 at 02:43:15PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 10/26/2023 1:02 PM, Tony Luck wrote:
> > On Intel the various resource director technology (RDT) features are all
> > orthogonal and independently enumerated. Thus it is possible to have
> > a system that  provides "total" memory bandwidth measurements without
> > providing "local" bandwidth measurements.
> 
> This motivation is written in support of Intel systems but from what I
> can tell the changes impact Intel as well as AMD.

If AMD were to build a system that did this, the same fixes would be needed.

> > 
> > If local bandwidth measurement is not available, do not give up on
> > providing the "mba_MBps" feedback option completely, make the code fall
> > back to using total bandwidth.
> 
> It is interesting to me that the "fall back" is essentially a drop-in
> replacement without any adjustments to the data/algorithm.

The algorithm is, by necessity, very simple. Essentially "if measured
bandwidth is above desired target, apply one step extra throttling.
Reverse when bandwidth is below desired level." I'm not sure what tweaks
are possible.
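
To make the shape of that loop concrete, here is a minimal sketch. It is not the kernel code: the function name, the percentage-based control value, and the step and limit constants are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical limits: allowed bandwidth as a percentage, moved in fixed steps. */
#define MBA_MIN_PCT  10u
#define MBA_MAX_PCT  100u
#define MBA_STEP_PCT 10u

/*
 * One iteration of the feedback loop. Returns the new allowed percentage;
 * a lower percentage means more throttling.
 */
static uint32_t mba_step(uint32_t cur_pct, uint64_t measured_mbps,
                         uint64_t target_mbps)
{
    /* Above target: apply one extra step of throttling, if any is left. */
    if (measured_mbps > target_mbps && cur_pct > MBA_MIN_PCT)
        return cur_pct - MBA_STEP_PCT;

    /* Below target: undo one step of throttling, if any is applied. */
    if (measured_mbps < target_mbps && cur_pct < MBA_MAX_PCT)
        return cur_pct + MBA_STEP_PCT;

    return cur_pct;
}
```

Nothing in this rule depends on which counter produced measured_mbps, which is why swapping local for total needs no change to the algorithm itself.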

> Can these measurements be considered equivalent? Could a user now perhaps
> want to experiment by disabling local bandwidth measurement to explore if
> system behaves differently when using total memory bandwidth? What
> would have a user choose one over the other (apart from when user
> is forced by system ability)?

This may be interesting. I dug around in the e-mail archives to see if
there was any discussion on why "local" was picked as the feedback
measurement rather than "total". But I couldn't find anything.

Thinking about it now, "total" feels like a better choice. Why would
you not care about off-package memory bandwidth? In pathological cases
all the memory traffic might be going off package, but the existing
mba_MBps algorithm would *reduce* the amount of throttling, eventually
to zero.
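
A toy model of that pathological case, as a sketch only. The linear throttling model, the starting value of 50%, and every name below are assumptions; none of it is taken from the kernel.

```c
#include <stdint.h>

#define MIN_PCT  10u
#define MAX_PCT  100u
#define STEP_PCT 10u

/* One-step rule: lower percentage = more throttling. */
static uint32_t step(uint32_t pct, uint64_t measured, uint64_t target)
{
    if (measured > target && pct > MIN_PCT)
        return pct - STEP_PCT;
    if (measured < target && pct < MAX_PCT)
        return pct + STEP_PCT;
    return pct;
}

/*
 * Toy model: the workload wants demand_mbps, throttling scales that
 * linearly, and local_share_pct of the traffic stays on-package.
 * use_total selects which measurement feeds the loop.
 */
static uint32_t simulate(uint64_t demand_mbps, uint32_t local_share_pct,
                         uint64_t target_mbps, int use_total, int iterations)
{
    uint32_t pct = 50;

    for (int i = 0; i < iterations; i++) {
        uint64_t total = demand_mbps * pct / 100;
        uint64_t local = total * local_share_pct / 100;

        pct = step(pct, use_total ? total : local, target_mbps);
    }
    return pct;
}
```

With a 5000 MB/s demand that is entirely remote (local_share_pct = 0) and a 1000 MB/s target, simulate() with local feedback climbs to 100% allowed, i.e. no throttling at all, while total feedback settles at 20%.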

Maybe an additional mount option "mba_MBps_total" so the user can pick
total instead of local?
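
Purely as a sketch of how such an option could feed the event choice: the option does not exist today, and the enum, the function, and its parameters are hypothetical. It assumes at least one of the two events is enabled, which the mount-time check is expected to guarantee.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two candidate feedback events. */
enum mbps_event {
    MBPS_EVENT_LOCAL,
    MBPS_EVENT_TOTAL,
};

/*
 * want_total models a hypothetical "mba_MBps_total" mount option.
 * Honor the request when total is available; otherwise keep the legacy
 * preference for local and fall back to total only when local is missing.
 */
static enum mbps_event pick_event(bool want_total, bool local_enabled,
                                  bool total_enabled)
{
    if (want_total && total_enabled)
        return MBPS_EVENT_TOTAL;
    if (local_enabled)
        return MBPS_EVENT_LOCAL;
    return MBPS_EVENT_TOTAL;
}
```

Whether an unavailable request should silently fall back, as here, or fail the mount is a separate design question.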

> > 
> > Signed-off-by: Tony Luck <tony.luck@...el.com>
> > ---
> > Change since v2:
> > 
> > Babu doesn't like the global variable. So here's a version without it.
> > 
> > Note that my preference is still the v2 version. But as I tell newbies
> > to Linux "Your job isn't to get YOUR patch upstream. You job is to get
> > the problem fixed.".  So taking my own advice I don't really mind
> > whether v2 or v3 is applied.
> > 
> >  arch/x86/kernel/cpu/resctrl/monitor.c  | 43 ++++++++++++++++++--------
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
> >  2 files changed, 31 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index f136ac046851..29e86310677d 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -418,6 +418,20 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * For legacy compatibility use the local memory bandwidth to drive
> > + * the mba_MBps feedback control loop. But on platforms that do not
> > + * provide the local event fall back to use the total bandwidth event
> > + * instead.
> > + */
> > +static enum resctrl_event_id pick_mba_mbps_event(void)
> > +{
> > +	if (is_mbm_local_enabled())
> > +		return QOS_L3_MBM_LOCAL_EVENT_ID;
> > +
> > +	return QOS_L3_MBM_TOTAL_EVENT_ID;
> > +}
> 
> Can there be a WARN here to catch the unlikely event that
> !is_mbm_total_enabled()?
> This may mean the caller (in update_mba_bw()) needs to move
> to code protected by is_mbm_enabled().

All this code is under the protection of the check done at mount time
by supports_mba_mbps():

static bool supports_mba_mbps(void)
{
        struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

        return (is_mbm_enabled() &&
                r->alloc_capable && is_mba_linear());
}

Adding even more run-time checks seems overkill.

> One option to consider is to have a single "get_mba_mbps_state()"
> call (similar to V1) that determines the eventid as above and
> then calls get_mbm_state() to return a pointer to mbm_state in one
> call. Starting to seem like nitpicking but I'd thought I'd mention it
> since it seemed a way to have V1 solution with request to use
> get_mbm_state() addressed.

It doesn't sound any better than the V3 approach.

> > +
> >  /*
> >   * mbm_bw_count() - Update bw count from values previously read by
> >   *		    __mon_event_count().
> > @@ -431,9 +445,11 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
> >   */
> >  static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
> >  {
> > -	struct mbm_state *m = &rr->d->mbm_local[rmid];
> > +	enum resctrl_event_id mba_mbps_evt_id = pick_mba_mbps_event();
> >  	u64 cur_bw, bytes, cur_bytes;
> > +	struct mbm_state *m;
> >  
> > +	m = get_mbm_state(rr->d, rmid, mba_mbps_evt_id);
> >  	cur_bytes = rr->val;
> >  	bytes = cur_bytes - m->prev_bw_bytes;
> >  	m->prev_bw_bytes = cur_bytes;
> 
> It should not be necessary to pick the event id again. It is available
> within the struct rmid_read parameter. 

So it is. I can drop the extra pick_mba_mbps_event() call here.
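
Roughly what that would look like, as a user-space sketch. The types below are cut-down stand-ins for the kernel structures, NUM_RMIDS and last_delta_bytes are made up for illustration, and the NULL check is an addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_RMIDS 4    /* hypothetical table size for the sketch */

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum resctrl_event_id {
    QOS_L3_MBM_TOTAL_EVENT_ID = 2,
    QOS_L3_MBM_LOCAL_EVENT_ID = 3,
};

struct mbm_state {
    uint64_t prev_bw_bytes;
    uint64_t last_delta_bytes;    /* hypothetical field */
};

struct rdt_domain {
    struct mbm_state mbm_total[NUM_RMIDS];
    struct mbm_state mbm_local[NUM_RMIDS];
};

struct rmid_read {
    struct rdt_domain *d;
    enum resctrl_event_id evtid;
    uint64_t val;
};

/* Map an event id to the matching per-RMID state array. */
static struct mbm_state *get_mbm_state(struct rdt_domain *d, uint32_t rmid,
                                       enum resctrl_event_id evtid)
{
    switch (evtid) {
    case QOS_L3_MBM_TOTAL_EVENT_ID:
        return &d->mbm_total[rmid];
    case QOS_L3_MBM_LOCAL_EVENT_ID:
        return &d->mbm_local[rmid];
    default:
        return NULL;
    }
}

/* Use the event id already carried in rr instead of picking it again. */
static void mbm_bw_count(uint32_t rmid, struct rmid_read *rr)
{
    struct mbm_state *m = get_mbm_state(rr->d, rmid, rr->evtid);
    uint64_t cur_bytes, bytes;

    if (!m)
        return;

    cur_bytes = rr->val;
    bytes = cur_bytes - m->prev_bw_bytes;
    m->prev_bw_bytes = cur_bytes;
    m->last_delta_bytes = bytes;
}
```

The caller fills in rr->evtid once, so the counting path and the state lookup cannot disagree about which event is in use.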

-Tony
