linux-kernel - Re: [PATCH 1/6] x86/intel_rdt/mba_sc: Add documentation for MBA software controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 4 Apr 2018 11:11:10 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Shivappa Vikas <vikas.shivappa@...el.com>
cc:     Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        tony.luck@...el.com, ravi.v.shankar@...el.com,
        fenghua.yu@...el.com, sai.praneeth.prakhya@...el.com,
        x86@...nel.org, hpa@...or.com, linux-kernel@...r.kernel.org,
        ak@...ux.intel.com
Subject: Re: [PATCH 1/6] x86/intel_rdt/mba_sc: Add documentation for MBA
 software controller

On Tue, 3 Apr 2018, Shivappa Vikas wrote:
> On Tue, 3 Apr 2018, Thomas Gleixner wrote:
> > On Thu, 29 Mar 2018, Vikas Shivappa wrote:
> > The L2 external bandwidth is higher than the L3 external bandwidth.
> > 
> >  Is there any information available from CPUID or whatever source which
> >  allows us to retrieve the bandwidth ratio or the absolute maximum
> >  bandwidth per level?
> 
> There is no information in cpuid on the bandwidth available. Also we have seen
> from our experiments that the increase is not perfectly linear (delta
> bandwidth increase from 30% to 40% may not be same as 70% to 80%). So we
> currently dynamically caliberate this delta for the software controller.

I assume you mean: calibrate

Though I don't see anything which looks remotely like calibration.
Calibration means that you determine the exact parameters by observation and
then can use the calibrated values afterwards. But that's not what you are
doing. So please don't claim its calibration.

You observe behaviour which depends on the workload and other
factors. That's not calibration. If you change the MSR by a granularity
value then you calculate the bandwidth delta vs. the previous MSR
value. That only makes sense and works when the application is having the
same memory access patterns accross both observation periods.

And of course, this won't be necessarily linear because if you throttle the
application then it gets less work done per CPU time slice and the
resulting stalls will also have side effects on the requested amount of
memory and therefore distort the measurement. Ditto the other way
around.

There are too many factors influencing this, so claiming that it's
calibration is window dressing at best. Even worse it suggests that it's
something accurate, which subverts your goal of reducing confusion.

Adaptive control might be an acceptable description, though given the
amount of factors which play into that it's still an euphemism for
'heuristic'.

> > What's also missing from your explanation is how that feedback loop behaves
> > under different workloads.
> > 
> >  Is this assuming that the involved threads/cpus actually try to utilize
> >  the bandwidth completely?
> 
> No, the feedback loop only guarentees that the usage will not exceed what the
> user specifies as max bandwidth. If it is using below the max value it does
> not matter how much less it is using.
> > 
> >  What happens if the threads/cpus are only using a small set because they
> >  are idle or their computations are mostly cache local and do not need
> >  external bandwidth? Looking at the implementation I don't see how that is
> >  taken into account.
> 
> The feedback only kicks into action if a rdtgroup uses more bandwidth than the
> max specified by the user. I specified that it is always "ensure the "actual
> b/w
> 354 < user b/w" " and can add more explanation on these scenarios.

Please finally stop to use this horrible 'b/w' thingy. It makes my eyes bleed
everytime.

> Also note that we are using the MBM counters for this feedback loop. Now that
> the interface is much more useful because we have the same rdtgroup that is
> being monitored and controlled. (vs. if we had the perf mbm the group of
> threads in resctrl mba and in mbm could be different and would be hard to
> measure what the threads/cpus in the resctrl are using).

Why does that make me smile?

Thanks,

	tglx