Message-ID: <alpine.DEB.2.20.1703132008440.3712@nanos>
Date: Mon, 13 Mar 2017 20:10:35 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: David Carrillo-Cisneros <davidcc@...gle.com>
cc: Stephane Eranian <eranian@...gle.com>,
"Luck, Tony" <tony.luck@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
"Shivappa, Vikas" <vikas.shivappa@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"hpa@...or.com" <hpa@...or.com>,
"mingo@...nel.org" <mingo@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>,
"Yu, Fenghua" <fenghua.yu@...el.com>,
"Kleen, Andi" <andi.kleen@...el.com>
Subject: Re: [PATCH 1/1] x86/cqm: Cqm requirements
On Fri, 10 Mar 2017, David Carrillo-Cisneros wrote:
> > Fine. So we need this for ONE particular use case. And if that is not
> > well documented, including the underlying mechanics for analyzing the
> > data, then this will be a nice source of confusion for Joe User.
> >
> > I still think that this can be done differently while keeping the overhead
> > small.
> >
> > You look at this from the perspective of the existing perf mechanics,
> > which require high-overhead context switching machinery. But that's just
> > wrong, because that's not how cache and bandwidth monitoring work.
> >
> > Unlike the other perf counters, CQM and MBM are based on a
> > context-selectable set of counters which do not require readout and
> > reconfiguration when a context switch happens.
> >
> > Especially with CAT in play, the context switch overhead is there already
> > when CAT partitions need to be switched. So switching the RMID at the same
> > time is basically free, if we are smart enough to do the equivalent of the
> > CLOSID context switch mechanics and ideally combine both into a single MSR
> > write.
> >
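For reference, CLOSID and RMID already live in the same architectural
MSR (IA32_PQR_ASSOC), so the combined switch boils down to one WRMSR.
A minimal sketch, assuming the SDM field layout (the helper name is
made up for illustration):

#include <linux/types.h>
#include <asm/msr.h>		/* wrmsrl() */

/* Sketch only: IA32_PQR_ASSOC (0xc8f) carries the RMID in bits 0-9
 * and the CLOSID from bit 32 upwards, per the SDM. */
#define MSR_IA32_PQR_ASSOC	0x0c8f

static inline void pqr_switch(u32 rmid, u32 closid)
{
	/* One write switches monitoring and allocation context together. */
	wrmsrl(MSR_IA32_PQR_ASSOC, ((u64)closid << 32) | rmid);
}
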
> > With that, the low-overhead periodic sampling can read the N counters
> > related to the monitored set and provide N separate results. For
> > bandwidth the aggregation is a simple ADD; for cache residency,
> > aggregation is pointless.
> >
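To make that concrete: the per-RMID readout is just an EVTSEL/CTR MSR
pair, and the bandwidth aggregation is one addition per RMID. A minimal
sketch, assuming the SDM MSR layout and event codes (the function and
its calling convention are invented for illustration):

#include <linux/types.h>
#include <linux/bits.h>
#include <asm/msr.h>		/* wrmsrl(), rdmsrl() */

#define MSR_IA32_QM_EVTSEL	0x0c8d	/* bits 0-7: event, 32-41: RMID */
#define MSR_IA32_QM_CTR		0x0c8e	/* bits 0-61: data */
#define QM_EVT_LOCAL_MBM	0x03	/* 0x01 would be LLC occupancy */

static u64 sample_mbm_group(const u32 *rmids, int nr)
{
	u64 total = 0, ctr;
	int i;

	for (i = 0; i < nr; i++) {
		wrmsrl(MSR_IA32_QM_EVTSEL,
		       ((u64)rmids[i] << 32) | QM_EVT_LOCAL_MBM);
		rdmsrl(MSR_IA32_QM_CTR, ctr);
		/* Bit 63: error, bit 62: data not yet available */
		if (ctr & (BIT_ULL(63) | BIT_ULL(62)))
			continue;
		total += ctr & GENMASK_ULL(61, 0);	/* the simple ADD */
	}
	return total;
}
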
> > Just because perf was designed with the regular performance counters in
> > mind (way before that CQM/MBM stuff came around) does not mean that we
> > cannot change/extend that if it makes sense.
> >
> > And looking at the way Cache/Bandwidth allocation and monitoring works, it
> > makes a lot of sense. Definitely more than shoving it into the current
> > modus operandi with duct tape just because we can.
> >
>
> You make a fair point. The use case I described can be better served by
> the low-overhead monitoring groups that Fenghua is working on. That
> info can then be merged with the per-CPU profile collected for non-RDT
> events.
>
> I am OK with removing the perf-like CPU filtering from the requirements.
So if I'm not missing something, then ALL remaining requirements can be
solved with the RDT-integrated monitoring mechanics, right?
Thanks,
tglx