[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALcN6mjpuiy1mMn2KG33ororsf7e68AZYTmSjtpD9b-pitsY1g@mail.gmail.com>
Date: Thu, 9 Mar 2017 10:05:36 -0800
From: David Carrillo-Cisneros <davidcc@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Stephane Eranian <eranian@...gle.com>,
"Luck, Tony" <tony.luck@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
"Shivappa, Vikas" <vikas.shivappa@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"hpa@...or.com" <hpa@...or.com>,
"mingo@...nel.org" <mingo@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>,
"Yu, Fenghua" <fenghua.yu@...el.com>,
"Kleen, Andi" <andi.kleen@...el.com>
Subject: Re: [PATCH 1/1] x86/cqm: Cqm requirements
On Thu, Mar 9, 2017 at 3:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Wed, 8 Mar 2017, David Carrillo-Cisneros wrote:
>> On Wed, Mar 8, 2017 at 12:30 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> > Same applies for per CPU measurements.
>>
>> For CPU measurements. We need perf-like CPU filtering to support tools
>> that perform low overhead monitoring by polling CPU events. These
>> tools approximate per-cgroup/task events by reconciling CPU events
>> with logs of what job run when in what CPU.
>
> Sorry, but for CQM that's just voodoo analysis.
I'll argue that. Yet, perf-like CPU is also needed for MBM, a less
contentious scenario, I believe.
>
> CPU default is CAT group 0 (20% of cache)
> T1 belongs to CAT group 1 (40% of cache)
> T2 belongs to CAT group 2 (40% of cache)
>
> Now you do low overhead samples of the CPU (all groups accounted) with 1
> second period.
>
> Lets assume that T1 runs 50% and T2 runs 20% the rest of the time is
> utilized by random other things and the kernel itself (using CAT group 0).
>
> What is the accumulated value telling you?
In this single example not much, only the sum of occupancies. But
assume I have T1...T10000 different jobs, and I randomly select a pair
of those jobs to run together in a machine, (they become the T1 and T2
in your example). Then I repeat that hundreds of thousands of times.
I can collect all data with (tasks run, time run, occupancy) and build
a simple regression to estimate the expected occupancy (and some
confidence interval). That inaccurate but approximate value is very
useful to feed into a job scheduler. Furthermore, it can be correlated
with values of other events that are currently sampled this way.
>
> How do you approximate that back to T1/T2 and the rest?
Described above for large numbers and random samples. More
sophisticated (voodo?) statistic techniques are employed in practice
to account for almost all issues I could think of (selection bias,
missing values, interaction between tasks, etc). They seem to work
fine.
>
> How do you do that when the tasks are switching between the samples several
> times?
It does not work well for a single run (your example). But for the
example I gave, one can just rely on Random Sampling, Law of Large
Numbers, and Central Limit Theorem.
Thanks,
David
Powered by blists - more mailing lists