Message-ID: <20140106111624.GB5623@twins.programming.kicks-ass.net>
Date: Mon, 6 Jan 2014 12:16:24 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>
Cc: Tejun Heo <tj@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Li Zefan <lizefan@...wei.com>,
"containers@...ts.linux-foundation.org"
<containers@...ts.linux-foundation.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/4] x86: Add Cache QoS Monitoring (CQM) support
On Sun, Jan 05, 2014 at 05:23:07AM +0000, Waskiewicz Jr, Peter P wrote:
> The processor doesn't need to understand the grouping at all, but it
> also isn't tracking things per-process that are rolled up later.
> They're tracked via the RMID resource in the hardware, which could
> correspond to a single process, or 500 processes. It really comes down
> to the ease of managing groups of tasks for two consumers:
> 1) the end user, and 2) the process scheduler.
>
> I think I still may not be explaining how the CPU side works well
> enough to show what I'm trying to do with the cgroup. Let me try to be
> a bit more clear, and if I'm still sounding
> vague or not making sense, please tell me what isn't clear and I'll try
> to be more specific. The new Documentation addition in patch 4 also has
> a good overview, but let's try this:
>
> A CPU may have 32 RMIDs in hardware. This is for the platform, not per
> core. I may want to have a single process assigned to an RMID for
> tracking, say qemu to monitor cache usage of a specific VM. But I also
> may want to monitor cache usage of all MySQL database processes with
> another RMID, or even split specific processes of that database between
> different RMIDs. It all comes down to how the end-user wants to
> monitor their specific workloads, and how those workloads are impacting
> cache usage and occupancy.
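>
> The "32" above is just an example, by the way; the platform enumerates
> how many RMIDs it actually has through CPUID leaf 0xf, so this gets
> probed at init rather than hard-coded. Roughly something like the
> sketch below (the variable and function names are only illustrative):
>
>         #include <asm/processor.h>      /* cpuid_count() */
>
>         /* Illustrative globals filled in from CPUID leaf 0xf. */
>         static unsigned int cqm_max_rmid, cqm_upscale, cqm_l3_occ;
>
>         static void cacheqos_probe(void)
>         {
>                 unsigned int eax, ebx, ecx, edx;
>
>                 cpuid_count(0x0f, 0, &eax, &ebx, &ecx, &edx);
>                 cqm_max_rmid = ebx;     /* highest RMID the platform has */
>
>                 cpuid_count(0x0f, 1, &eax, &ebx, &ecx, &edx);
>                 cqm_upscale = ebx;      /* IA32_QM_CTR units -> bytes */
>                 cqm_l3_occ = edx & 0x1; /* L3 occupancy monitoring? */
>         }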
>
> With this implementation I've sent, all tasks are in RMID 0 by default.
> Then one can create a subdirectory, just like the cpuacct cgroup, and
> then add tasks to that subdirectory's task list. Once monitoring is
> enabled for that subdirectory (through the cacheqos.monitor_cache
> handle), a free RMID is assigned to it from the CPU's pool, and
> whenever the scheduler switches in any of the tasks in that cgroup,
> that RMID begins accumulating cache usage.
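>
> Conceptually the enable path just grabs a free RMID for the group,
> something like this rough sketch (the names and the NR_RMIDS constant
> are purely illustrative; the real count comes from CPUID as above):
>
>         #include <linux/bitops.h>
>         #include <linux/errno.h>
>         #include <linux/mutex.h>
>
>         /* One bit per RMID; bit 0 marks RMID 0, the default group. */
>         static unsigned long rmid_bitmap[BITS_TO_LONGS(NR_RMIDS)] = { 1UL };
>         static DEFINE_MUTEX(rmid_lock);
>
>         static int cacheqos_alloc_rmid(struct cacheqos_group *grp)
>         {
>                 int rmid;
>
>                 mutex_lock(&rmid_lock);
>                 rmid = find_first_zero_bit(rmid_bitmap, NR_RMIDS);
>                 if (rmid >= NR_RMIDS) {
>                         mutex_unlock(&rmid_lock);
>                         return -EAGAIN; /* all RMIDs are in use */
>                 }
>                 __set_bit(rmid, rmid_bitmap);
>                 grp->rmid = rmid;
>                 mutex_unlock(&rmid_lock);
>
>                 return 0;
>         }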
>
> The CPU side is easy and clean. To monitor a particular task, the
> software writes whatever RMID that task is assigned to (through some
> mechanism) into the proper MSR in the CPU whenever the task is
> scheduled in. When the task is swapped out, the MSR is cleared to stop
> monitoring against that RMID. When the software requests that RMID's
> statistics (through some mechanism), the CPU's MSRs are written with
> the RMID in question and the value collected so far is read back. In
> my case, I decided to use a cgroup for this "mechanism" since so much
> of the grouping and task/group association already exists and doesn't
> need to be rebuilt or re-invented.
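>
> In code terms that MSR dance is tiny. A rough sketch (the MSR numbers
> come from the SDM; the helper names and error handling are just
> illustrative):
>
>         #include <asm/msr.h>            /* wrmsr(), wrmsrl(), rdmsrl() */
>
>         #define MSR_IA32_PQR_ASSOC      0x0c8f
>         #define MSR_IA32_QM_EVTSEL      0x0c8d
>         #define MSR_IA32_QM_CTR         0x0c8e
>
>         #define QM_EVT_L3_OCCUPANCY     0x01    /* L3 occupancy event ID */
>
>         /* Switch-in: charge this CPU's activity to the task's RMID. */
>         static inline void cacheqos_sched_in(u32 rmid)
>         {
>                 /* IA32_PQR_ASSOC[9:0]: active RMID for this logical CPU. */
>                 wrmsr(MSR_IA32_PQR_ASSOC, rmid, 0);
>         }
>
>         /* Switch-out: fall back to the default RMID 0. */
>         static inline void cacheqos_sched_out(void)
>         {
>                 wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
>         }
>
>         /* Read the L3 occupancy accumulated against @rmid, in bytes. */
>         static u64 cacheqos_read_occupancy(u32 rmid, u32 upscale)
>         {
>                 u64 val;
>
>                 /* IA32_QM_EVTSEL: event ID in [7:0], RMID in [41:32]. */
>                 wrmsrl(MSR_IA32_QM_EVTSEL,
>                        ((u64)rmid << 32) | QM_EVT_L3_OCCUPANCY);
>                 rdmsrl(MSR_IA32_QM_CTR, val);
>
>                 /* Bit 63 flags an error, bit 62 means data unavailable. */
>                 if (val & ((1ULL << 63) | (1ULL << 62)))
>                         return 0;
>
>                 return val * upscale;   /* scale factor from CPUID leaf 0xf */
>         }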
This still doesn't explain why you can't use perf-cgroup for this.
> > In general, I'm quite strongly opposed against using cgroup as
> > arbitrary grouping mechanism for anything other than resource control,
> > especially given that we're moving away from multiple hierarchies.
>
> Just to clarify then, would the mechanism in the cpuacct cgroup to
> create a group off the root subsystem be considered multi-hierarchical?
> If not, then the intent is for this new cacheqos subsystem to behave
> identically to cpuacct in that regard.
>
> This is a resource controller; it just happens to be tied to a hardware
> resource instead of an OS resource.
No, cpuacct and perf-cgroup aren't actually controllers at all. They're
resource monitors at best. Same with your Cache QoS Monitor; it doesn't
control anything.