Message-ID: <1388875369.9761.25.camel@ppwaskie-mobl.amr.corp.intel.com>
Date: Sat, 4 Jan 2014 22:43:00 +0000
From: "Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>
To: Tejun Heo <tj@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Li Zefan <lizefan@...wei.com>,
"containers@...ts.linux-foundation.org"
<containers@...ts.linux-foundation.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/4] x86: Add Cache QoS Monitoring (CQM) support
On Sat, 2014-01-04 at 11:10 -0500, Tejun Heo wrote:
> Hello,
Hi Tejun,
> On Fri, Jan 03, 2014 at 12:34:41PM -0800, Peter P Waskiewicz Jr wrote:
> > The CPU features themselves are relatively straight-forward, but
> > the presentation of the data is less straight-forward. Since this
> > tracks cache usage and occupancy per process (by swapping Resource
> > Monitor IDs, or RMIDs, when processes are rescheduled), perf would
> > not be a good fit for this data, which does not report on a
> > per-process level. Therefore, a new cgroup subsystem, cacheqos, has
> > been added. This operates very similarly to the cpu and cpuacct
> > cgroup subsystems, where tasks can be grouped into sub-leaves of the
> > root-level cgroup.
>
> I don't really understand why this is implemented as part of cgroup.
> There doesn't seem to be anything which requires cgroup. Wouldn't
> just doing it per-process make more sense? Even grouping would be
> better done along the traditional process hierarchy, no? And
> per-cgroup accounting can be trivially achieved from userland by just
> accumulating the stats according to the process's cgroup membership.
> What am I missing here?
Thanks for the quick response! I knew the approach would generate
questions, so let me explain.
The feature I'm enabling in the Xeon processors is fairly simple. It
has a set of Resource Monitoring IDs (RMIDs), and those are used by the
CPU cores to track cache usage while any process associated with an
RMID is running. The more complicated part is how to present an
interface for creating RMID groups and assigning processes to them, for
both tracking and stat collection.
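To make the RMID mechanism concrete, here is a minimal user-space sketch of the register update the scheduler would perform. It assumes the IA32_PQR_ASSOC MSR layout from the Intel SDM (RMID in bits 9:0, with the upper bits, e.g. the CLOS field, preserved on write); the helper only computes the new register value, since the actual wrmsr() happens in kernel context-switch code:

```c
#include <stdint.h>

/* Assumed IA32_PQR_ASSOC (MSR 0xC8F) layout on CQM-capable parts:
 * bits 9:0 hold the active RMID; the upper bits (such as the CLOS
 * field used by cache allocation) must be preserved when the RMID
 * is swapped at context-switch time.
 */
#define PQR_RMID_MASK 0x3ffULL

/* Compute the PQR_ASSOC value for scheduling in a task tagged with
 * `rmid`, keeping all non-RMID bits of the current value intact. */
static inline uint64_t pqr_assoc_next(uint64_t cur, uint32_t rmid)
{
    return (cur & ~PQR_RMID_MASK) | ((uint64_t)rmid & PQR_RMID_MASK);
}
```

On a context switch the kernel would read the current MSR value, pass it through a helper like this with the incoming task's RMID, and write it back; the hardware then attributes cache occupancy to that RMID until the next switch.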
We discussed (internally) a few different approaches to implementing
this. The first natural thought was that this is similar to other PMU
features, but CQM deals with processes and groups of processes, not
overall CPU core or uncore state. Given the way processes in a cgroup
can be grouped together and treated as a single entity, this felt like
a natural fit with the RMID concept.
Simply put, when we want to allocate an RMID for monitoring httpd
traffic, we can create a new child in the subsystem hierarchy and
assign the httpd processes to it. The RMID is then assigned to that
group, and each process in it inherits the RMID. So instead of
assigning an RMID to each and every process individually, we leverage
the existing cgroup mechanisms for grouping processes and their
children, and they inherit the RMID.
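The attach step above is just the standard cgroup one: writing a PID into the group's tasks file. A small hedged sketch, mirroring what one would do from a shell with `echo $PID > .../cacheqos/httpd/tasks` (the `cacheqos` name and mount path follow the patch series under discussion; the helper name is hypothetical):

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: attach `pid` to a cacheqos cgroup by writing
 * it to the group's "tasks" file, e.g.
 * cacheqos_attach_pid("/sys/fs/cgroup/cacheqos/httpd", pid).
 * Returns 0 on success, -1 on failure. Children forked by the task
 * afterwards stay in the group, so they inherit its RMID too.
 */
static int cacheqos_attach_pid(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    FILE *f;

    snprintf(path, sizeof(path), "%s/tasks", cgroup_dir);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%d\n", (int)pid) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Nothing here is specific to CQM; that is the point of the design — the grouping, inheritance, and attach interface come for free from cgroups, and only the RMID bookkeeping is new.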
Please let me know if this is a better explanation and gives a clearer
picture of why we decided to approach the implementation this way.
Also note that this feature, Cache QoS Monitoring, is the first in a
series of Platform QoS Monitoring features that will be coming. This
isn't a one-off feature, so however this first piece gets accepted, we
want to make sure it's easy to expand without repeatedly impacting
userspace tools (if possible).
Cheers,
-PJ Waskiewicz
--------------
Intel Open Source Technology Center