Date:   Fri, 20 Jan 2017 12:40:45 -0800 (PST)
From:   Shivappa Vikas <vikas.shivappa@...el.com>
To:     David Carrillo-Cisneros <davidcc@...gle.com>
cc:     Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        Vikas Shivappa <vikas.shivappa@...el.com>,
        Stephane Eranian <eranian@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, hpa@...or.com,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>, andi.kleen@...el.com,
        "H. Peter Anvin" <h.peter.anvin@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes



On Thu, 19 Jan 2017, David Carrillo-Cisneros wrote:

> On Thu, Jan 19, 2017 at 6:32 PM, Vikas Shivappa
> <vikas.shivappa@...ux.intel.com> wrote:
>> Resending, including Thomas, and with some changes. Sorry for the spam.
>>
>> Based on Thomas's and Peterz's feedback, I can think of two design
>> variants which target:
>>
>> -Support monitoring and allocating using the same resctrl group:
>> the user can use a resctrl group to allocate resources and also
>> monitor them (with respect to tasks or CPUs).
>>
>> -Also allow monitoring outside of resctrl, so that the user can
>> monitor subgroups that use the same CLOSID. This mode can be used
>> when the user wants to monitor more than just the resctrl groups.
>>
>> The first design variant uses and modifies perf_cgroup; the second
>> builds a new interface, resmon.
>
> The second version would require building a whole new set of tools,
> deploying them, and maintaining them. Users would have to run perf for
> certain events and resmon (or whatever the new tool is named) for RDT.
> I see it as too complex and much prefer to keep using perf.

This was so that we have the flexibility to align the tool with the
requirements of the feature rather than twisting perf's behaviour, and also
keep that flexibility for the future when new RDT features are added
(similar to what we did by introducing resctrl groups instead of using
cgroups for CAT).

Sometimes that's a lot simpler, as we don't need a lot of code given the
limited/specific syscalls we need to support, just like the resctrl fs,
which is specific to RDT.

It looks like your requirement is to be able to monitor a group of tasks
independently, apart from the resctrl groups?

The task option should provide the flexibility to monitor a bunch of tasks
independently, whether or not they are part of a resctrl group. The
assignment of RMIDs is controlled underneath by the kernel, so we can
optimize the usage of RMIDs, and the RMIDs are tied to this group of tasks
whether it is a subset of a resctrl group or not.

>
>> The first version is close to the patches already sent, with some
>> additions/changes. This includes details of the design as per
>> Thomas's/Peterz's feedback.
>>
>> 1> First design option: without modifying resctrl, using perf
>> --------------------------------------------------------------------
>> --------------------------------------------------------------------
>>
>> In this design everything in the resctrl interface works like
>> before (the info and resource group files like tasks and schemata
>> all remain the same).
>>
>>
>> Monitor cqm using perf
>> ----------------------
>>
>> perf can monitor individual tasks using the -t
>> option just like before.
>>
>> # perf stat -e llc_occupancy -t PID1,PID2
>>
>> The user can monitor CPU occupancy using the -C option in perf:
>>
>> # perf stat -e llc_occupancy -C 5
>>
>> The following shows how the user can monitor cgroup occupancy:
>>
>> # mount -t cgroup -o perf_event perf_event /sys/fs/cgroup/perf_event/
>> # mkdir /sys/fs/cgroup/perf_event/g1
>> # mkdir /sys/fs/cgroup/perf_event/g2
>> # echo PID1 > /sys/fs/cgroup/perf_event/g2/tasks
>>
>> # perf stat -e intel_cqm/llc_occupancy/ -a -G g2
>>
>> To monitor a resctrl group, the user can put the same tasks from the
>> resctrl group into a cgroup.
>>
>> To monitor the tasks in p1 in example 2 below, add the tasks in
>> resctrl group p1 to cgroup g1:
>>
>> # echo 5678 > /sys/fs/cgroup/perf_event/g1/tasks
>>
>> Introducing a new option for resctrl may complicate monitoring because
>> supporting cgroup 'task groups' and resctrl 'task groups' leads to
>> situations where, if the groups intersect, there is no way to know
>> which L3 allocations contribute to which group.
>>
>> ex:
>> p1 has tasks t1, t2, t3
>> g1 has tasks t2, t3, t4
>>
>> The only way to get occupancy for both g1 and p1 would be to allocate
>> an RMID for each task, which can just as well be done with the -t
>> option.
>
> That's simply recreating the resctrl group as a cgroup.
>
> I think that the main advantage of doing allocation first is that we
> could use the context switch in RDT allocation and greatly simplify
> the PMU side of it.
>
> If resctrl groups could lift the restriction of one resctrl group per
> CLOSID, then the user could create many resctrl groups in the way perf
> cgroups are created now. The advantage is that there won't be a cgroup
> hierarchy, making things much simpler. Also there would be no need to
> optimize the perf event context switch to make llc_occupancy work.
>
> Then we only need a way to express to the perf_event_open syscall that
> monitoring must happen in a resctrl group.
>
> My first thought is to have an "rdt_monitor" file per resctrl group. A
> user passes it to perf_event_open in the way cgroup fds are passed now.
> We could extend the meaning of the PERF_FLAG_PID_CGROUP flag to also
> cover rdt_monitor files. The syscall can figure out whether it is a
> cgroup or an rdt group. The rdt_monitoring PMU would only work with
> rdt_monitor groups.
>
> Then the rdt_monitoring PMU would be pretty dumb, having neither task
> nor CPU contexts, just providing the pmu->read and pmu->event_init
> functions.
>
> Task monitoring can be done with resctrl as well, by adding the PID to
> a new resctrl group and opening the event on it. And since we'd allow
> CLOSIDs to be shared between resctrl groups, allocation wouldn't break.

It looks like we are trying to create a MONGRP and CTRLGRP, as Thomas mentioned.

Although resctrl groups have no hierarchy now, a task can be part of only
one group. Breaking this is just equivalent to having a separate resmon
group which may group the tasks independently of how they are grouped in
the resctrl group?

That can be achieved as well with the option to monitor at task
granularity? That means if we support the task option and the option to
monitor resctrl groups, we obtain the same functionality.
