linux-kernel - Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

Message-ID: <alpine.DEB.2.10.1702071159550.3301@vshiva-Udesk>
Date:   Tue, 7 Feb 2017 12:10:58 -0800 (PST)
From:   Shivappa Vikas <vikas.shivappa@...el.com>
To:     Stephane Eranian <eranian@...gle.com>
cc:     "Luck, Tony" <tony.luck@...el.com>,
        David Carrillo-Cisneros <davidcc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        "Shivappa, Vikas" <vikas.shivappa@...el.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>,
        "Anvin, H Peter" <h.peter.anvin@...el.com>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes



On Tue, 7 Feb 2017, Stephane Eranian wrote:

> Hi,
>
> I wanted to take a few steps back and look at the overall goals for
> cache monitoring.
> From the various threads and discussion, my understanding is as follows.
>
> I think the design must ensure that the following usage models can be monitored:
>   - the allocations in your CAT partitions
>   - the allocations from a task (inclusive of children tasks)
>   - the allocations from a group of tasks (inclusive of children tasks)
>   - the allocations from a CPU
>   - the allocations from a group of CPUs
>
> All cases but the first one (CAT) are natural usage, so I want to describe
> the CAT case in more detail.
> The goal, as I understand it, is to monitor what is going on inside
> the CAT partition to detect
> whether it saturates or has room to "breathe". Let's take a
> simple example.
>
> Suppose, we have a CAT group, cat1:
>
> cat1: 20MB partition (CLOSID1)
>    CPUs=CPU0,CPU1
>    TASKs=PID20
>
> There can only be one CLOSID active on a CPU at a time. The kernel
> chooses to prioritize tasks over CPU when enforcing cases with multiple
> CLOSIDs.
>
> Let's review how this works for cat1 and, for each scenario, look at
> whether the kernel enforces the cache partition:
>
> 1. ENFORCED: PIDx with no CLOSID runs on CPU0 or CPU1
> 2. NOT ENFORCED: PIDx with CLOSIDx (x!=1) runs on CPU0, CPU1
> 3. ENFORCED: PID20 runs with CLOSID1 on CPU0, CPU1
> 4. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with CPU CLOSIDx (x!=1)
> 5. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with no CLOSID
>
> Now, let's review how we could track the allocations done in cat1 using a single
> RMID. There can only be one RMID active at a time per CPU. The kernel
> chooses to prioritize tasks over CPU:
>
> cat1: 20MB partition (CLOSID1, RMID1)
>    CPUs=CPU0,CPU1
>    TASKs=PID20
>
> 1. MONITORED: PIDx with no RMID runs on CPU0 or CPU1
> 2. NOT MONITORED: PIDx with RMIDx (x!=1) runs on CPU0, CPU1
> 3. MONITORED: PID20 with RMID1 runs on CPU0, CPU1
> 4. MONITORED: PID20 with RMID1 runs on CPUx (x!=0,1) with CPU RMIDx (x!=1)
> 5. MONITORED: PID20 runs with RMID1 on CPUx (x!=0,1) with no RMID
>
> To make sense to a user, the cases where the hardware monitors MUST be
> the same as the cases where the hardware enforces the cache
> partitioning.
>
> Here we see that it works using a single RMID.
>
> However doing so limits certain monitoring modes where a user might want to
> get a breakdown per CPU of the allocations, such as with:
>  $ perf stat -a -A -e llc_occupancy -R cat1
> (where -R points to the monitoring group in rsrcfs). Here this mode would not be
> possible because the two CPUs in the group share the same RMID.
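
A minimal sketch of the task-over-CPU rule behind the two five-case lists quoted above, written in Python. The function and variable names are made up for illustration and are not kernel interfaces; None stands for "no CLOSID/RMID assigned", and the IDs 2 and 3 are arbitrary stand-ins for "x != 1".

```python
# Illustrative model only: names are hypothetical, not kernel APIs.
# None means "no CLOSID/RMID assigned".

def effective_id(task_id, cpu_id):
    """Only one ID is active per CPU; the task's ID wins over the CPU's."""
    return task_id if task_id is not None else cpu_id

CAT1_ID = 1  # cat1 uses CLOSID1 and, with a single RMID, RMID1

# (description, task ID, ID of the CPU the task runs on)
cases = [
    ("1. PIDx, no ID, on CPU0/CPU1",      None, 1),
    ("2. PIDx, IDx (x!=1), on CPU0/CPU1", 2,    1),
    ("3. PID20, ID1, on CPU0/CPU1",       1,    1),
    ("4. PID20, ID1, on CPUx with IDx",   1,    3),
    ("5. PID20, ID1, on CPUx with no ID", 1,    None),
]

results = [effective_id(t, c) == CAT1_ID for _, t, c in cases]
for (desc, _, _), hit in zip(cases, results):
    print(f"{desc}: {'counted in cat1' if hit else 'not counted in cat1'}")
```

Because the same rule selects both the CLOSID and the RMID in this model, the enforced cases and the monitored cases line up when the group uses a single RMID.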

Requirement 8 in the list at https://marc.info/?l=linux-kernel&m=148597969808732
says:

8)      Can get measurements for subsets of tasks in a CAT group (to find the
guys hogging the resources).

This should also apply to subsets of CPUs.

That would let you monitor a set of CPUs that is a subset of, or different
from, the CPUs of a CAT group.  It should let you create mon groups like the
ones in the second example you mention, alongside the control groups above.

mon0: RMID0
     CPUs=CPU0

mon1: RMID1
     CPUs=CPU1

mon2: RMID2
     CPUs=CPU2

...
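
As a rough sketch of what one mon group per CPU would provide, the Python below models the mon0..mon2 layout. The occupancy numbers are invented for illustration, the names are hypothetical, and it assumes that the total for a set of CPUs can be approximated by summing the readings of their RMIDs; tasks carrying their own RMID are not modeled.

```python
# Hypothetical model: one mon group (one RMID) per CPU, as in mon0..mon2.
cpu_rmid = {0: 0, 1: 1, 2: 2}            # CPU number -> RMID of its mon group

# Made-up occupancy readings in bytes, keyed by RMID (not real data).
occupancy = {0: 6 << 20, 1: 10 << 20, 2: 3 << 20}

def per_cpu_breakdown(cpus):
    """Occupancy per CPU, possible because each CPU has its own RMID."""
    return {cpu: occupancy[cpu_rmid[cpu]] for cpu in cpus}

def group_total(cpus):
    """Assumed: the total for a set of CPUs is the sum over their RMIDs."""
    return sum(per_cpu_breakdown(cpus).values())

cat1_cpus = [0, 1]                        # CPUs of the cat1 control group
print(per_cpu_breakdown(cat1_cpus))
print(group_total(cat1_cpus) >> 20, "MB")
```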


>
> Now let's take another scenario, and suppose you have two monitoring groups
> as follows:
>
> mon1: RMID1
>    CPUs=CPU0,CPU1
> mon2: RMID2
>    TASKS=PID20
>
> If PID20 runs on CPU0, then RMID2 is activated, and thus allocations
> done by PID20 are not counted towards RMID1. There is a blind spot.
>
> Whether or not this is a problem depends on the semantic exported by
> the interface for CPU mode:
>   1-Count all allocations from any tasks running on CPU
>   2-Count all allocations from tasks which are NOT monitoring themselves
>
> If the kernel chooses 1, then there is a blind spot and the measurement
> is not as accurate as it could be because of the decision to use only one RMID.
> But if the kernel chooses 2, then everything works fine with a single RMID.
>
> If the kernel treats occupancy monitoring as measuring cycles on a CPU, i.e.,
> measure any activity from any thread (choice 1), then the single RMID per group
> does not work.
>
> If the kernel treats occupancy monitoring as measuring cycles in a cgroup on a
> CPU, i.e., measures only when threads of the cgroup run on that CPU, then using
> a single RMID per group works.
>

Agreed, there are blind spots in both. But the requirements try to follow the
resctrl allocation model, as Thomas suggested, which as I understand it is
aligned with monitoring real-time tasks.
In the above example, some tasks which do not have an RMID (say, in the root
group) are the real-time tasks that are specially configured to run on a CPUx,
and which need to be allocated or monitored.
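
The mon1/mon2 example quoted above can be sketched in Python as follows. This is only an illustration: the PIDs other than 20, the byte counts, and all names are made up, and the model assumes the task-over-CPU priority described in the quoted text.

```python
# Hypothetical model of the mon1/mon2 example; names and byte counts are made up.
from collections import Counter

cpu_rmid = {0: 1, 1: 1}     # mon1: RMID1, CPUs=CPU0,CPU1
task_rmid = {20: 2}         # mon2: RMID2, TASKS=PID20

def active_rmid(pid, cpu):
    """One RMID is active per CPU; a task's own RMID wins over the CPU's."""
    return task_rmid.get(pid, cpu_rmid.get(cpu))

# (pid, cpu, bytes allocated)
events = [(20, 0, 4096), (31, 0, 8192), (32, 1, 2048)]

counted = Counter()
for pid, cpu, nbytes in events:
    counted[active_rmid(pid, cpu)] += nbytes

# Semantic 1: all allocations from any task running on CPU0/CPU1.
semantic1 = sum(n for _, cpu, n in events if cpu in cpu_rmid)
# Semantic 2: only allocations from tasks that do not monitor themselves.
semantic2 = sum(n for pid, cpu, n in events
                if cpu in cpu_rmid and pid not in task_rmid)

print("RMID1 counted:", counted[1])
print("missed under semantic 1:", semantic1 - counted[1])
print("missed under semantic 2:", semantic2 - counted[1])
```

In this model the allocation made by PID20 on CPU0 is charged to RMID2, so RMID1 falls short of semantic 1 by exactly that amount, while it matches semantic 2.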


> Hope this helps clarify the usage model and design choices.
>