Message-ID: <CABPqkBQW80CFY7PLjDO_EKRrr0TA+tu3zwoSU7tnL7DgdwV+Wg@mail.gmail.com>
Date:   Tue, 7 Feb 2017 00:08:09 -0800
From:   Stephane Eranian <eranian@...gle.com>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     David Carrillo-Cisneros <davidcc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        "Shivappa, Vikas" <vikas.shivappa@...el.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>,
        "Anvin, H Peter" <h.peter.anvin@...el.com>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

Hi,

I wanted to take a few steps back and look at the overall goals for
cache monitoring.
From the various threads and discussions, my understanding is as follows.

I think the design must ensure that the following usage models can be monitored:
   - the allocations in your CAT partitions
   - the allocations from a task (inclusive of children tasks)
   - the allocations from a group of tasks (inclusive of children tasks)
   - the allocations from a CPU
   - the allocations from a group of CPUs

All cases but the first one (CAT) are natural usage models, so I want to
describe the CAT case in more detail.
The goal, as I understand it, is to monitor what is going on inside
the CAT partition to detect whether it saturates or whether it has
room to "breathe". Let's take a simple example.

Suppose we have a CAT group, cat1:

cat1: 20MB partition (CLOSID1)
    CPUs=CPU0,CPU1
    TASKs=PID20

There can be only one CLOSID active on a CPU at a time. When both the
running task and the CPU have a CLOSID, the kernel chooses to prioritize
the task's CLOSID over the CPU's.

Let's review how this works for cat1 and, for each scenario, look at
whether the kernel enforces the cache partition:

 1. ENFORCED: PIDx with no CLOSID runs on CPU0 or CPU1
 2. NOT ENFORCED: PIDx with CLOSIDx (x!=1) runs on CPU0 or CPU1
 3. ENFORCED: PID20 runs with CLOSID1 on CPU0 or CPU1
 4. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with CPU CLOSIDx (x!=1)
 5. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with no CPU CLOSID
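The enforcement rule above can be sketched as a small model. This is a simplified illustration, not kernel code. The helper names (effective_closid, cpu_closid_for, cat1_enforced) are hypothetical, None stands for "no CLOSID", CLOSID 2 stands in for "some CLOSIDx with x!=1", and CPU 3 stands in for "some CPUx with x!=0,1".

```python
from typing import Optional

# Hypothetical model of cat1, for illustration only (not kernel code).
CAT1_CLOSID = 1
CAT1_CPUS = {0, 1}


def effective_closid(task_closid: Optional[int], cpu_closid: Optional[int]) -> Optional[int]:
    """Only one CLOSID is active per CPU; the task's CLOSID wins when set."""
    return task_closid if task_closid is not None else cpu_closid


def cpu_closid_for(cpu: int, other_cpu_closid: Optional[int]) -> Optional[int]:
    """CPU0/CPU1 carry CLOSID1; any other CPU carries other_cpu_closid."""
    return CAT1_CLOSID if cpu in CAT1_CPUS else other_cpu_closid


def cat1_enforced(task_closid: Optional[int], cpu: int,
                  other_cpu_closid: Optional[int] = None) -> bool:
    """True when CLOSID1 (the cat1 partition) is the active CLOSID."""
    active = effective_closid(task_closid, cpu_closid_for(cpu, other_cpu_closid))
    return active == CAT1_CLOSID


# The five scenarios, in order.
scenarios = {
    1: cat1_enforced(task_closid=None, cpu=0),                      # PIDx, no CLOSID
    2: cat1_enforced(task_closid=2, cpu=1),                         # PIDx with CLOSIDx
    3: cat1_enforced(task_closid=1, cpu=0),                         # PID20 on CPU0
    4: cat1_enforced(task_closid=1, cpu=3, other_cpu_closid=2),     # CPU has CLOSIDx
    5: cat1_enforced(task_closid=1, cpu=3, other_cpu_closid=None),  # CPU has no CLOSID
}

for case, enforced in scenarios.items():
    print(case, "ENFORCED" if enforced else "NOT ENFORCED")
```

Only scenario 2 prints NOT ENFORCED. There, the task's own CLOSID displaces the CPU's CLOSID1.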

Now, let's review how we could track the allocations done in cat1 using a single
RMID. There can be only one RMID active at a time per CPU. Here too, the kernel
chooses to prioritize the task's RMID over the CPU's:

cat1: 20MB partition (CLOSID1, RMID1)
    CPUs=CPU0,CPU1
    TASKs=PID20

 1. MONITORED: PIDx with no RMID runs on CPU0 or CPU1
 2. NOT MONITORED: PIDx with RMIDx (x!=1) runs on CPU0 or CPU1
 3. MONITORED: PID20 runs with RMID1 on CPU0 or CPU1
 4. MONITORED: PID20 runs with RMID1 on CPUx (x!=0,1) with CPU RMIDx (x!=1)
 5. MONITORED: PID20 runs with RMID1 on CPUx (x!=0,1) with no CPU RMID
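A minimal sketch can check that the MONITORED cases line up with the ENFORCED cases. It assumes each task and CPU carries a (CLOSID, RMID) pair, that cat1 pairs CLOSID1 with RMID1, and that the same task-over-CPU priority applies to both IDs. The names Ids, pick and classify are hypothetical, and ID 2 and CPU 3 stand in for "x".

```python
from typing import NamedTuple, Optional, Tuple


class Ids(NamedTuple):
    closid: Optional[int]  # None = no CLOSID assigned
    rmid: Optional[int]    # None = no RMID assigned


NO_IDS = Ids(None, None)
CAT1 = Ids(closid=1, rmid=1)   # cat1 pairs CLOSID1 with RMID1
OTHER = Ids(closid=2, rmid=2)  # stands in for "CLOSIDx/RMIDx, x!=1"


def pick(task_val: Optional[int], cpu_val: Optional[int]) -> Optional[int]:
    """Same priority rule for both IDs: the task's value wins over the CPU's."""
    return task_val if task_val is not None else cpu_val


def classify(task: Ids, cpu: int, other_cpu: Ids = NO_IDS) -> Tuple[bool, bool]:
    """Return (enforced, monitored) for cat1 in one scenario."""
    cpu_ids = CAT1 if cpu in (0, 1) else other_cpu
    enforced = pick(task.closid, cpu_ids.closid) == CAT1.closid
    monitored = pick(task.rmid, cpu_ids.rmid) == CAT1.rmid
    return enforced, monitored


scenarios = [
    ("1: PIDx, no ids, on CPU0", NO_IDS, 0, NO_IDS),
    ("2: PIDx with RMIDx, on CPU1", OTHER, 1, NO_IDS),
    ("3: PID20 on CPU0", CAT1, 0, NO_IDS),
    ("4: PID20 on CPU3 with CPU RMIDx", CAT1, 3, OTHER),
    ("5: PID20 on CPU3 with no CPU RMID", CAT1, 3, NO_IDS),
]

results = [classify(task, cpu, other) for _, task, cpu, other in scenarios]
for (name, *_), (enforced, monitored) in zip(scenarios, results):
    print(f"{name}: enforced={enforced} monitored={monitored}")
```

In every scenario the two flags agree. The reason is that the RMID assignment mirrors the CLOSID assignment and follows the same priority rule.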

To make sense to a user, the cases where the hardware monitors MUST be
the same as the cases where the hardware enforces the cache
partitioning.

Here the two lists match, so monitoring cat1 with a single RMID works.

However, doing so rules out monitoring modes where a user wants a per-CPU
breakdown of the allocations, such as:
  $ perf stat -a -A -e llc_occupancy -R cat1
(where -R points to the monitoring group in rsrcfs). This mode would not be
possible here because the two CPUs in the group share the same RMID.
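A toy model shows why a shared RMID hides the per-CPU split. It rests on simplifying assumptions. Occupancy is kept as one counter per RMID, and each allocation is tagged only with the RMID active on its CPU. Evictions are ignored, so "occupancy" here is just bytes allocated. RMIDs 3 and 4 in the per-CPU variant are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def occupancy_by_rmid(allocations: List[Tuple[int, int]],
                      rmid_of_cpu: Dict[int, int]) -> Dict[int, int]:
    """Accumulate (cpu, bytes) allocations into one counter per active RMID."""
    counters: Dict[int, int] = defaultdict(int)
    for cpu, nbytes in allocations:
        counters[rmid_of_cpu[cpu]] += nbytes
    return dict(counters)


allocations = [(0, 4096), (1, 8192), (0, 1024)]  # (cpu, bytes)

# cat1 with a single RMID: CPU0 and CPU1 both report into RMID1.
shared = occupancy_by_rmid(allocations, {0: 1, 1: 1})
print(shared)   # {1: 13312}: one number, no per-CPU split

# Hypothetical alternative: one RMID per CPU.
per_cpu = occupancy_by_rmid(allocations, {0: 3, 1: 4})
print(per_cpu)  # {3: 5120, 4: 8192}: per-CPU view, group total is the sum
```

Once both CPUs write into RMID1, nothing recorded by the counter can separate CPU0's share from CPU1's.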

Now let's take another scenario, and suppose you have two monitoring groups
as follows:

mon1: RMID1
    CPUs=CPU0,CPU1
mon2: RMID2
    TASKs=PID20

If PID20 runs on CPU0, then RMID2 is activated, so allocations
done by PID20 are not counted towards RMID1. This is a blind spot.

Whether or not this is a problem depends on the semantics exported by
the interface for CPU mode:
   1. Count all allocations from any task running on the CPU
   2. Count all allocations from tasks that are NOT monitoring themselves

If the kernel chooses 1, then there is a blind spot, and the measurement
is not as accurate as it could be because of the decision to use only one RMID.
But if the kernel chooses 2, then everything works fine with a single RMID.

If the kernel treats occupancy monitoring like measuring cycles on a CPU, i.e.,
measuring any activity from any thread (choice 1), then a single RMID per group
does not work.

If the kernel treats occupancy monitoring like measuring cycles in a cgroup on a
CPU, i.e., measuring only when threads of the cgroup run on that CPU, then using
a single RMID per group works.
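The mon1/mon2 trade-off can be made concrete with a small accounting sketch. It assumes a simplified event list of (pid, cpu, bytes) and the task-over-CPU RMID priority described above. The PIDs 10 and 11, CPU 5, and the byte counts are made up for illustration. The code compares what RMID1 actually accumulates with the two candidate semantics.

```python
from typing import Optional

MON1_RMID, MON1_CPUS = 1, {0, 1}  # mon1: CPU0, CPU1 -> RMID1
TASK_RMID = {20: 2}               # mon2: PID20 -> RMID2


def active_rmid(pid: int, cpu: int) -> Optional[int]:
    """Task RMID wins over CPU RMID; None when neither is set."""
    if pid in TASK_RMID:
        return TASK_RMID[pid]
    return MON1_RMID if cpu in MON1_CPUS else None


# (pid, cpu, bytes); PIDs 10 and 11 have no RMID of their own.
events = [(10, 0, 4096), (20, 0, 2048), (11, 1, 1024), (20, 5, 512)]

# What the hardware accumulates in RMID1 when mon1 uses a single RMID.
hw_rmid1 = sum(b for pid, cpu, b in events if active_rmid(pid, cpu) == MON1_RMID)

# Semantics 1: every allocation made on CPU0/CPU1.
sem1 = sum(b for pid, cpu, b in events if cpu in MON1_CPUS)

# Semantics 2: allocations on CPU0/CPU1 by tasks not monitoring themselves.
sem2 = sum(b for pid, cpu, b in events
           if cpu in MON1_CPUS and pid not in TASK_RMID)

print(hw_rmid1, sem1, sem2)  # 5120 7168 5120
```

The single-RMID count matches semantics 2 exactly. It falls short of semantics 1 by PID20's 2048 bytes on CPU0, which is the blind spot.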

I hope this helps clarify the usage model and design choices.
