Message-ID: <CALcN6mjaO0Hnuiu-98JzMBUytRiOCODMvm-uLFOrO=gO4PS9BQ@mail.gmail.com>
Date:   Thu, 19 Jan 2017 23:37:28 -0800
From:   David Carrillo-Cisneros <davidcc@...gle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Shivappa Vikas <vikas.shivappa@...el.com>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        Stephane Eranian <eranian@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, hpa@...or.com,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>, andi.kleen@...el.com,
        "H. Peter Anvin" <h.peter.anvin@...el.com>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

On Thu, Jan 19, 2017 at 9:41 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Wed, 18 Jan 2017, David Carrillo-Cisneros wrote:
>> On Wed, Jan 18, 2017 at 12:53 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> There are use cases where the RMID to CLOSID mapping is not that simple.
>> Some of them are:
>>
>> 1. Fine-tuning of cache allocation. We may want to have a CLOSID for a thread
>> during phases that initialize relevant data, while changing it to another during
>> phases that pollute cache. Yet, we want the RMID to remain the same.
>
> That's fine. I did not say that you need fixed RMID <-> CLOSID mappings. The
> point is that monitoring across different CLOSID domains is pointless.
>
> I have no idea how you want to do that with the proposed implementation to
> switch the RMID of the thread on the fly, but that's a different story.
>
>> A different variation is to change CLOSID to increase/decrease the size of the
>> allocated cache when high/low contention is detected.
>>
>> 2. Contention detection. I start with:
>>    - T1 has RMID 1.
>>    - T1 changes RMID to 2.
>>  I will expect llc_occupancy(1) to decrease while llc_occupancy(2) increases.
>
> Of course RMID1 decreases, because it's no longer in use. Oh well.
>
>> The rate of change will be relative to the level of cache contention present
>> at the time. This all happens without changing the CLOSID.
>
> See above.
>
>> >
>> > So when I monitor CPU4, i.e. CLOSID 1 and T1 runs on CPU4, then I do not
>> > care at all about the occupancy of T1 simply because that is running on a
>> > separate reservation.
>>
>> It is not useless for scenarios where CLOSIDs and RMIDs change dynamically.
>> See above.
>
> Above you are talking about the same CLOSID and different RMIDS and not
> about changing both.

The scenario I talked about implies changing the CLOSID without affecting
monitoring. It happens when the allocation needs of a thread/cgroup/CPU
change dynamically. Forcing the RMID to change together with the CLOSID
would give wrong monitoring values unless the old RMID is kept around
until it becomes free, which is ugly and wastes an RMID.
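
To make that concrete, a rough sketch of why the hardware lets the two move
independently (illustration only, not the current CQM code): IA32_PQR_ASSOC
carries the RMID in bits 9:0 and the CLOSID in bits 63:32, so a context
switch can update one field and leave the other alone. The helper name is
made up for this example and it ignores the per-CPU caching of the MSR
value:

#define MSR_IA32_PQR_ASSOC	0x0c8f

/* Move the current CPU to a new CLOSID while keeping the RMID in place. */
static inline void pqr_switch_closid(u32 rmid, u32 new_closid)
{
	u64 val = ((u64)new_closid << 32) | (rmid & 0x3ff);

	wrmsrl(MSR_IA32_PQR_ASSOC, val);	/* RMID untouched, CLOSID updated */
}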

>
>> > Trying to make that an aggregated value in the first
>> > place is completely wrong. If you want an aggregate, which is pretty much
>> > useless, then user space tools can generate it easily.
>>
>> Not useless, see above.
>
> It is pretty useless, because CPU4 has CLOSID1 while T1 has CLOSID4 and
> making an aggregate over those two has absolutely nothing to do with your
> scenario above.

That's true. It is useless in the case you mentioned. I erroneously
interpreted the "useless" in your comment as a general statement about
aggregating RMID occupancies.

>
> If you want the aggregate value, then create it in user space and oracle
> (or should I say google) out of it whatever you want, but do not impose
> that to the kernel.
>
>> Having user space tools do the aggregation implies wasting some of the
>> already scarce RMIDs.
>
> Oh well. Can you please explain how you want to monitor the scenario I
> explained above:
>
> CPU4      CLOSID 1
> T1        CLOSID 4
>
> So if T1 runs on CPU4 then it uses CLOSID 4 which does not at all affect
> the cache occupancy of CLOSID 1. So if you use the same RMID then you
> pollute either the information of CPU4 (CLOSID1) or the information of T1
> (CLOSID4)
>
> To gather any useful information for both CPU4 and T1 you need TWO
> RMIDs. Everything else is voodoo and crystal ball analysis and we are not
> going to support that.
>

Correct. Yet, having two RMIDs to monitor the same task/cgroup/CPU
just because the CLOSID changed is wasteful.
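
The occupancy counter itself is indexed only by RMID, which is why a single
RMID can keep tracking a task across CLOSID changes. Roughly (MSR numbers
and field layout as in the SDM, the helper is just a sketch and not the
driver code):

#define MSR_IA32_QM_EVTSEL	0x0c8d
#define MSR_IA32_QM_CTR		0x0c8e
#define QOS_L3_OCCUP_EVENT_ID	0x01

/* Read LLC occupancy for one RMID, whatever CLOSID its tasks run under. */
static u64 read_llc_occupancy(u32 rmid)
{
	u64 val;

	wrmsrl(MSR_IA32_QM_EVTSEL, ((u64)rmid << 32) | QOS_L3_OCCUP_EVENT_ID);
	rdmsrl(MSR_IA32_QM_CTR, val);

	/* Bits 63:62 flag error/unavailable; real code must check them. */
	return val & ~(3ULL << 62);
}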

>> > The whole approach you and David have taken is to whack some desired cgroup
>> > functionality and whatever into CQM without rethinking the overall
>> > design. And that's fundamentally broken because it does not take cache (and
>> > memory bandwidth) allocation into account.
>>
>> Monitoring and allocation are closely related yet independent.
>
> Independent to some degree. Sure you can claim they are completely
> independent, but lots of the resulting combinations make absolutely no
> sense at all. And we really don't want to support nonsensical measurements
> just because we can. The outcome of this is complexity, inaccuracy and code
> which is too horrible to look at.
>
>> I see the advantages of allowing a per-cpu RMID as you describe in the example.
>>
>> Yet, RMIDs and CLOSIDs should remain independent to allow use cases beyond
>> one simply monitoring occupancy per allocation.
>
> I agree there are use cases where you want to monitor across allocations,
> like monitoring a task which has no CLOSID assigned and runs on different
> CPUs and therefore potentially on different CLOSIDs which are assigned to
> the different CPUs.
>
> That's fine and you want a separate RMID for this.
>
> But once you have a fixed CLOSID association then reusing and aggregating
> across CLOSID domains is more than useless.
>

Correct. But there may not be a fixed CLOSID association if workloads
exhibit dynamic behavior and/or the system load changes over time.
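
As an illustration of what "no fixed association" means in practice, a policy
loop along these lines (pure sketch, reusing the hypothetical helpers above,
with placeholder thresholds) would grow or shrink the allocation by switching
the CLOSID while the RMID, and therefore the monitoring history, stays put:

#define LLC_HIGH_WATERMARK	(12ULL << 20)	/* placeholder: 12 MB */
#define LLC_LOW_WATERMARK	(2ULL << 20)	/* placeholder:  2 MB */

/* Hypothetical policy: react to measured occupancy, keep the RMID. */
static void adjust_allocation(u32 rmid, u32 small_closid, u32 big_closid)
{
	u64 occ = read_llc_occupancy(rmid);

	if (occ > LLC_HIGH_WATERMARK)
		pqr_switch_closid(rmid, big_closid);	/* more cache */
	else if (occ < LLC_LOW_WATERMARK)
		pqr_switch_closid(rmid, small_closid);	/* give cache back */
}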

>> > I seriously doubt, that the existing CQM/MBM code can be refactored in any
>> > useful way. As Peter Zijlstra said before: Remove the existing cruft
>> > completely and start with completely new design from scratch.
>> >
>> > And this new design should start from the allocation angle and then add the
>> > whole other muck on top so far its possible. Allocation related monitoring
>> > must be the primary focus, everything else is just tinkering.
>>
>> Assuming that my stated need for more than one RMID per CLOSID or more
>> than one CLOSID per RMID is recognized, what would be the advantage of
>> starting the design of monitoring from the allocation perspective?
>>
>> It's quite doable to create a new version of CQM/CMT without all the
>> cgroup murk.
>>
>> We can also create an easy way to open events to monitor CLOSIDs. Yet, I
>> don't see the advantage of dissociating monitoring from perf and building
>> it directly on top of allocation without the assumption of 1 CLOSID : 1
>> RMID.
>
> I did not say that you need to remove it from perf. perf is still going to
> be the interface to interact with monitoring, but it needs to be done in a
> way which makes sense. The current cgroup focussed proposal which is
> completely oblivious of the allocation mechanism does not make any sense to
> me at all.
>
> Starting the design from the allocation POV makes a lot of sense because
> that's the point where you start to make the decisions about useful and
> useless monitoring choices. And limiting the choices is the best way to
> limit the RMID exhaustion in the first place.

Thanks for the extra explanation.
David

>
> Thanks,
>
>         tglx
>
>
