linux-kernel - Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

Message-ID: <CALcN6mhmVfTU3o4L1FuaAXdgtnXwEOW-tP=O_aLFfFGC9dX_Kw@mail.gmail.com>
Date:   Wed, 18 Jan 2017 18:09:24 -0800
From:   David Carrillo-Cisneros <davidcc@...gle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Shivappa Vikas <vikas.shivappa@...el.com>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        Stephane Eranian <eranian@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, hpa@...or.com,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>, andi.kleen@...el.com,
        "H. Peter Anvin" <h.peter.anvin@...el.com>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

On Wed, Jan 18, 2017 at 12:53 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Tue, 17 Jan 2017, Shivappa Vikas wrote:
>> On Tue, 17 Jan 2017, Thomas Gleixner wrote:
>> > On Fri, 6 Jan 2017, Vikas Shivappa wrote:
>> > > - Issue(1): Inaccurate data for per package data, systemwide. Just prints
>> > > zeros or arbitrary numbers.
>> > >
>> > > Fix: Patches fix this by just throwing an error if the mode is not
>> > > supported. The modes supported are task monitoring and cgroup
>> > > monitoring. Also the per package data for, say, socket x is returned
>> > > with the -C <cpu on socketx> -G cgrpy option. The systemwide data can
>> > > be looked up by monitoring the root cgroup.
>> >
>> > Fine. That just lacks any comment in the implementation. Otherwise I would
>> > not have asked the question about cpu monitoring. Though I fundamentally
>> > hate the idea of requiring cgroups for this to work.
>> >
>> > If I just want to look at CPU X why on earth do I have to set up all that
>> > cgroup muck? Just because your main focus is cgroups?
>>
>> The upstream per cpu data is broken because it's not overriding the other task
>> event RMIDs on that cpu with the cpu event RMID.
>>
>> Can be fixed by adding a percpu struct to hold the RMID that's affinitized
>> to the cpu, however then we miss all the task llc_occupancy in that - still
>> evaluating it.
>
> The point here is that CQM is closely connected to the cache allocation
> technology. After a lengthy discussion we ended up having
>
>   - per cpu CLOSID
>   - per task CLOSID
>
> where all tasks which do not have a CLOSID assigned use the CLOSID which is
> assigned to the CPU they are running on.
>
> So if I configure a system by simply partitioning the cache per cpu, which
> is the proper way to do it for HPC and RT usecases where workloads are
> partitioned on CPUs as well, then I really want to have an equally simple
> way to monitor the occupancy for that reservation.
>
> And looking at that from the CAT point of view, which is the proper way to
> do it, makes it obvious that CQM should be modeled to match CAT.
>
> So let's assume the following:
>
>    CPU 0-3     default CLOSID 0
>    CPU 4               CLOSID 1
>    CPU 5               CLOSID 2
>    CPU 6               CLOSID 3
>    CPU 7               CLOSID 3
>
>    T1                  CLOSID 4
>    T2                  CLOSID 5
>    T3                  CLOSID 6
>    T4                  CLOSID 6
>
>    All other tasks use the per cpu defaults, i.e. the CLOSID of the CPU
>    they run on.
>
> then the obvious basic monitoring requirement is to have a RMID for each
> CLOSID.
>
> So when I monitor CPU4, i.e. CLOSID 1 and T1 runs on CPU4, then I do not
> care at all about the occupancy of T1 simply because that is running on a
> separate reservation. Trying to make that an aggregated value in the first
> place is completely wrong. If you want an aggregate, which is pretty much
> useless, then user space tools can generate it easily.
>
> The whole approach you and David have taken is to whack some desired cgroup
> functionality and whatever into CQM without rethinking the overall
> design. And that's fundamentally broken because it does not take cache (and
> memory bandwidth) allocation into account.
>
> I seriously doubt that the existing CQM/MBM code can be refactored in any
> useful way. As Peter Zijlstra said before: Remove the existing cruft
> completely and start with a completely new design from scratch.
>
> And this new design should start from the allocation angle and then add the
> whole other muck on top as far as it's possible. Allocation related monitoring
> must be the primary focus, everything else is just tinkering.
>

If in this email you meant "Resource group" where you wrote "CLOSID", then
please disregard my previous email. It seems like a good idea to me to have
a 1:1 mapping between RMIDs and "Resource groups".

The distinction matters because changing the schemata in the resource group
would likely trigger a change of CLOSID, which is useful.
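
To make the distinction concrete, here is a minimal sketch in C. Everything
in it is hypothetical and for illustration only: struct resource_group,
rg_create() and rg_set_schemata() are made-up names, not code from the kernel
tree or from this patch series. It only shows the idea that the RMID stays
tied to the resource group while the CLOSID may be swapped when the schemata
changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not the kernel's actual structure. */
struct resource_group {
	uint32_t rmid;       /* monitoring ID: fixed for the group's lifetime */
	uint32_t closid;     /* allocation class: may change with the schemata */
	uint32_t cache_mask; /* schemata: bitmask of cache ways the group may fill */
};

/* Create a group with a fixed RMID and an initial CLOSID and schemata. */
static struct resource_group rg_create(uint32_t rmid, uint32_t closid,
				       uint32_t cache_mask)
{
	struct resource_group rg = {
		.rmid = rmid,
		.closid = closid,
		.cache_mask = cache_mask,
	};
	return rg;
}

/*
 * Apply a new schemata to the group.
 *
 * Assumption: the caller has already picked the CLOSID that carries the
 * new mask (for example by reusing one that is programmed with it).
 * The RMID is deliberately left untouched, so occupancy data keeps
 * following the resource group rather than the CLOSID.
 */
static void rg_set_schemata(struct resource_group *rg, uint32_t new_mask,
			    uint32_t new_closid)
{
	assert(new_mask != 0); /* an empty cache mask is not a valid schemata */
	rg->cache_mask = new_mask;
	rg->closid = new_closid;
}
```

With a 1:1 mapping between RMIDs and CLOSIDs instead, the same schemata
change would also move the group to a different RMID and split its
monitoring history.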

Thanks,
David