linux-kernel - Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

Message-ID: <CALcN6mhhOreO9xAqv3ox7kiBDoCN2J1Wyj2H8hB0upfy6W4TvQ@mail.gmail.com>
Date:   Mon, 6 Feb 2017 14:16:05 -0800
From:   David Carrillo-Cisneros <davidcc@...gle.com>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
        "Shivappa, Vikas" <vikas.shivappa@...el.com>,
        Stephane Eranian <eranian@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>,
        "Anvin, H Peter" <h.peter.anvin@...el.com>
Subject: Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

On Mon, Feb 6, 2017 at 1:22 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> 12) Whatever fs or syscall is provided instead of perf syscalls, it
>> should provide total_time_enabled in the way perf does; otherwise it is
>> hard to interpret MBM values.
>
> It seems that it is hard to define what we even mean by memory bandwidth.
>
> If you are measuring just one task and you find that the total number of bytes
> read is 1GB at some point, and one second later the total bytes is 2GB, then
> it is clear that the average bandwidth for this process is 1GB/s. If you know
> that the task was only running for 50% of the cycles during that 1s interval,
> you could say that it is doing 2GB/s ... which is I believe what you were
> thinking when you wrote #12 above.

Yes, that's one of the cases.
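
To make the arithmetic in that case concrete, here is a minimal sketch. The
function name and the running_fraction parameter are hypothetical and only
illustrate the calculation; this is not an interface that perf or the CQM/MBM
code provides.

```python
GB = 10**9  # decimal gigabyte, matching the 1GB -> 2GB example above


def bandwidths(bytes_before, bytes_after, wall_seconds, running_fraction):
    """Return (wall-clock bandwidth, bandwidth while running) in bytes/s.

    running_fraction is the share of the interval during which the task
    was actually running (a hypothetical input for this sketch).
    """
    if wall_seconds <= 0 or not 0 < running_fraction <= 1:
        raise ValueError("need a positive interval and a fraction in (0, 1]")
    delta = bytes_after - bytes_before
    wall_bw = delta / wall_seconds
    running_bw = delta / (wall_seconds * running_fraction)
    return wall_bw, running_bw


# 1GB read over a 1s interval, task running 50% of the time.
wall_bw, running_bw = bandwidths(1 * GB, 2 * GB, 1.0, 0.5)
```

With those inputs, wall_bw is 1GB/s and running_bw is 2GB/s. Without the
running time, only the first number can be computed.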

> But whether that is right depends a
> bit on *why* it only ran 50% of the time. If it was time-sliced out by the
> scheduler ... then it may have been trying to be a 2GB/s app. But if it
> was waiting for packets from the network, then it really is using 1 GB/s.

IMO, "right" means that measured bandwidth and running time are
correct. The *why* is a bigger question.

>
> All bets are off if you are measuring a service that consists of several
> tasks running concurrently. All you can really talk about is the aggregate
> average bandwidth (total bytes / wall-clock time). It makes no sense to
> try and factor in how much cpu time each of the individual tasks got.

cgroup mode gives a per-CPU breakdown of event count and running time, and
the tool aggregates it into running time vs. event count. Both the per-CPU
breakdown and the aggregate are useful.

Piggy-backing on perf's cgroup mode would give us all the above for free.
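
As a rough sketch of the aggregation I mean, assuming each CPU reports an
event count (bytes here) and a running time: the CpuSample layout and the
sample values below are made up for illustration and are not perf's actual
output format.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    cpu: int          # CPU number
    byte_count: int   # event count on this CPU (hypothetical: bytes)
    running_ns: int   # time the event was running on this CPU, in ns


def aggregate(samples):
    """Sum the per-CPU breakdown into (total bytes, total running ns)."""
    total_bytes = sum(s.byte_count for s in samples)
    total_running_ns = sum(s.running_ns for s in samples)
    return total_bytes, total_running_ns


def bytes_per_running_second(samples):
    """Aggregate bytes per second of running time, or None if never ran."""
    total_bytes, total_running_ns = aggregate(samples)
    if total_running_ns == 0:
        return None
    return total_bytes / (total_running_ns / 1e9)


# Made-up per-CPU breakdown for one cgroup.
samples = [
    CpuSample(cpu=0, byte_count=1_000_000_000, running_ns=500_000_000),
    CpuSample(cpu=1, byte_count=3_000_000_000, running_ns=1_500_000_000),
]
totals = aggregate(samples)
rate = bytes_per_running_second(samples)
```

The per-CPU list is kept as the input so that both views stay available: the
breakdown itself and the totals derived from it.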

>
> -Tony