Date:   Mon, 6 Aug 2018 12:50:50 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Dave Hansen <dave.hansen@...el.com>, tglx@...utronix.de,
        mingo@...hat.com, fenghua.yu@...el.com, tony.luck@...el.com,
        vikas.shivappa@...ux.intel.com, gavin.hindman@...el.com,
        jithu.joseph@...el.com, hpa@...or.com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] x86/intel_rdt and perf/x86: Fix lack of coordination
 with perf

Hi Peter,

On 8/3/2018 11:37 AM, Reinette Chatre wrote:
> On 8/3/2018 8:25 AM, Peter Zijlstra wrote:
>> On Fri, Aug 03, 2018 at 08:18:09AM -0700, Reinette Chatre wrote:
>>> You state that you understand what we are trying to do and I hope that I
>>> convinced you that we are not able to accomplish the same by following
>>> your guidance.
>>
>> No, I said I understood your pmc reserve patch and its implications.
>>
>> I have no clue what you're trying to do with resctl, nor why you think
>> this is not feasible with perf. And if it really is not feasible, you'll
>> have to live without it.

In my previous email I provided the details of the Cache Pseudo-Locking
feature implemented on top of resctrl. Please let me know if you would
like any more details about it; I can send you more material.

In that email I also explained why I believe the same is not feasible
with perf, as quoted below ...

> Looking at whether we could build on top of the kernel perf event API
> (perf_event_create_kernel_counter(), perf_event_enable(),
> perf_event_disable(), ...): consider just perf_event_enable() -
> ideally this would be as lean as possible, only enabling the event and
> not itself contributing to the measurement. First, enabling the event
> is not that lean, since more code is executed after the event has
> actually been enabled. Also, the code relies on a mutex, so we cannot
> use it with interrupts disabled.

I proceeded to modify the existing debugfs measurements to build on
top of the perf APIs mentioned above. As anticipated, the events could
not be enabled in interrupt context; I got a clear message in this regard:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748
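
For reference, the ordering that triggers this is the obvious one, with
the events enabled inside the interrupt-disabled section - a sketch of
that attempt, not what I ended up using:

local_irq_disable();
/* Disable hardware prefetchers */
perf_event_enable(l2_miss_event);   /* may sleep on a mutex -> BUG above */
perf_event_enable(l2_hit_event);
/* Loop through pseudo-locked memory */
perf_event_disable(l2_hit_event);
perf_event_disable(l2_miss_event);
/* Enable hardware prefetchers */
local_irq_enable();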

I thus continued to use the API with interrupts enabled and did the following:

Two new event attributes:
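/*
 * The raw configs below use the usual x86 encoding of
 * (umask << 8) | event_select: event 0xd1 with umask 0x10 counts the
 * L2 misses and umask 0x02 the L2 hits of interest on this system.
 */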
static struct perf_event_attr l2_miss_attr = {
        .type           = PERF_TYPE_RAW,
        .config         = (0x10ULL << 8) | 0xd1,
        .size           = sizeof(struct perf_event_attr),
        .pinned         = 1,
        .disabled       = 1,
        .exclude_user   = 1
};

static struct perf_event_attr l2_hit_attr = {
        .type           = PERF_TYPE_RAW,
        .config         = (0x2ULL << 8) | 0xd1,
        .size           = sizeof(struct perf_event_attr),
        .pinned         = 1,
        .disabled       = 1,
        .exclude_user   = 1
};

Create the two new events using these attributes:
l2_miss_event = perf_event_create_kernel_counter(&l2_miss_attr, cpu,
                                                 NULL, NULL, NULL);
l2_hit_event = perf_event_create_kernel_counter(&l2_hit_attr, cpu,
                                                NULL, NULL, NULL);
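
Error handling is elided above; perf_event_create_kernel_counter()
returns an ERR_PTR() on failure, so a minimal check would look roughly
like this (with "out" standing for whatever cleanup applies):

if (IS_ERR(l2_miss_event))
        goto out;
if (IS_ERR(l2_hit_event)) {
        perf_event_release_kernel(l2_miss_event);
        goto out;
}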

Take measurements:
perf_event_enable(l2_miss_event);
perf_event_enable(l2_hit_event);
local_irq_disable();
/* Disable hardware prefetchers */
/* Loop through pseudo-locked memory */
/* Enable hardware prefetchers */
local_irq_enable();
perf_event_disable(l2_hit_event);
perf_event_disable(l2_miss_event);
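
The prefetcher toggling is only summarized in the comments above; as a
sketch, assuming the model-specific disable bits in
MSR_MISC_FEATURE_CONTROL (0x1a4), it amounts to something like:

wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
/* ... measure ... */
wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);

where prefetch_disable_bits is whatever bit mask applies to the CPU model.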

Read results:
l2_hits = perf_event_read_value(l2_hit_event, &enabled, &running);
l2_miss = perf_event_read_value(l2_miss_event, &enabled, &running);
/* Make results available in tracepoints */
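
The tracepoint step corresponds to the pseudo_lock_l2 event visible in
the output below, roughly (exact signature aside):

trace_pseudo_lock_l2(l2_hits, l2_miss);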


With the above implementation and a 256KB pseudo-locked memory region I
obtain the following results:
pseudo_lock_mea-755   [002] ....   396.946953: pseudo_lock_l2: hits=4140 miss=5
pseudo_lock_mea-762   [002] ....   397.998864: pseudo_lock_l2: hits=4138 miss=8
pseudo_lock_mea-765   [002] ....   399.041868: pseudo_lock_l2: hits=4142 miss=5
pseudo_lock_mea-768   [002] ....   400.086871: pseudo_lock_l2: hits=4141 miss=7
pseudo_lock_mea-771   [002] ....   401.132921: pseudo_lock_l2: hits=4138 miss=10
pseudo_lock_mea-774   [002] ....   402.216700: pseudo_lock_l2: hits=4238 miss=46
pseudo_lock_mea-777   [002] ....   403.312148: pseudo_lock_l2: hits=4142 miss=5
pseudo_lock_mea-780   [002] ....   404.381674: pseudo_lock_l2: hits=4139 miss=8
pseudo_lock_mea-783   [002] ....   405.422820: pseudo_lock_l2: hits=4472 miss=79
pseudo_lock_mea-786   [002] ....   406.495065: pseudo_lock_l2: hits=4140 miss=8
pseudo_lock_mea-793   [002] ....   407.561383: pseudo_lock_l2: hits=4143 miss=4

The above results are not accurate since they do not reflect the success
of the pseudo-locked region. The expected results are those we currently
obtain (copying the results from my previous email):
pseudo_lock_mea-26090 [002] .... 61838.488027: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26097 [002] .... 61843.689381: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26100 [002] .... 61848.751411: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26108 [002] .... 61853.820361: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26111 [002] .... 61858.880364: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26118 [002] .... 61863.937343: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26121 [002] .... 61869.008341: pseudo_lock_l2: hits=4096 miss=0

Could you please guide me on how you would prefer us to use perf in
order to obtain the same accurate results we are able to obtain now?

Thank you very much

Reinette
