Message-ID: <1af731f8-b5d3-5aca-af02-575802a961b9@intel.com>
Date: Thu, 2 Aug 2018 09:14:10 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: tglx@...utronix.de, mingo@...hat.com, fenghua.yu@...el.com,
tony.luck@...el.com, vikas.shivappa@...ux.intel.com,
gavin.hindman@...el.com, jithu.joseph@...el.com,
dave.hansen@...el.com, hpa@...or.com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] x86/intel_rdt and perf/x86: Fix lack of coordination
with perf
Hi Peter,
On 8/2/2018 5:39 AM, Peter Zijlstra wrote:
> On Tue, Jul 31, 2018 at 12:38:27PM -0700, Reinette Chatre wrote:
>> Dear Maintainers,
>>
>> The success of Cache Pseudo-Locking can be measured via the use of
>> performance events. Specifically, the number of cache hits and misses
>> reading a memory region after it has been pseudo-locked to cache. This
>> measurement is triggered via the resctrl debugfs interface.
>>
>> To ensure most accurate results the performance counters and their
>> configuration registers are accessed directly.
>
> NAK on that.
>
After data is locked to cache we need to measure whether the locking
succeeded. There is no instruction that we can use to query whether a
memory address has been cached, but we can use performance monitoring
events, which are especially valuable on platforms where they support
precise events. To ensure that we are only measuring the presence of
data that should be locked to cache we need to tightly control how this
measurement is done.
For example, on my test system I locked 256KB to the cache and with the
current implementation (tip.git on branch x86/cache) I am able to
accurately measure that this was successful as seen below (each cache
line within the 256KB is accessed while the performance monitoring
events are active):
pseudo_lock_mea-26090 [002] .... 61838.488027: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26097 [002] .... 61843.689381: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26100 [002] .... 61848.751411: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26108 [002] .... 61853.820361: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26111 [002] .... 61858.880364: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26118 [002] .... 61863.937343: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-26121 [002] .... 61869.008341: pseudo_lock_l2: hits=4096 miss=0
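As a sanity check on these numbers, here is a minimal sketch of the arithmetic behind hits=4096. It assumes 64-byte cache lines, which the message does not state, and the helper names cache_lines() and fully_cached() are hypothetical, made up for this illustration; they are not part of resctrl or perf.

```python
# Sanity check for the trace output: one access per cache line in the region.
# ASSUMPTION: 64-byte cache lines (not stated in the message).
CACHE_LINE_SIZE = 64


def cache_lines(region_bytes, line_size=CACHE_LINE_SIZE):
    """Return how many cache lines cover a region of region_bytes bytes."""
    if region_bytes % line_size:
        raise ValueError("region is not a whole number of cache lines")
    return region_bytes // line_size


def fully_cached(hits, miss, region_bytes, line_size=CACHE_LINE_SIZE):
    """True if every line in the region was counted as a hit and none missed."""
    return miss == 0 and hits == cache_lines(region_bytes, line_size)


if __name__ == "__main__":
    region = 256 * 1024  # the 256KB pseudo-locked region
    print(cache_lines(region))            # 4096
    print(fully_cached(4096, 0, region))  # True
```

Under that assumption, 256KB / 64 bytes = 4096 lines, so hits=4096 with miss=0 is what a fully locked region would be expected to report when each line is read once.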
The current implementation does not coordinate with perf and this is
what I am trying to fix in this series.
I do respect your NAK but it is not clear to me how to proceed after
obtaining it. Could you please elaborate on what you would prefer as a
solution to ensure accurate measurement of cache-locked data that is
better integrated?
Thank you very much
Reinette