Message-ID: <719f19f7-a368-7a5c-7e08-84deafbf8473@linux.intel.com>
Date: Mon, 10 Aug 2020 18:36:03 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: acme@...nel.org, mingo@...hat.com, linux-kernel@...r.kernel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...hat.com, eranian@...gle.com, ak@...ux.intel.com,
dave.hansen@...el.com, kirill.shutemov@...ux.intel.com
Subject: Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
On 8/10/2020 5:39 PM, Peter Zijlstra wrote:
> On Mon, Aug 10, 2020 at 02:24:21PM -0700, Kan Liang wrote:
>> Current perf can report both virtual addresses and physical addresses,
>> but not the page size. Without the page size information of the utilized
>> page, users cannot decide whether to promote/demote large pages to
>> optimize memory usage.
>>
>> Add a new sample type for the data page size.
>>
>> Current perf already has a facility to collect data virtual addresses.
>> A page walker is required to walk the page tables and calculate the
>> page size from a given virtual address.
>>
>> On some platforms, e.g., X86, the page walker is invoked in an NMI
>> handler. So the page walker must be IRQ-safe and low overhead. Besides,
>> the page walker should work for both user and kernel virtual addresses.
>> The existing generic page walker, e.g., walk_page_range_novma(), is
>> somewhat complex and is not guaranteed to be IRQ-safe. follow_page()
>> only handles user virtual addresses.
>>
>> Add a new function perf_get_page_size() to walk the page tables and
>> calculate the page size. In the function:
>> - Interrupts have to be disabled to prevent any teardown of the page
>> tables.
>> - The size of a normal page is from the pre-defined page size macros.
>> - The size of a compound page is retrieved from the helper function,
>> page_size().
>>
>> Suggested-by: Peter Zijlstra <peterz@...radead.org>
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>
>> /* default value for data source */
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 52ca2093831c..32484accc7a3 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -143,8 +143,9 @@ enum perf_event_sample_format {
>> PERF_SAMPLE_PHYS_ADDR = 1U << 19,
>> PERF_SAMPLE_AUX = 1U << 20,
>> PERF_SAMPLE_CGROUP = 1U << 21,
>> + PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22,
>>
>> - PERF_SAMPLE_MAX = 1U << 22, /* non-ABI */
>> + PERF_SAMPLE_MAX = 1U << 23, /* non-ABI */
>>
>> __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
>> };
>
>> @@ -7151,6 +7269,9 @@ void perf_prepare_sample(struct perf_event_header *header,
>> }
>> #endif
>>
>> + if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
>> + data->data_page_size = perf_get_page_size(data->addr);
>> +
>
> We could just make SAMPLE_DATA_PAGE_SIZE require SAMPLE_ADDR.
>
If only SAMPLE_DATA_PAGE_SIZE is requested, without SAMPLE_ADDR,
data->addr is still updated implicitly, but its value is not dumped to
the userspace tool. I will add a comment here.
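
A minimal userspace sketch of the two options discussed here, for illustration only (it is not the patch code). The SKETCH_* names and the three helpers are hypothetical. The DATA_PAGE_SIZE bit value is the one from the quoted hunk, and the ADDR bit value (1U << 3) is assumed from the existing UAPI header, since the hunk does not show it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the sample_type bits; the names exist only in this sketch. */
#define SKETCH_SAMPLE_ADDR           (1U << 3)	/* assumed from the UAPI header */
#define SKETCH_SAMPLE_DATA_PAGE_SIZE (1U << 22)	/* from the quoted hunk */

/* Option A: reject DATA_PAGE_SIZE unless ADDR is requested as well. */
bool sketch_sample_type_valid(uint64_t sample_type)
{
	if ((sample_type & SKETCH_SAMPLE_DATA_PAGE_SIZE) &&
	    !(sample_type & SKETCH_SAMPLE_ADDR))
		return false;
	return true;
}

/*
 * Option B: accept DATA_PAGE_SIZE on its own. The address must then be
 * collected whenever either bit is set ...
 */
bool sketch_needs_addr(uint64_t sample_type)
{
	return sample_type &
	       (SKETCH_SAMPLE_ADDR | SKETCH_SAMPLE_DATA_PAGE_SIZE);
}

/* ... but it is written to the sample only if ADDR itself was requested. */
bool sketch_dumps_addr(uint64_t sample_type)
{
	return sample_type & SKETCH_SAMPLE_ADDR;
}
```

With only the DATA_PAGE_SIZE bit set, option A rejects the event, while option B collects the address internally and does not output it.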
Thanks,
Kan