linux-kernel - Re: [PATCH v5 02/10] perf record: implement -f,--mmap-flush=<threshold> option

Message-ID: <50578e9f-5e3f-64de-fa68-5e2ec25be594@linux.intel.com>
Date:   Thu, 7 Mar 2019 11:54:51 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 02/10] perf record: implement
 -f,--mmap-flush=<threshold> option


On 05.03.2019 15:26, Jiri Olsa wrote:
> On Fri, Mar 01, 2019 at 06:41:44PM +0300, Alexey Budankov wrote:
>>
>> Implemented the -f,--mmap-flush option that specifies the minimal size of a data
>> chunk that is extracted from the mmaped kernel buffer and stored into a trace.
>>
>>   $ tools/perf/perf record -f 1024 -e cycles -- matrix.gcc
>>   $ tools/perf/perf record --aio -f 1024 -e cycles -- matrix.gcc
>>
>> The option serves two purposes: the first is to increase the compression
>> ratio of the trace data, and the second is to avoid live-lock-like
>> self-monitoring in system-wide (-a) profiling mode.
>>
>> The default option value is 1 byte, which means that every time the trace
>> writing thread finds new data in the mmaped buffer, the data is
>> extracted, possibly compressed, and written to a trace. Larger data chunks
>> are compressed more effectively than smaller chunks, so extracting larger
>> chunks from the kernel buffer is preferable from the perspective of trace
>> size reduction. The implemented option therefore allows specifying a minimal
>> data chunk size greater than 1 byte to influence the data compression ratio.
>> Also, in some cases executing more write syscalls with smaller data sizes
>> can take longer than executing fewer write syscalls with bigger data sizes,
>> due to per-syscall overhead, so extracting the bigger data chunks specified
>> by the option value could additionally decrease runtime overhead.
>>
>> Profiling in system-wide mode with compression (-a -z) can additionally
>> induce data into the kernel buffers, since the tool's own activity is
>> recorded along with the data from the monitored processes. If the
>> performance data rate and volume from the monitored processes are high,
>> then trace streaming and compression activity in the tool is also high,
>> and it can lead to a subtle live-lock effect of endless activity: compressing
>> a single new byte from one mmaped kernel buffer leads to generation of the
>> next single byte in some mmaped buffer, so the perf tool's trace writing
>> thread never stops polling the event file descriptors.
>>
>> The implemented sync parameter is the means to force data movement regardless
>> of the threshold value. Whatever flush value is provided on the command
>> line, the tool needs the capability to drain the memory buffers, at least at
>> the end of the collection.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@...ux.intel.com>
>> ---
>>  tools/perf/Documentation/perf-record.txt | 13 ++++++
>>  tools/perf/builtin-record.c              | 53 +++++++++++++++++++++---
>>  tools/perf/perf.h                        |  1 +
>>  tools/perf/util/evlist.c                 |  6 +--
>>  tools/perf/util/evlist.h                 |  3 +-
>>  tools/perf/util/mmap.c                   |  4 +-
>>  tools/perf/util/mmap.h                   |  3 +-
>>  7 files changed, 71 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index 8f0c2be34848..9fa33ce9bc00 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -459,6 +459,19 @@ Set affinity mask of trace reading thread according to the policy defined by 'mo
>>    node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
>>    cpu  - thread affinity mask is set to cpu of the processed mmap buffer
>>  
>> +-f::
>> +--mmap-flush=n::
>> +Specify the minimal number of bytes that is extracted from mmap data pages and stored
>> +into a trace. The maximal allowed value is a quarter of the size of the mmaped data pages.
>> +The default option value is 1, which means that every time the trace writing thread finds
>> +new data in the mmaped buffer, the data is extracted, possibly compressed (-z),
>> +and written to a trace. Larger data chunks are compressed more effectively than
>> +smaller chunks, so extracting larger chunks from the mmap data pages
>> +is preferable from the perspective of trace size reduction. Also, in some cases
>> +executing fewer trace write syscalls with bigger data sizes can take less time than
>> +executing more trace write syscalls with smaller data sizes, thus lowering runtime
>> +profiling overhead.
> 
> I was wondering if that's the same as what we would achieve with the ring buffer
> watermark config on the kernel side.. but I guess it does not hurt to
> have something on user side.. I'm just not sure it makes sense to have
> a config option for that
> 
> I'd understand if we configure some sane value when compression is
> enabled.. if it makes sense to have this option, I'd allow it only
> when compression is enabled

This threshold already exists in the code with a default value of 1 byte. The new option
makes the threshold configurable while keeping the same default.

The option is a means to get around the live-lock issue in the tool during intensive
system-wide profiling, and that issue is not specific to compression.

The option also gives a benefit jointly with compression: the bigger the chunk being
compressed, the higher its compression ratio, up to a limit, of course.

~Alexey

> 
> jirka
>