linux-kernel - Re: [PATCH v1 2/2]: perf record: enable asynchronous trace writing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <3a8286ff-124f-fb56-1d1c-5d5e1f3b8512@linux.intel.com>
Date:   Mon, 27 Aug 2018 12:20:50 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v1 2/2]: perf record: enable asynchronous trace writing

Hi,

On 27.08.2018 0:48, Jiri Olsa wrote:
> On Tue, Aug 21, 2018 at 11:27:03AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> -static int record__pushfn(void *to, void *bf, size_t size)
>> +static int record__pushfn(void *to, void *bf, size_t size, off_t off)
>>  {
>>  	struct record *rec = to;
>> +	struct perf_mmap *map = bf;
>>  
>>  	rec->samples++;
>> -	return record__write(rec, bf, size);
>> +	return record__aio_write(rec->session->data->file.fd, &map->cblock,
>> +			map->data, size, off);
>>  }
>>  
>>  static volatile int done;
>> @@ -528,13 +530,85 @@ static struct perf_event_header finished_round_event = {
>>  	.type = PERF_RECORD_FINISHED_ROUND,
>>  };
>>  
>> +static int record__mmap_read_sync(int trace_fd, struct aiocb **cblocks,
>> +		int cblocks_size, struct record *rec)
>> +{
>> +	size_t rem;
>> +	ssize_t size;
>> +	off_t rem_off;
>> +	int i, aio_ret, aio_errno, do_suspend;
>> +	struct perf_mmap *md;
>> +	struct timespec timeout0 = { 0, 0 };
>> +	struct timespec timeoutS = { 0, 1000 * 500  * 1 };
>> +
>> +	if (!cblocks_size)
>> +		return 0;
>> +
>> +	do {
>> +		do_suspend = 0;
>> +		nanosleep(&timeoutS, NULL);
> 
> why the extra sleep in here and not sleeping through aio_suspend call?

Yep. Good question. That requires explicit explanation:

+               /* aio_suspend() implementation inside glibc (as of v2.27) is
+                * intrusive and not just blocks waiting io requests completion
+                * but polls requests queue inducing context switches in perf
+                * tool process. When profiling in system wide mode with tracing
+                * context switches the trace may be polluted by context switches
+                * from the perf process and the trace size becomes about 3-5
+                * times bigger than that of when writing the trace serially.
+                * To limit the volume of context switches from perf tool
+                * process nonsleep() call limits aio_suspend()
+                * polling till every half of the kernel timer tick which is
+                * usually 1ms (depends on CONFIG_HZ value).
+                */

> 
> jirka
> 
>> +		if (aio_suspend((const struct aiocb**)cblocks, cblocks_size, &timeout0)) {
>> +			if (errno == EAGAIN || errno == EINTR) {
>> +				do_suspend = 1;
>> +				continue;
>> +			} else {
>> +				pr_err("failed to sync perf data, error: %m\n");
> 
> SNIP
>