linux-kernel - Re: [PATCH] perf tools: session: avoid infinite loop

Message-ID: <55FBED85.8020603@intel.com>
Date:	Fri, 18 Sep 2015 13:55:01 +0300
From:	Adrian Hunter <adrian.hunter@...el.com>
To:	Mark Rutland <mark.rutland@....com>
Cc:	Arnaldo Carvalho de Melo <acme@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>, Jiri Olsa <jolsa@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] perf tools: session: avoid infinite loop

On 18/09/15 12:51, Mark Rutland wrote:
> On Fri, Sep 18, 2015 at 07:09:16AM +0100, Adrian Hunter wrote:
>> On 17/09/15 18:41, Mark Rutland wrote:
>>> Hi,
>>>
>>> On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
>>>> On Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland wrote:
>>>>> This has been observed to result in an exit-time hang when counting
>>>>> rare/unschedulable events with perf record, and can be triggered
>>>>> artificially with the script below:
>>>>>
>>>>> ----
>>>>> #!/bin/sh
>>>>> printf "REPRO: launching perf\n";
>>>>> ./perf record -e software/config=9/ sleep 1 &
>>>>> PERF_PID=$!;
>>>>> sleep 0.002;
>>>>> kill -2 $PERF_PID;
>>>>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
>>>>> wait $PERF_PID;
>>>>> printf "REPRO: perf exited\n";
>>>>> ----
>>>>
>>>> So, I run it here, without this patch, and get:
>>>>
>>>>   [root@zoo ~]# time ./repro.sh 
>>>>   REPRO: launching perf
>>>>   REPRO: waiting for perf (766) to exit...
>>>>   [ perf record: Woken up 1 times to write data ]
>>>>   [ perf record: Captured and wrote 0.015 MB perf.data ]
>>>>   REPRO: perf exited
>>>>   real	0m1.060s
>>>>   user	0m0.018s
>>>>   sys	0m0.037s
>>>
>>> [...]
>>>
>>>> What am I doing wrong? Trying to reproduce this before even looking at
>>>> the patch :-)
>>>
>>> I suspect you have a shinier computer than I do! ;)
>>>
>>
>> I imagine you would also need to contrive not to write any synthesized MMAP
>> or COMM events, i.e. no access to /proc perhaps?
> 
> Taking a look through __cmd_record:
> 
> I'm not writing to a pipe, so nothing gets synthesized for attrs or tracepoints.
> 
> I'm not using aux trace, so nothing gets synthesized for that.
> 
> Running as a regular user I don't have synthesized events for the kernel mmap
> or modules:
> 
> 	WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
> 	check /proc/sys/kernel/kptr_restrict.
> 
> 	Samples in kernel functions may not be resolved if a suitable vmlinux
> 	file is not found in the buildid cache or in the vmlinux path.
> 
> 	Samples in kernel modules won't be resolved at all.
> 
> 	If some relocation was applied (e.g. kexec) symbols may be misresolved
> 	even with a suitable vmlinux or kallsyms file.
> 
> 	Cannot read kernel map
> 	Couldn't record kernel reference relocation symbol
> 	Symbol resolution may be skewed if relocation was used (e.g. kexec).
> 	Check /proc/kallsyms permission or run as root.
> 
> I'm not running a guest, so I don't have any synthesized events for the guest
> OS.
> 
> __machine__synthesize_threads doesn't synthesize any task events
> (target__has_task(target) is false at this point, as is
> target__has_cpu(target)), because perf starts the workload later via
> perf_evlist__start_workload.
> 
> So it looks like I shouldn't have any synthesized events. Have I missed
> anything?
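
[Editor's sketch] You can check by hand whether the kernel-map warnings quoted above will appear for a given user. When kptr_restrict applies to the reader, /proc/kallsyms still lists the symbols but prints every address as zeros. That is consistent with the "Cannot read kernel map" warning. The minimal sketch below assumes the usual "address type name [module]" layout of /proc/kallsyms. It is a manual check, not how perf itself decides, and the helper name kallsyms_restricted is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): report whether kernel symbol
# addresses are hidden. Assumes the usual "ADDRESS TYPE NAME [MODULE]"
# layout of /proc/kallsyms, where kptr_restrict makes every ADDRESS
# print as zeros for the users it applies to.

# kallsyms_restricted [FILE]
#   returns 0 if all addresses are zero (or FILE is unreadable),
#   returns 1 if at least one real, non-zero address is visible
kallsyms_restricted() {
	ks_file=${1:-/proc/kallsyms}
	# An unreadable file is treated as restricted as well.
	[ -r "$ks_file" ] || return 0
	# Stop at the first non-blank line whose address is not all zeros.
	awk 'NF && $1 !~ /^0+$/ { visible = 1; exit }
	     END { if (visible) exit 1; exit 0 }' "$ks_file"
}
```

For example, `kallsyms_restricted && cat /proc/sys/kernel/kptr_restrict` shows the current setting whenever addresses are hidden from the calling user.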

Yes, you are right.  But you are also not getting the COMM and MMAP events
from the exec, which means you are killing perf before it execs the workload.
Perf tells its forked child to exec the workload by writing to a pipe, so to
reproduce it you just need to put everything on the same CPU and play around
with the sleep value.  For example,

	taskset -c 0 tools/perf/perf record -- sleep 1 & sleep 0.005 ; kill -2 $!

reproduces the problem for me.  Of course, since the workload never gets
exec'ed, it doesn't matter if it is bogus, i.e.

	taskset -c 0 tools/perf/perf record -- sdfgsdgdg & sleep 0.005 ; kill -2 $!

also reproduces the problem.
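
[Editor's sketch] The start-up ordering described above can be shown outside perf. perf record forks the workload early, but the child waits on a pipe and only execs after perf writes to it when it starts the workload (perf_evlist__start_workload, mentioned earlier in the thread). If SIGINT arrives before that write, the child never execs. No COMM or MMAP events are generated for it, and a bogus command name is never even looked up. The minimal POSIX shell sketch below mimics this "corked" start with a named FIFO. It is an analogy, not perf's actual C code. The function run_corked and its go/abort modes are hypothetical, and it assumes mktemp -d is available.

```sh
#!/bin/sh
# Hypothetical sketch of a "corked" workload start: fork the workload
# first, but let it exec only after the parent sends a go token through
# a pipe (here a named FIFO). Loosely modelled on perf record's behaviour.

# run_corked MODE CMD [ARGS...]
#   MODE=go    : release the child, which then execs CMD
#   other MODE : close the pipe without writing, as if the parent had been
#                interrupted before starting the workload; CMD never runs
run_corked() {
	rc_mode=$1
	shift
	rc_dir=$(mktemp -d) || return 1
	rc_fifo="$rc_dir/go"
	mkfifo "$rc_fifo" || { rm -rf "$rc_dir"; return 1; }

	(
		# Child: opening the FIFO blocks until the parent opens it too.
		# Exec the workload only if the first line is the go token; on EOF
		# (parent closed without writing) exit without ever running CMD.
		if IFS= read -r rc_token < "$rc_fifo" && [ "$rc_token" = go ]; then
			exec "$@"
		fi
		exit 1
	) &
	rc_child=$!

	if [ "$rc_mode" = go ]; then
		# Parent: uncork the child.
		printf 'go\n' > "$rc_fifo"
	else
		# Parent: open and immediately close the FIFO, so the child sees EOF.
		: > "$rc_fifo"
	fi

	wait "$rc_child"
	rc_status=$?
	rm -rf "$rc_dir"
	return "$rc_status"
}
```

Calling `run_corked go echo workload-ran` prints the line. `run_corked abort echo workload-ran` prints nothing and returns 1, mirroring an interrupted perf whose workload was never exec'ed.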
