Date:	Thu, 17 Sep 2015 16:41:52 +0100
From:	Mark Rutland <mark.rutland@....com>
To:	Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Ingo Molnar <mingo@...hat.com>, Jiri Olsa <jolsa@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] perf tools: session: avoid infinite loop

Hi,

On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
> > This has been observed to result in an exit-time hang when counting
> > rare/unschedulable events with perf record, and can be triggered
> > artificially with the script below:
> > 
> > ----
> > #!/bin/sh
> > printf "REPRO: launching perf\n";
> > ./perf record -e software/config=9/ sleep 1 &
> > PERF_PID=$!;
> > sleep 0.002;
> > kill -2 $PERF_PID;
> > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
> > wait $PERF_PID;
> > printf "REPRO: perf exited\n";
> > ----
> 
> So, I run it here, without this patch, and get:
> 
>   [root@zoo ~]# time ./repro.sh 
>   REPRO: launching perf
>   REPRO: waiting for perf (766) to exit...
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.015 MB perf.data ]
>   REPRO: perf exited
>   real	0m1.060s
>   user	0m0.018s
>   sys	0m0.037s

[...]

> What am I doing wrong? Trying to reproduce this before even looking at
> the patch :-)

I suspect you have a shinier computer than I do! ;)

It's easier to trigger on slower machines -- decreasing the sleep time
to 0.001 or below may help, though if it's set too low we exit early
without exercising the failing path.

It looks like, to trigger the bug, we need to successfully call
record__mmap_read_all at least once (without reading any data), then break out
of the loop because hits == rec->samples && done. That way we call
process_buildids with no data having been written.
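
To make that sequence concrete, here is a simplified, self-contained model of
the exit condition. It is not the code from perf itself: struct toy_record,
toy_mmap_read_all() and toy_record_loop() are hypothetical stand-ins, and only
the names hits, samples, done and bytes_written echo the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the record state discussed above. */
struct toy_record {
	unsigned long samples;            /* bumped when a read finds data */
	unsigned long long bytes_written; /* total bytes written so far */
};

/* Hypothetical stand-in for record__mmap_read_all(): consume pending bytes. */
static int toy_mmap_read_all(struct toy_record *rec, size_t *pending)
{
	assert(rec && pending);
	if (*pending) {
		rec->samples++;
		rec->bytes_written += *pending;
		*pending = 0;
	}
	return 0;
}

/*
 * Model of the main loop: read, then break once a read finds nothing new
 * and 'done' is set (as it would be after SIGINT).
 * Returns the number of loop iterations.
 */
static unsigned int toy_record_loop(struct toy_record *rec, size_t pending,
				    bool done)
{
	unsigned int iterations = 0;

	for (;;) {
		unsigned long hits = rec->samples;

		iterations++;
		if (toy_mmap_read_all(rec, &pending) < 0)
			break;

		if (hits == rec->samples) {
			if (done)
				break;
			/* Real code would wait for events here; the model just stops. */
			done = true;
		}
	}
	return iterations;
}
```

With done already set and nothing pending, the model leaves the loop after a
single successful read with bytes_written still 0, which is the state in which
process_buildids would then be called.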

I added some logging of rec->bytes_written immediately before and after the
main loop. In every case where we wrote something we exit, and in every case
where we wrote nothing we hang. I've left some examples (from an arm64 system)
below.
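
A minimal sketch of that kind of logging, assuming a hypothetical helper
format_bytes_written(). Only the message text is taken from the output below;
the actual instrumentation may have looked different.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a "<stage> rec wrote <bytes>" line into buf.
 * Returns the snprintf() result, i.e. the length of the full message.
 */
static int format_bytes_written(char *buf, size_t len, const char *stage,
				unsigned long long bytes)
{
	assert(buf && stage);
	return snprintf(buf, len, "%s rec wrote %llu", stage, bytes);
}
```

Calling it with "Pre-loop" just before the main loop and "Post-loop" just
after, passing rec->bytes_written each time and printing the buffer to stderr,
would give lines of the form shown in the runs below.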

Thanks,
Mark.

mark@...bensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Waiting for perf (2104) to exit...
Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Post-loop rec wrote 2128
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data ]
Perf exited

mark@...bensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Waiting for perf (2099) to exit...
Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Post-loop rec wrote 0
[ perf record: Woken up 0 times to write data ]
< hang here >

mark@...bensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Waiting for perf (2108) to exit...
Post-loop rec wrote 0
[ perf record: Woken up 1 times to write data ]
< hang here >