[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190930125121.GG9622@kernel.org>
Date: Mon, 30 Sep 2019 09:51:21 -0300
From: Arnaldo Carvalho de Melo <arnaldo.melo@...il.com>
To: Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc: linux-arm-kernel@...ts.infradead.org,
Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>
Subject: Re: perf annotate fails with "Invalid -1 error code"
Em Mon, Sep 30, 2019 at 01:15:37PM +0100, Russell King - ARM Linux admin escreveu:
> Hi,
>
> While using perf report on aarch64, I try to annotate
> __arch_copy_to_user, and it fails with:
>
> Error: Couldn't annotate __arch_copy_to_user: Internal error: Invalid -1 error code
>
> which is not very helpful. Looking at the code, the error message
> appended to the "Couldn't annotate ...:" comes from
> symbol__strerror_disassemble(), which expects either an errno or
> one of the special SYMBOL_ANNOTATE_ERRNO_* constants in its 3rd
> argument.
>
> symbol__tui_annotate() passes the 3rd argument as the return value
> from symbol__annotate2(). symbol__annotate2() returns either zero or
> -1. This calls symbol__annotate(), which returns -1 (which would
> generally conflict with -EPERM), -errno, the return value of
> arch->init, or the return value of symbol__disassemble().
>
> This seems to be something of a mess - different places seem to use
> different approaches to handling errors, and some don't bother
> propagating the error code up.
>
> The upshot is, the error message reported when trying to annotate
> gives the user no clue why perf is unable to annotate, and you have
> to resort to stracing perf in an attempt to find out - which also
> isn't useful:
>
> 3431 pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
> 3431 pselect6(5, [4], NULL, NULL, {tv_sec=10, tv_nsec=0}, NULL) = 1 (in [4], left {tv_sec=9, tv_nsec=999995480})
> 3431 read(4, "\r", 1) = 1
> 3431 uname({sysname="Linux", nodename="cex7", ...}) = 0
> 3431 openat(AT_FDCWD, "/usr/lib/aarch64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 26
> 3431 fstat(26, {st_mode=S_IFREG|0644, st_size=26404, ...}) = 0
> 3431 mmap(NULL, 26404, PROT_READ, MAP_SHARED, 26, 0) = 0x7fa1fd9000
> 3431 close(26) = 0
> 3431 futex(0x7fa172b830, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> 3431 write(1, "\33[10;21H\33[37m\33[40m\342\224\214\342\224\200Error:\342\224"..., 522) = 522
> 3431 pselect6(1, [0], NULL, NULL, NULL, NULL <detached ...>
>
> Which makes it rather difficult to know what is actually failing...
> so the only way is to resort to gdb.
>
> It seems that dso__disassemble_filename() is returning -10000, which
> seems to be SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX and as described above,
> this is lost due to the lack of error code propagation.
>
> Specifically, the failing statement is:
>
> if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS &&
> !dso__is_kcore(dso))
> return SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX;
>
> Looking at "dso" shows:
>
> kernel = DSO_TYPE_KERNEL,
> symtab_type = DSO_BINARY_TYPE__KALLSYMS,
> binary_type = DSO_BINARY_TYPE__KALLSYMS,
> load_errno = DSO_LOAD_ERRNO__MISMATCHING_BUILDID,
> name = 0x555588781c "/boot/vmlinux",
>
> and we finally get to the reason - it's using the wrong vmlinux.
> So, obvious solution (once the failure reason is known), give it
> the correct vmlinux.
>
> Should it really be necessary to resort to gdb to discover why perf
> is failing?
>
> It looks like this was introduced by ecda45bd6cfe ("perf annotate:
> Introduce symbol__annotate2 method") which did this:
>
> - err = symbol__annotate(sym, map, evsel, 0, &browser.arch);
> + err = symbol__annotate2(sym, map, evsel, &annotate_browser__opts, &browser.arch);
>
> +int symbol__annotate2(struct symbol *sym, struct map *map, struct perf_evsel *evsel,
> + struct annotation_options *options, struct arch **parch)
> +{
> ...
> + err = symbol__annotate(sym, map, evsel, 0, parch);
> + if (err)
> + goto out_free_offsets;
> ...
> +out_free_offsets:
> + zfree(¬es->offsets);
> + return -1;
> +}
>
> introducing this problem by the "return -1" disease.
>
> So, given that this function's return value is used as an error code
> in the way I've described above, should this function also be fixed
> to return ENOMEM when the zalloc fails, as well as propagating the
> return value from symbol__annotate() ?
>
> I haven't yet checked to see if there's other places that call this
> function but now rely on it returning -1... but I'd like to lodge a
> plea that perf gets some consistency wrt how errors are passed and
> propagated from one function to another.
Note taken, will address the points raised here.
- Arnaldo
Powered by blists - more mailing lists