lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190930121537.GG25745@shell.armlinux.org.uk>
Date:   Mon, 30 Sep 2019 13:15:37 +0100
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     linux-arm-kernel@...ts.infradead.org,
        Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>
Subject: perf annotate fails with "Invalid -1 error code"

Hi,

While using perf report on aarch64, I try to annotate
__arch_copy_to_user, and it fails with:

Error: Couldn't annotate __arch_copy_to_user: Internal error: Invalid -1 error code

which is not very helpful.  Looking at the code, the error message
appended to the "Couldn't annotate ...:" comes from
symbol__strerror_disassemble(), which expects either an errno or
one of the special SYMBOL_ANNOTATE_ERRNO_* constants in its 3rd
argument.

symbol__tui_annotate() passes the 3rd argument as the return value
from symbol__annotate2().  symbol__annotate2() returns either zero or
-1.  This calls symbol__annotate(), which returns -1 (which would
generally conflict with -EPERM), -errno, the return value of
arch->init, or the return value of symbol__disassemble().

This seems to be something of a mess - different places seem to use
different approaches to handling errors, and some don't bother
propagating the error code up.

The upshot is, the error message reported when trying to annotate
gives the user no clue why perf is unable to annotate, and you have
to resort to stracing perf in an attempt to find out - which also
isn't useful:

3431  pselect6(1, [0], NULL, NULL, NULL, NULL) = 1 (in [0])
3431  pselect6(5, [4], NULL, NULL, {tv_sec=10, tv_nsec=0}, NULL) = 1 (in [4], left {tv_sec=9, tv_nsec=999995480})
3431  read(4, "\r", 1)                  = 1
3431  uname({sysname="Linux", nodename="cex7", ...}) = 0
3431  openat(AT_FDCWD, "/usr/lib/aarch64-linux-gnu/gconv/gconv-modules.cache", O_RDONLY) = 26
3431  fstat(26, {st_mode=S_IFREG|0644, st_size=26404, ...}) = 0
3431  mmap(NULL, 26404, PROT_READ, MAP_SHARED, 26, 0) = 0x7fa1fd9000
3431  close(26)                         = 0
3431  futex(0x7fa172b830, FUTEX_WAKE_PRIVATE, 2147483647) = 0
3431  write(1, "\33[10;21H\33[37m\33[40m\342\224\214\342\224\200Error:\342\224"..., 522) = 522
3431  pselect6(1, [0], NULL, NULL, NULL, NULL <detached ...>

Which makes it rather difficult to know what is actually failing...
so the only way is to resort to gdb.

It seems that dso__disassemble_filename() is returning -10000, which
seems to be SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX and as described above,
this is lost due to the lack of error code propagation.

Specifically, the failing statement is:

        if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS &&
            !dso__is_kcore(dso))
                return SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX;

Looking at "dso" shows:

	kernel = DSO_TYPE_KERNEL,
	symtab_type = DSO_BINARY_TYPE__KALLSYMS,
	binary_type = DSO_BINARY_TYPE__KALLSYMS,
	load_errno = DSO_LOAD_ERRNO__MISMATCHING_BUILDID,
	name = 0x555588781c "/boot/vmlinux",

and we finally get to the reason - it's using the wrong vmlinux.
So, obvious solution (once the failure reason is known), give it
the correct vmlinux.

Should it really be necessary to resort to gdb to discover why perf
is failing?

It looks like this was introduced by ecda45bd6cfe ("perf annotate:
Introduce symbol__annotate2 method") which did this:

-       err = symbol__annotate(sym, map, evsel, 0, &browser.arch);
+       err = symbol__annotate2(sym, map, evsel, &annotate_browser__opts, &browser.arch);

+int symbol__annotate2(struct symbol *sym, struct map *map, struct perf_evsel *evsel,
+                     struct annotation_options *options, struct arch **parch)
+{
...
+       err = symbol__annotate(sym, map, evsel, 0, parch);
+       if (err)
+               goto out_free_offsets;
...
+out_free_offsets:
+       zfree(&notes->offsets);
+       return -1;
+}

introducing this problem by the "return -1" disease.

So, given that this function's return value is used as an error code
in the way I've described above, should this function also be fixed
to return ENOMEM when the zalloc fails, as well as propagating the
return value from symbol__annotate() ?

I haven't yet checked to see if there's other places that call this
function but now rely on it returning -1... but I'd like to lodge a
plea that perf gets some consistency wrt how errors are passed and
propagated from one function to another.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ