Date:	Thu, 5 Jul 2012 14:18:18 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Sam Ravnborg <sam@...nborg.org>, Michal Marek <mmarek@...e.cz>,
	Arnaud Lacombe <lacombar@...il.com>,
	Nick Bowler <nbowler@...iptictech.com>,
	Jan Beulich <jbeulich@...ell.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: "Inconsistent kallsyms data" error

So for some unknown reason I'm hitting this on just one particular
machine, and it's *very* annoying.

It's annoying for three reasons:

 - it's breaking the build (duh)

 - the error is printed out to stdout, so you don't even *see* it as
an error if you redirect the normal messages somewhere else (like any
sane person, i.e. me, does)

 - when the error happens, it doesn't show *what* went wrong, and in
fact it explicitly cleans up all the files that could show what
happened.

And no, "make KALLSYMS_EXTRA_PASS=1" does not fix anything.

Interestingly, making a trivial change to show the difference made the
problem go away. It was entirely reliable with that
particular config and that particular kernel version with a *clean*
tree, but it looks like just changing the tree to be dirty (and thus
changing the version string) hides the problem. Which makes it even
harder to debug, because now I can't see what the difference actually
is that causes things to fail.

VERY annoying.

This is not a new bug - according to google this has been reported
before, back in October 2011. In that case the workaround worked. In
my case it does not.

Anyway, after hacking the source to actually show the difference, and
to also *not* change the version string just because it's dirty, I see
this difference:

 - System.map:

    ...
    ffffffff8189b4d0 R kallsyms_addresses
    ffffffff818ee910 R kallsyms_num_syms
    ffffffff818ee918 R kallsyms_names
    ...
    ffffffff819fa9a0 R __stop___modver
    ffffffff819fb000 R __end_rodata
    ...

 - .tmp_System.map:

    ...
    ffffffff8189b4d0 R kallsyms_addresses
    ffffffff818ee850 R kallsyms_num_syms
    ffffffff818ee858 R kallsyms_names
    ...
    ffffffff819fa720 R __stop___modver
    ffffffff819fb000 R __end_rodata

(the diff itself is huge, because once the addresses change, they stay
different).
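
Since the raw diff is mostly noise after the first shifted address, it
can help to look only at the first symbol where the two maps disagree.
What follows is a minimal sketch, not part of the kernel build: the
helper names parse_map and first_divergence are hypothetical, and the
sample data is just the excerpt quoted above in "address type name"
form.

```python
def parse_map(text):
    """Parse 'address type name' lines into (address, type, name) tuples.

    Lines that do not have exactly three fields (such as the '...'
    elisions in the excerpt) are skipped.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, kind, name = parts
        try:
            symbols.append((int(addr, 16), kind, name))
        except ValueError:
            continue
    return symbols


def first_divergence(map_a, map_b):
    """Return (name_a, addr_a, name_b, addr_b) for the first entry that
    differs between the two maps, or None if the common prefix matches."""
    for (addr_a, _, name_a), (addr_b, _, name_b) in zip(map_a, map_b):
        if addr_a != addr_b or name_a != name_b:
            return name_a, addr_a, name_b, addr_b
    return None


# Excerpts copied from the two listings above.
SYSTEM_MAP = """
ffffffff8189b4d0 R kallsyms_addresses
ffffffff818ee910 R kallsyms_num_syms
ffffffff818ee918 R kallsyms_names
ffffffff819fa9a0 R __stop___modver
ffffffff819fb000 R __end_rodata
"""

TMP_SYSTEM_MAP = """
ffffffff8189b4d0 R kallsyms_addresses
ffffffff818ee850 R kallsyms_num_syms
ffffffff818ee858 R kallsyms_names
ffffffff819fa720 R __stop___modver
ffffffff819fb000 R __end_rodata
"""

hit = first_divergence(parse_map(SYSTEM_MAP), parse_map(TMP_SYSTEM_MAP))
if hit is not None:
    name_a, addr_a, name_b, addr_b = hit
    print(f"first mismatch: {name_a} {addr_a:#x} vs {name_b} {addr_b:#x}")
```

On real map files the same two functions could be fed the file
contents instead of the inline excerpts.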

Notice how 'kallsyms_addresses' has the same value, but
'kallsyms_num_syms' (and subsequent symbols until the page-aligned
__end_rodata symbol that gets them back in sync) do not. I have no
idea *why* this happens, but it definitely does.

It seems the real difference is the size of the "kallsyms_addresses"
data structure. No idea why, though.
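
To put a number on that, the size of the table can be estimated from
the addresses above. This is only a sketch under two assumptions that
the listings do not confirm: that each kallsyms_addresses entry is 8
bytes on x86-64, and that kallsyms_num_syms follows the table directly
with no padding. The helper name table_entries is hypothetical.

```python
ENTRY_SIZE = 8  # assumption: one 8-byte address per symbol on x86-64


def table_entries(start, end, entry_size=ENTRY_SIZE):
    """Return (size in bytes, number of entries) for a table that spans
    from 'start' up to, but not including, 'end'."""
    size = end - start
    return size, size // entry_size


# kallsyms_addresses starts at the same address in both maps; the end of
# the table is taken to be the address of kallsyms_num_syms.
START = 0xffffffff8189b4d0
final_size, final_count = table_entries(START, 0xffffffff818ee910)  # System.map
tmp_size, tmp_count = table_entries(START, 0xffffffff818ee850)      # .tmp_System.map

print(f"System.map:      {final_size} bytes, {final_count} entries")
print(f".tmp_System.map: {tmp_size} bytes, {tmp_count} entries")
print(f"difference:      {final_size - tmp_size} bytes, "
      f"{final_count - tmp_count} entries")
```

Under those assumptions the two tables differ by 192 bytes, which
would correspond to 24 symbols present in one pass and absent in the
other.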

This happens with current git (commit c4aed353b1b0), on an x86-64
machine running current F17 as of today, with the attached config.
Maybe that makes somebody else able to recreate this and figure out
what is so magical about the layout that the exact kernel version and
config (and likely compiler/binutils versions) matter.

Any ideas? Added a fairly random set of people who get mentioned in
the linker script commits etc.

                           Linus

Download attachment "tove-config.gz" of type "application/x-gzip" (17487 bytes)
