lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200317114605.GG2156@tucnak>
Date:   Tue, 17 Mar 2020 12:46:05 +0100
From:   Jakub Jelinek <jakub@...hat.com>
To:     Sergei Trofimovich <slyfox@...too.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>, x86@...nel.org
Subject: Re: [PATCH] x86: fix early boot crash on gcc-10

On Mon, Mar 16, 2020 at 10:12:51PM +0000, Sergei Trofimovich wrote:
> In case you are still interested in preprocessed files and results I've collected
> all the bits in a single tarball:
>     https://dev.gentoo.org/~slyfox/bugs/linux-gcc-10-boot-2020-03-14.tar.gz
> Same available in separate files in:
>     https://dev.gentoo.org/~slyfox/bugs/linux-gcc-10-boot-2020-03-14/

Thanks, that is helpful.
So, a few comments.

One thing I've noticed in the command line is that
--param=allow-store-data-races=0
got dropped.  That is fine, the parameter is gone, but it has been replaced
in https://gcc.gnu.org/PR92046 by the
-fno-allow-store-data-races
option.  Like the param which defaulted to 0 and has been enabled only with
-Ofast this option is also -fno-allow-store-data-races by default unless
-Ofast, but if kernel wanted to be explicit or make sure not to introduce
them even with -Ofast, I'd say it should:
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS   += $(call cc-option,--param=allow-store-data-races=0)
+KBUILD_CFLAGS   += $(call cc-option,-fno-allow-store-data-races)
in the toplevel Makefile.

The actual change that causes cpu_startup_entry to be tail called in GCC 10
was my https://gcc.gnu.org/PR59813 https://gcc.gnu.org/PR89060
https://gcc.gnu.org/r10-230-gab87ac8d53f3d903bfc9eeb0f0b5e7eed1f38cbc
optimization.  Previously, GCC punted just because it saw some earlier call
which passed address of some automatic variable to some other function and
escaped it that way (and could be possibly used during the function
considered for tail call, thus making the tail call not possible as with
tail call the frame is gone then).  Now, GCC tries to use information about
scopes to determine that eventhough some automatic variables can escape that
way, if they aren't in the scope anymore during the last call, they
shouldn't be problematic.  There are two variables that prevented the tail
call optimization in the past it seems, int cpuid; in smp_callin function
which is inlined, then its address escapes:
__dynamic_pr_debug(&__UNIQUE_ID_ddebug114, "smpboot" ": " "Stack at about %p\n", &cpuid);
and then cpuid goes out of scope.
Similarly, the u64 canary; variable in boot_init_stack_canary which is also
inlined.  The address escapes in
get_random_bytes(&canary, sizeof(canary));
and later on is used and eventually goes out of scope.
Finally, there is the cpu_startup_entry call at the end of function.

Regarding the reason why GCC doesn't tailcall noreturn functions, that is a
deliberate decision, often that results in shorter code (there is no need to
restore stack etc. before the call and because it is noreturn, anything
after the call can be thrown away as unreachable) and more importantly,
results in more useful backtraces when something calls abort etc.

Hope this helps.

	Jakub

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ