Date:   Tue, 7 Mar 2017 20:12:25 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Tobias Klauser <tklauser@...tanz.ch>
Cc:     Sven Schmidt <4sschmid@...ormatik.uni-hamburg.de>,
        Sandra Loosemore <sandra@...esourcery.com>,
        Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, Ley Foon Tan <lftan@...era.com>,
        nios2-dev@...ts.rocketboards.org
Subject: Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor
 module'

On 03/07/2017 04:46 AM, Tobias Klauser wrote:
> On 2017-03-03 at 04:04:41 +0100, Guenter Roeck <linux@...ck-us.net> wrote:
>> On 03/02/2017 08:38 AM, Tobias Klauser wrote:
>>> On 2017-03-01 at 20:45:21 +0100, Guenter Roeck <linux@...ck-us.net> wrote:
>>>> On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
>>>>> Hi Guenter, Tobias and Sandra,
>>>>>
>>>>> thanks for your effort here.
>>>>>
>>>>> On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
>>>>>> On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
>>>>>>> On 02/28/2017 08:53 AM, Tobias Klauser wrote:
>>>>>>>> (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
>>>>>>>> for nios2)
>>>>>>>>
>>>>>>>> On 2017-02-26 at 22:03:38 +0100, Guenter Roeck <linux@...ck-us.net> wrote:
>>>>>>>>> Hi Sven,
>>>>>>>>>
>>>>>>>>> my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
>>>>>>>>> update LZ4 compressor module"). The test hangs early during boot before
>>>>>>>>> any console output is seen. Reverting the offending patch as well as the
>>>>>>>>> subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
>>>>>>>>> and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
>>>>>>>>> at the top of the LZ4 decompression code). For reference, bisect log
>>>>>>>>> is attached.
>>>>>>>>>
>>>>>>>>> I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
>>>>>>>>> and binutils 2.26.1. Scripts used to run the tests are available at
>>>>>>>>> https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
>>>>>>>>> Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
>>>>>>>>
>>>>>>>> Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
>>>>>>>> binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can
>>>>>>>> get a kernel booting on latest master branch. AFAICT, none of the
>>>>>>>> LZ4_decompress_* functions are called during boot.
>>>>>>>>
>>>>>
>>>>> It seems a bit strange that code which is not actually called causes problems like that.
>>>>>
>>>> Yes, it is, though it is always possible. The code isn't exactly easy to
>>>> understand; there may be some hidden caveats such as global variables. It may
>>>> also be that some jump target exceeds its range (though why that would only
>>>> be seen with the LZ4 code is another question), or that the compiler gets
>>>> confused by the forced inlines (disabling that didn't make a difference,
>>>> though, nor did disabling -O3).
>>>>
>>>>> Please let me know if and how I may help you figure out what's happening, especially
>>>>> regarding the differences between the previous LZ4 and the current implementation.
>>>>>
>>>>
>>>> For my part I am all but clueless. Unless someone has an idea, we may have
>>>> to disable LZ4 support for nios2 for the time being. Does anyone have
>>>> thoughts on that? Of course, that would not help if the problem also
>>>> affects recent gcc/binutils versions on other architectures.
>>>
>>> After some further investigations, I'd say this isn't "caused" by LZ4
>>> specifically but by a more general problem with one of the nios2 arch
>>> specific tools involved.
>>>
>>> I manually enabled random additional CONFIG_* options and in some cases
>>> I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
>>> -EINVAL in place) while in others I didn't. So I'd rather suspect this
>>> problem to be connected to the size or structure of the generated vmlinux
>>> image.
>>>
>>> Or could this even be a problem with qemu? Did anyone already verify
>>> this on the 10m50 devboard? (Unfortunately I don't have any nios2
>>> devboard available right now, otherwise I would have done this...)
>>>
>>
>> That is of course always possible.
>>
>>> Other than that I'm also becoming all but clueless... One option I
>>> thought of was using the QEMU monitor to dump the CPU state after the
>>> hang but so far I didn't manage to get it to work (hints appreciated ;)
>>>
>>
>> Something like
>>
>> qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
>> 	-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
>> 	--append "rdinit=/sbin/init" -initrd busybox-nios2.cpio
>>
>> gives you a qemu monitor window. Use "info registers" to see registers.
>> Looks like it is stuck in init_bootmem_core, or at least that is what it
>> shows for me.
>
> Thanks a lot for the hint, this worked perfectly. I'm not all that
> familiar with qemu :-/
>
> Using the qemu gdbserver I can indeed confirm that it seems to be stuck
> in init_bootmem_core:
>
> (gdb) file vmlinux
> Reading symbols from vmlinux...done.
> (gdb) target remote localhost:1234
> Remote debugging using localhost:1234
> link_bootmem (bdata=<optimized out>) at mm/bootmem.c:80
> 80			if (bdata->node_min_pfn < ent->node_min_pfn) {
>
> This looks like a very weird place for it to get stuck...
>
> So I followed a different path and implemented early printk support for
> the 8250/16650 serial console on nios2, so I could get debug outputs
> earlier on (patch below, I'll also officially submit this later on).
>

That is great; I'll add that to my own tests to get some output.

> Now I get the following output on boot:
>
> Linux version 4.11.0-rc1-dirty (tobiask@...s08) (gcc version 7.0.1 20170226 (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
> bootconsole [early0] enabled
> Early console on uart16650 initialized at 0xf8001600
> OF: fdt: Error -11 processing FDT
> Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
>
> ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
>
> Looks like the in-memory device tree somehow gets corrupted. Not sure
> yet why and how this is linked to the Kconfig options selected but at
> least we now have a possibility to use debug messages earlier on.
>
Interesting. I was able to confirm that the lz4 patch is not the root
cause. I was not able to reproduce the problem in v4.10, but after
adding more and more configuration options I get it to fail starting
with commit ac1820fb286b552 ("Merge tag 'for-next-dma_ops' of
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma").
No idea if that is the root cause either. Kernel configuration for that
case is attached.

Of course, ac1820fb286b552 no longer crashes with that configuration after
applying your patch below, and v4.11-rc1 crashes without any output :-(.

I think I'll add some logging into qemu to see where it puts the dtb.
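
A header sanity check on the blob, run wherever the dtb address is known, might help narrow down when the corruption happens. The following is only a sketch: fdt_header_sane() is a hypothetical helper, not a kernel or libfdt function, and it assumes the standard flattened devicetree header layout (big-endian fields, magic 0xd00dfeed, 40-byte header). If the -11 in the log above is a libfdt error code, it would correspond to FDT_ERR_BADSTRUCTURE, so a blob can pass this check and still be damaged in its structure block.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC	0xd00dfeedu	/* magic value from the devicetree specification */
#define FDT_HEADER_SIZE	40u		/* ten 32-bit fields in a version 17 header */

/* Read a big-endian 32-bit value; FDT header fields are stored big-endian. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if the header looks plausible, a negative value otherwise.
 */
static int fdt_header_sane(const uint8_t *blob, size_t avail)
{
	uint32_t totalsize, off_struct, off_strings;

	if (avail < FDT_HEADER_SIZE)
		return -1;			/* not enough bytes for a header */
	if (be32_at(blob) != FDT_MAGIC)
		return -2;			/* magic overwritten or wrong address */

	totalsize = be32_at(blob + 4);		/* totalsize */
	off_struct = be32_at(blob + 8);		/* off_dt_struct */
	off_strings = be32_at(blob + 12);	/* off_dt_strings */

	if (totalsize < FDT_HEADER_SIZE || totalsize > avail)
		return -3;			/* size field implausible */
	if (off_struct >= totalsize || off_strings >= totalsize)
		return -4;			/* block offsets point outside the blob */
	return 0;
}
```

Calling this once on the address qemu reports and once early in kernel setup would show whether the header is already bad before the kernel touches it.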

Guenter


> ---%<---%<---
>
> Patch for 8250/16650 early printk support on nios2 (make sure to select
> CONFIG_EARLY_PRINTK):
>
> diff --git a/arch/nios2/Kconfig.debug b/arch/nios2/Kconfig.debug
> index 2fd08cbfdddb..35b5dd67b15a 100644
> --- a/arch/nios2/Kconfig.debug
> +++ b/arch/nios2/Kconfig.debug
> @@ -18,7 +18,7 @@ config EARLY_PRINTK
>  	bool "Activate early kernel debugging"
>  	default y
>  	select SERIAL_CORE_CONSOLE
> -	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE
> +	depends on SERIAL_ALTERA_JTAGUART_CONSOLE || SERIAL_ALTERA_UART_CONSOLE || SERIAL_8250_CONSOLE
>  	help
>  	  Enable early printk on console
>  	  This is useful for kernel debugging when your machine crashes very
> diff --git a/arch/nios2/kernel/early_printk.c b/arch/nios2/kernel/early_printk.c
> index c08e4c1486fc..24b4506f4969 100644
> --- a/arch/nios2/kernel/early_printk.c
> +++ b/arch/nios2/kernel/early_printk.c
> @@ -22,6 +22,8 @@ static unsigned long base_addr;
>
>  #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE)
>
> +#define UART_NAME "altera_jtaguart"
> +
>  #define ALTERA_JTAGUART_DATA_REG		0
>  #define ALTERA_JTAGUART_CONTROL_REG		4
>  #define ALTERA_JTAGUART_CONTROL_WSPACE_MSK	0xFFFF0000
> @@ -53,6 +55,8 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
>
>  #elif defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
>
> +#define UART_NAME "altera_uart"
> +
>  #define ALTERA_UART_TXDATA_REG		4
>  #define ALTERA_UART_STATUS_REG		8
>  #define ALTERA_UART_STATUS_TRDY		0x0040
> @@ -80,9 +84,40 @@ static void early_console_write(struct console *con, const char *s, unsigned n)
>  	}
>  }
>
> +#elif defined(CONFIG_SERIAL_8250_CONSOLE)
> +
> +#define UART_NAME "uart16650"
> +
> +#define UART_LSR_TEMT	0x40 /* Transmitter empty */
> +#define UART_LSR_THRE	0x20 /* Transmit-hold-register empty */
> +#define BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE)
> +
> +#define UART_GET_SR() \
> +	__builtin_ldwio((void *)(base_addr + 0x14))
> +#define UART_SET_TX(v) \
> +	__builtin_stwio((void *)(base_addr), v)
> +
> +static void early_console_putc(char c)
> +{
> +	while (!((UART_GET_SR() & BOTH_EMPTY) == BOTH_EMPTY))
> +		;
> +
> +	UART_SET_TX(c & 0xff);
> +}
> +
> +static void early_console_write(struct console *con, const char *s, unsigned n)
> +{
> +	while (n-- && *s) {
> +		early_console_putc(*s);
> +		if (*s == '\n')
> +			early_console_putc('\r');
> +		s++;
> +	}
> +}
> +
>  #else
> -# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE nor SERIAL_ALTERA_UART_CONSOLE \
> -selected
> +# error Neither SERIAL_ALTERA_JTAGUART_CONSOLE, SERIAL_ALTERA_UART_CONSOLE, \
> +        nor SERIAL_8250_CONSOLE selected
>  #endif
>
>  static struct console early_console_prom = {
> @@ -95,7 +130,8 @@ static struct console early_console_prom = {
>  void __init setup_early_printk(void)
>  {
>  #if defined(CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE) ||	\
> -	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE)
> +	defined(CONFIG_SERIAL_ALTERA_UART_CONSOLE) ||	\
> +	defined(CONFIG_SERIAL_8250_CONSOLE)
>  	base_addr = of_early_console();
>  #else
>  	base_addr = 0;
> @@ -114,5 +150,5 @@ void __init setup_early_printk(void)
>
>  	early_console = &early_console_prom;
>  	register_console(early_console);
> -	pr_info("early_console initialized at 0x%08lx\n", base_addr);
> +	pr_info("Early console on %s initialized at 0x%08lx\n", UART_NAME, base_addr);
>  }
>
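
The polling logic in the 8250 branch of the patch above can be exercised on a host machine by replacing the memory-mapped register accesses with a fake device. This is a sketch under stated assumptions: struct fake_uart and the fake_* functions are hypothetical stand-ins for __builtin_ldwio/__builtin_stwio and are not part of the patch. Only the LSR bit values and the loop structure are taken from it.

```c
#include <stddef.h>
#include <stdint.h>

#define UART_LSR_TEMT	0x40 /* Transmitter empty */
#define UART_LSR_THRE	0x20 /* Transmit-hold-register empty */
#define BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE)

/* Hypothetical fake device standing in for the memory-mapped UART. */
struct fake_uart {
	int busy_polls;		/* status reads left before the transmitter is idle */
	char out[64];		/* bytes "transmitted" so far */
	size_t len;
};

/* Simulated line status read: busy for a few polls after each byte. */
static uint32_t fake_get_sr(struct fake_uart *u)
{
	if (u->busy_polls > 0) {
		u->busy_polls--;
		return UART_LSR_THRE;	/* hold register empty, shifter still busy */
	}
	return BOTH_EMPTY;
}

/* Same shape as early_console_putc(): spin until both bits are set. */
static void fake_putc(struct fake_uart *u, char c)
{
	while ((fake_get_sr(u) & BOTH_EMPTY) != BOTH_EMPTY)
		;

	if (u->len < sizeof(u->out))
		u->out[u->len++] = c;
	u->busy_polls = 2;	/* pretend the byte takes two polls to drain */
}

/* Same shape as early_console_write(): '\r' is sent after each '\n'. */
static void fake_write(struct fake_uart *u, const char *s, unsigned n)
{
	while (n-- && *s) {
		fake_putc(u, *s);
		if (*s == '\n')
			fake_putc(u, '\r');
		s++;
	}
}
```

With a zero-initialized struct fake_uart, fake_write(&u, "ok\n", 3) leaves the four bytes 'o', 'k', '\n', '\r' in u.out, which matches the output order the patch produces on real hardware.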


View attachment "test_defconfig" of type "text/plain" (10488 bytes)
