lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 23 Mar 2016 18:40:51 +0100
From:	Matthias Schiffer <mschiffer@...verse-factory.net>
To:	Peter Hurley <peter@...leysoftware.com>
Cc:	Ralf Baechle <ralf@...ux-mips.org>, gregkh@...uxfoundation.org,
	jslaby@...e.com, linux-mips@...ux-mips.org,
	linux-serial@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Nonterministic hang during bootconsole/console handover on ath79

On 03/22/2016 04:38 PM, Peter Hurley wrote:
> On 03/22/2016 06:07 AM, Matthias Schiffer wrote:
>> I've tried your patch and I can't reproduce the issue anymore with it; I
>> have no idea if this actually has to do something with the issue, or the
>> change of the code path just hid the bug again.
>>
>> Regarding your other mail: with "small change", I was not talking about
>> adding an additional printk; as mentioned, even changing the numbers in
>> UTS_VERSION can hide the issue. I diffed a working and a broken kernel
>> image, and the UTS_VERSION is really the only difference. I have no idea
>> how to explain this.
> 
> If _any_ change may hide the problem, that will make it impossible
> to determine if any attempted fix actually works, regardless of what
> debugging method you use.
> 
> FWIW, you could still use the boot console to debug the problem by
> disabling the regular command-line console.
> 
> Regards,
> Peter Hurley

Hi,
it seems Peter was on the right track. With some help from Ralf, I was able
to narrow down the issue a bit, and I'm fairly sure the hang happens
somewhere in autoconfig().

autoconfig_16550a() is doing all kinds of weird checks to detect different
hardware by writing a lot of register values which are documented as
reserved in the AR7242 datasheet (there's a leaked version going around
that can be easily googled...), no idea if any of those are problematic.
Just setting UPF_FIXED_TYPE as suggested by Peter would avoid that code
altogether.

That being said, I found another minimal change that seems to fix the
issue: prom_putchar_ar71xx() in arch/mips/ath79/early_printk.c only waits
for UART_LSR_THRE, while serial_putc() in
drivers/tty/serial/8250/8250_early.c waits for (UART_LSR_TEMT |
UART_LSR_THRE). Adjusting arch/mips/ath79/early_printk.c in the same way
makes the hangs go away. Maybe the AR7242 doesn't like its serial config
registers being poked while there's still something in the FIFO? Waiting
for UART_LSR_TEMT seems like a good idea anyways to ensure that all
characters have been printed before autoconfig() starts taking things
apart. (Why do these two versions of essentially the same code exist anyways?)

Regards,
Matthias



Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ