Message-ID: <20140804135452.GJ15082@console-pimps.org>
Date:	Mon, 4 Aug 2014 14:54:52 +0100
From:	Matt Fleming <matt@...sole-pimps.org>
To:	Bruno Prémont <bonbons@...ux-vserver.org>
Cc:	P J P <ppandit@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-efi@...r.kernel.org
Subject: Re: 3.12 to 3.13 boot regression bisected - still applies to 3.16

On Mon, 04 Aug, at 03:06:27PM, Bruno Prémont wrote:
> 
> Yes, I did as I have seen that patch flying by, but it did not help
> (I tried at 3.16-rc7).
 
:-( Thanks for testing.

> On 3.16-rc7 I even tried adding earlyprintk=efi,keep, console=efi,
> ignore_loglevel and added some efi_printk() in EFI stub (in the spirit
> of https://bugzilla.kernel.org/show_bug.cgi?id=68761)
> The last message I get is my efi_printk() right before exiting boot
> services. Without my efi_printk() there is no output at all.
> 
> Then system reboots.

OK, so the fact that the system reboots suggests that the boot
stub/kernel caused a fault.
 
> There is no output on serial console either (via BMC),
> (earlycon=uart,io,0x3f8,115200 or earlyprintk=serial,ttyS0,115200)
> 
> 
> I even tried without initrd (setting CONFIG_INITRAMFS_SOURCE="")
> and got the same end-result.

Oh that's interesting.

> I could share a slightly modified one, replacing the
> contained /etc/passwd. It's about 16MiB in size due to RAID controller
> management blobs for recovery. Except for that it just tries to find
> ROOT partition, setting up dmcrypt if needed.
 
This shouldn't be necessary if you can reproduce the issue without an
initrd as you stated above.

> Any hint on how to find out what fails would be nice!
> initrd issues tend not to be easy to debug (it would help if initrd
> issues could be reported at the time kernel tries to start init - e.g.
> when console outputs are up and running).

I don't think this is necessarily an initrd issue.

The way that I would debug this is to insert while (1); into strategic
places. Yes, it's lame and time-consuming, but it's effective.

My first suggestion would be to start at setup_arch(). In particular,
because your machine is resetting rather than printing an oops, I'd guess
that the kernel's early trap handlers haven't yet been installed when the
fault happens.

So throw a,

	while (1);

in there and see if you can get your machine to hang instead of reset.
If it doesn't hang, the reset occurs earlier in boot - work backwards.
If it does hang then you know that execution gets at least that far -
work forwards. Like I said, lame but effective.
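
Working backwards and forwards like this amounts to a search over an
ordered list of places where the loop could go. As a rough sketch only
(not kernel code), the search can be modelled in userspace C as a
bisection. Everything named here is hypothetical: first_unreached(),
try_boot_fn and simulated_boot() stand in for "rebuild with while (1);
at checkpoint i, boot, and watch whether the machine hangs or resets".
The sketch assumes the fault is deterministic and that a loop placed
before the fault always hangs.

```c
#include <assert.h>
#include <stddef.h>

/* Observed outcome of one test boot with while (1); at some checkpoint. */
enum boot_result { BOOT_HANGS, BOOT_RESETS };

/*
 * Hypothetical callback: boot with the spin loop at checkpoint idx
 * (checkpoints are numbered in execution order) and report the outcome.
 */
typedef enum boot_result (*try_boot_fn)(size_t idx, void *ctx);

/*
 * Return the index of the first checkpoint that execution never reaches,
 * or n if every checkpoint is reached. The fault then lies between
 * checkpoint (result - 1) and checkpoint result.
 */
size_t first_unreached(size_t n, try_boot_fn try_boot, void *ctx)
{
	size_t lo = 0, hi = n;

	assert(try_boot != NULL);

	/* Invariant: checkpoints below lo are reached, those from hi on are not. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (try_boot(mid, ctx) == BOOT_HANGS)
			lo = mid + 1;	/* got this far: work forwards */
		else
			hi = mid;	/* reset came first: work backwards */
	}
	return lo;
}

/* Simulated machine for illustration: it faults just before checkpoint *ctx. */
enum boot_result simulated_boot(size_t idx, void *ctx)
{
	size_t fault_before = *(const size_t *)ctx;

	return idx < fault_before ? BOOT_HANGS : BOOT_RESETS;
}
```

With n candidate places this needs about log2(n) test boots rather than
n. On real hardware each try_boot() call is a manual rebuild and reboot,
so the function is only a way of deciding where to put the loop next.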

Meanwhile I'm going to go and stare at the EFI boot stub code and
instrument OVMF to check for more memory corruption bugs like the one
Michael found in commit c7fb93ec51d4 ("x86/efi: Include a .bss section
within the PE/COFF headers").

-- 
Matt Fleming, Intel Open Source Technology Center