linux-kernel - Re: early microcode on amd is broken when no initramfs provided

Message-ID: <CAPVoSvQuzUQUgRgk6fhsxEhjWS8oPreo00u_urntaCzmERpOtg@mail.gmail.com>
Date:	Sat, 20 Jul 2013 21:01:33 +0200
From:	Torsten Kaiser <just.for.lkml@...glemail.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Johannes Hirte <johannes.hirte@....tu-ilmenau.de>,
	Jacob Shin <jacob.shin@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Jacob Shin <jacob.w.shin@...il.com>
Subject: Re: early microcode on amd is broken when no initramfs provided

On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov <bp@...en8.de> wrote:
> On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
>> config is attached
>
> Ok, I can reproduce the hang with your config but even with:
>
> $ grep MICROCODE .config
> # CONFIG_MICROCODE is not set
> # CONFIG_MICROCODE_INTEL_EARLY is not set
> # CONFIG_MICROCODE_AMD_EARLY is not set
>
> which means, it cannot be microcode-related.
>
> And I'd bet if you wait a minute (yep, it should be exactly 60 seconds)
> the boot would probably continue. And if so, this is that 60 sec delay
> where the kernel tries to find firmware.
>
> Hmm...

I have the same problem: Booting 3.11-rc1 hangs after the line:
ACPI: Executed 3 blocks of module-level executable AML code

I bisected it down to the early microcode changes:
757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
fixup) completely fail to boot (no output beyond "Booting kernel").
Starting with 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
find_ucode_in_initrd() __init") I'm seeing this hang instead.

Just turning CONFIG_MICROCODE_EARLY off solves the problem: The system
now successfully boots 3.11-rc1.

Trying to debug this, I found that the following hack also solves the
boot problem: removing these two lines from collect_cpu_info_amd_early()
in arch/x86/kernel/microcode_amd_early.c:
        c->microcode = rev;
        c->x86 = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
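
For reference, the family computation in the second of those lines can
be checked in isolation. This is only a minimal user-space sketch: the
helper name cpuid_family() is hypothetical, and the input values in the
note below are illustrative CPUID signatures, not values read from my
system.

```c
#include <stdint.h>

/*
 * Hypothetical helper that mirrors the expression quoted above:
 * base family (bits 11:8 of CPUID.1:EAX) plus extended family
 * (bits 27:20). It is not a kernel function.
 */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int base = (eax >> 8) & 0xf;
	unsigned int ext = (eax >> 20) & 0xff;

	return base + ext;
}
```

With a base family of 0xf and an extended family of 0x1, for example
cpuid_family(0x00100F43), this yields 16, the same value that shows up
in the trace further down.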

But I can't make sense out of that. And if I try to trace who updates
->x86 it gets even more confusing.
Normally only cpu_detect() seems to update cpuinfo_x86.x86 but now it
seems to fight with collect_cpu_info_amd_early().
On my system this happens:
(Output is always the address of the struct cpuinfo_x86 -> the value
that gets written into it)

Very early boot:
cpu_detect ffffffff81c8ba40 -> 16

BSP == CPU0 calls load_ucode_ap() via cpu_init():
collect_cpu_info_amd_early ffff880337c10fc0 -> 16
(That is the place I patched out to get the system to boot)

BSP == CPU0 via identify_boot_cpu():
cpu_detect ffffffff81c8ba40 -> 16

BSP == CPU0 stores boot_cpu_data in its per-cpu structure via
smp_store_boot_cpu_info():
smpboot: BSP: store ffffffff81c8ba40 in ffff880337c10fc0

smpboot starts activating the secondary CPUs: Each would in
start_secondary() first call load_ucode_ap() via cpu_init() and then
identify_secondary_cpu() via smp_callin():
collect_cpu_info_amd_early ffff880337c50fc0
smpboot: identify_sec_cpu:1/ffff880337c50fc0
cpu_detect ffff880337c50fc0 -> 16

collect_cpu_info_amd_early ffff880337c90fc0
smpboot: identify_sec_cpu:2/ffff880337c90fc0
cpu_detect ffff880337c90fc0 -> 16

collect_cpu_info_amd_early ffff880337cd0fc0
smpboot: identify_sec_cpu:3/ffff880337cd0fc0
cpu_detect ffff880337cd0fc0 -> 16

collect_cpu_info_amd_early ffff880337d10fc0
smpboot: identify_sec_cpu:4/ffff880337d10fc0
cpu_detect ffff880337d10fc0 -> 16

collect_cpu_info_amd_early ffff880337d50fc0
smpboot: identify_sec_cpu:5/ffff880337d50fc0
cpu_detect ffff880337d50fc0 -> 16


It seems the code for updating 'struct cpuinfo_x86 *c' in
collect_cpu_info_amd_early() is both useless and wrong. Useless,
because the values will be overwritten first by smp_store_cpu_info()
and then again by identify_secondary_cpu(c). Wrong, because at that
point the per-cpu structure should not be used yet, as
smp_store_cpu_info() has not run yet.
But something else seems to be using the per-cpu structure of the BSP
between its cpu_init() and smp_store_boot_cpu_info().

And it is cpu_has_amd_erratum(): it uses cpuinfo_x86.x86 to decide
whether it needs to fall back to boot_cpu_data. Because
collect_cpu_info_amd_early() has filled in that field but not
.x86_vendor (which is still 0 == X86_VENDOR_INTEL), the errata are not
applied to the BSP and then something in ACPI gets stuck.
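
To illustrate the failure mode described above, here is a simplified
user-space model. It is a sketch under assumptions, not the kernel code:
struct cpu_model, erratum_check_reaches_amd_path() and the VENDOR_AMD
value are hypothetical stand-ins, and the only behaviour modelled is the
fallback on a zero family followed by a vendor test.

```c
#include <stdbool.h>

#define VENDOR_INTEL 0 /* 0 == X86_VENDOR_INTEL, as noted above */
#define VENDOR_AMD   2 /* stand-in non-zero value for this sketch */

/* Hypothetical stand-in for the two cpuinfo_x86 fields involved. */
struct cpu_model {
	unsigned int family; /* models cpuinfo_x86.x86 */
	unsigned int vendor; /* models cpuinfo_x86.x86_vendor */
};

/*
 * Models the decision described above: an unset family (0) means the
 * per-cpu data is not initialized yet, so the boot CPU data is used
 * instead. Only then is the vendor tested.
 */
static bool erratum_check_reaches_amd_path(const struct cpu_model *percpu,
					   const struct cpu_model *boot)
{
	const struct cpu_model *cpu = percpu;

	if (cpu->family == 0)
		cpu = boot;

	return cpu->vendor == VENDOR_AMD;
}
```

With a fully zeroed per-cpu entry the check falls back to the boot data
and reaches the AMD path. With family already set to 16 but vendor
still 0, the fallback is skipped and the check bails out, which matches
the behaviour I am seeing on the BSP.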

Does this diagnosis make sense / should I send a patch?

Torsten