linux-kernel - Re: [PATCH v3 0/9] Parallel CPU bringup for x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b798bcef-d750-ce42-986c-0d11d0bb47b0@amd.com>
Date:   Fri, 17 Dec 2021 13:46:22 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     David Woodhouse <dwmw2@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "mimoja@...oja.de" <mimoja@...oja.de>,
        "hewenliang4@...wei.com" <hewenliang4@...wei.com>,
        "hushiyuan@...wei.com" <hushiyuan@...wei.com>,
        "luolongjun@...wei.com" <luolongjun@...wei.com>,
        "hejingxian@...wei.com" <hejingxian@...wei.com>
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

On 12/17/21 1:11 PM, David Woodhouse wrote:
> On Fri, 2021-12-17 at 11:48 -0600, Tom Lendacky wrote:
>> On 12/16/21 6:13 PM, David Woodhouse wrote:
>>> On Thu, 2021-12-16 at 16:52 -0600, Tom Lendacky wrote:
>>>> On baremetal, I haven't seen an issue. This only seems to have a problem
>>>> with Qemu/KVM.
>>>>
>>>> With 191f08997577 I could boot without issues with and without the
>>>> no_parallel_bringup. Only after I applied e78fa57dd642 did the failure happen.
>>>>
>>>> With e78fa57dd642 I could boot 64 vCPUs pretty consistently, but when I
>>>> jumped to 128 vCPUs it failed again. When I moved the series to
>>>> df9726cb7178, then 64 vCPUs also failed pretty consistently.
>>>>
>>>> Strange thing is it is random. Sometimes (rarely) it works on the first
>>>> boot and then sometimes it doesn't, at which point it will reset and
>>>> reboot 3 or 4 times and then make it past the failure and fully boot.
>>>
>>> Hm, some of that is just artifacts of timing, I'm sure. But now I'm
>>> staring at the way that early_setup_idt() can run in parallel on all
>>> CPUs, rewriting bringup_idt_descr and loading it.
>>>
>>> To start with, let's try unlocking the trampoline_lock much later,
>>> after cpu_init_exception_handling() has loaded the real IDT.
>>>
>>> I think we can probably make secondaries load the real IDT early and
>>> never use bringup_idt_descr at all, can't we? But let's see if this
>>> makes it go away, to start with...
>>>
>>
>> This still fails. I ran with -d cpu_reset on the command line and will
>> forward the full log to you. I ran "grep "[ER]IP=" stderr.log | uniq -c"
>> and got:
>>
>>       128 EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0
>>       128 EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> These are before running any of the vCPUs.
>>
>>         1 RIP=ffffffff810705c6 RFL=00000206 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> This is where vCPU0 is at the time of the reset. This address tends to
>> be different all the time and so I think it is just where it happens to
>> be when the reset occurs and isn't contributing to the reset.
> 
> I note that one is in native_write_msr() though. I wonder what it's writing?
> 
> Do you have console output (perhaps with earlyprintk=ttyS0) to go with this?

Yes, but it's not really much help...

[    0.146318] Freeing SMP alternatives memory: 36K
[    0.249121] smpboot: CPU0: AMD EPYC Processor (family: 0x17, model: 0x1, stepping: 0x2)
[    0.249291] Performance Events: AMD PMU driver.
[    0.249771] ... version:                0
[    0.250170] ... bit width:              48
[    0.250258] ... generic registers:      4
[    0.250662] ... value mask:             0000ffffffffffff
[    0.251258] ... max period:             00007fffffffffff
[    0.251790] ... fixed-purpose events:   0
[    0.252258] ... event mask:             000000000000000f
[    0.252972] rcu: Hierarchical SRCU implementation.
[    0.255797] smp: Bringing up secondary CPUs ...
[    0.256372] x86: Booting SMP configuration:
SecCoreStartupWithStack(0xFFFCC000, 0x820000)
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000820000, size is 0x000E0000, handle is 0x820000

There's no WARN or PANIC, just a reset. I can look to try and capture some
KVM trace data if that would help. If so, let me know what events you'd
like captured.

> 

>> CPU Reset (CPU 23)
>> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> 
> This one we suspect. Is this what a triple-fault would look like? Not
> if it's *already* at f000:fff0, surely?

Good question. The APM doesn't really document it. I'll see if I can find
some h/w folks to check with.

Thanks,
Tom

> 
> CPU Reset (CPU 23)
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00800f12
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00009a00
> SS =0000 00000000 0000ffff 00009200
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008300
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> CCS=00000000 CCD=00000000 CCO=DYNAMIC
> EFER=0000000000000000
> FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
> FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
> FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
> FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
> FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
> XMM00=0000000000000000 0000000000000000 XMM01=0000000000000000 0000000000000000
> XMM02=0000000000000000 0000000000000000 XMM03=0000000000000000 0000000000000000
> XMM04=0000000000000000 0000000000000000 XMM05=0000000000000000 0000000000000000
> XMM06=0000000000000000 0000000000000000 XMM07=0000000000000000 0000000000000000
>