Date:   Thu, 16 Dec 2021 16:52:36 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     David Woodhouse <dwmw2@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H . Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "mimoja@...oja.de" <mimoja@...oja.de>,
        "hewenliang4@...wei.com" <hewenliang4@...wei.com>,
        "hushiyuan@...wei.com" <hushiyuan@...wei.com>,
        "luolongjun@...wei.com" <luolongjun@...wei.com>,
        "hejingxian@...wei.com" <hejingxian@...wei.com>
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

On 12/16/21 1:24 PM, David Woodhouse wrote:
> On Thu, 2021-12-16 at 10:27 -0600, Tom Lendacky wrote:
>> On 12/15/21 8:56 AM, David Woodhouse wrote:
>>> Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
>>> them shaves about 80% off the AP bringup time on a 96-thread socket
>>> Skylake box (EC2 c5.metal) — from about 500ms to 100ms.
>>>
>>> There are more wins to be had with further parallelisation, but this is
>>> the simple part.
>>
>> I applied this series and began booting a regular non-SEV guest and hit a
>> failure at 39 vCPUs. No panic or warning, just a reset and OVMF was
>> executing again. I'll try to debug what's going on, but not sure how quickly
>> I'll arrive at anything.
> 
> Thanks for testing. This is working for me with BIOS and EFI boots in
> qemu and real hardware but it's mostly been Intel so far. I'll try
> harder on an AMD box.
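
The quoted cover-letter text credits the speedup to sending INIT/SIPI/SIPI to every AP first and only then waiting for them. A toy cost model shows why overlapping the waits helps. It uses made-up numbers, not measurements, and the function names are hypothetical. Serially, each AP's start-up latency is paid once per AP. In the idealised parallel case it is paid roughly once. Real bringup still serialises some work, so this model overstates the gain.

```python
# Toy cost model (hypothetical numbers, not measurements) comparing
# serial AP bringup with "kick every AP, then wait" bringup.


def serial_bringup_ms(n_aps: int, kick_ms: float, ap_init_ms: float) -> float:
    # Serial: send INIT/SIPI/SIPI to one AP, wait for it to come up,
    # then move on to the next, so every AP's init latency is paid in turn.
    return n_aps * (kick_ms + ap_init_ms)


def parallel_bringup_ms(n_aps: int, kick_ms: float, ap_init_ms: float) -> float:
    # Parallel: send INIT/SIPI/SIPI to every AP first, then wait.
    # The APs initialise concurrently, so in this idealised model the
    # init latency is paid about once instead of once per AP.
    return n_aps * kick_ms + ap_init_ms


if __name__ == "__main__":
    # 95 APs = 96 threads minus the boot CPU; the timings are invented.
    n, kick, init = 95, 0.5, 4.5
    s = serial_bringup_ms(n, kick, init)
    p = parallel_bringup_ms(n, kick, init)
    print(f"serial={s:.1f}ms parallel={p:.1f}ms saving={100 * (1 - p / s):.0f}%")
```

With a single AP the two models give the same cost. The benefit grows with the number of APs, because only the per-AP kick cost remains serialised.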

On bare metal, I haven't seen an issue. The problem only seems to occur
with QEMU/KVM.

With 191f08997577 I could boot without issues, both with and without the
no_parallel_bringup option. The failure only appeared after I applied
e78fa57dd642.

With e78fa57dd642 I could boot 64 vCPUs fairly consistently, but when I
jumped to 128 vCPUs it failed again. When I moved the series to
df9726cb7178, 64 vCPUs also failed fairly consistently.

The strange thing is that it's random. Occasionally it works on the first
boot. More often it doesn't, in which case it resets and reboots 3 or 4
times before making it past the failure and booting fully.

> 
> Anything else special about your setup, kernel config or qemu
> invocation that might help me reproduce?

There shouldn't be anything special that I'm aware of:
  - EPYC 3rd Gen (Milan)
  - QEMU 6.1.0
  - OVMF edk2-stable202111

The QEMU command line is:

  qemu-system-x86_64 -enable-kvm -cpu EPYC,host-phys-bits=true -smp 128 -m 1G \
    -machine type=q35 \
    -drive if=pflash,format=raw,unit=0,file=/root/kernels/qemu-install/OVMF_CODE.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=./diskless.fd \
    -nographic \
    -kernel /root/kernels/linux-build-x86_64/arch/x86/boot/bzImage \
    -append "console=ttyS0,115200n8" \
    -monitor pty -monitor unix:monitor,server,nowait
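
The invocation above can be rebuilt programmatically, which makes it easier to vary the vCPU count across the values mentioned in this thread (39, 64 and 128). The following is a minimal sketch. The helper name `qemu_argv` is hypothetical, and the paths are copied from the report and will differ on other machines. Passing `no_parallel_bringup` through `-append` assumes it is a kernel command-line option, which is how the report's wording reads.

```python
from __future__ import annotations

import shlex

# Paths copied from the report; adjust for your own setup.
OVMF_CODE = "/root/kernels/qemu-install/OVMF_CODE.fd"
OVMF_VARS = "./diskless.fd"
BZIMAGE = "/root/kernels/linux-build-x86_64/arch/x86/boot/bzImage"


def qemu_argv(smp: int, extra_append: str = "") -> list[str]:
    """Hypothetical helper: argv for one boot attempt with `smp` vCPUs."""
    append = "console=ttyS0,115200n8"
    if extra_append:
        # e.g. "no_parallel_bringup" (assumed kernel option) to compare paths
        append += " " + extra_append
    return [
        "qemu-system-x86_64", "-enable-kvm",
        "-cpu", "EPYC,host-phys-bits=true",
        "-smp", str(smp), "-m", "1G",
        "-machine", "type=q35",
        "-drive", f"if=pflash,format=raw,unit=0,file={OVMF_CODE},readonly=on",
        "-drive", f"if=pflash,format=raw,unit=1,file={OVMF_VARS}",
        "-nographic",
        "-kernel", BZIMAGE,
        "-append", append,
        "-monitor", "pty", "-monitor", "unix:monitor,server,nowait",
    ]


if __name__ == "__main__":
    for n in (39, 64, 128):
        # Print a copy-pasteable shell command for each vCPU count.
        print(shlex.join(qemu_argv(n)))
```

Because the failure is intermittent, each configuration would likely need several boot attempts before any pass/fail conclusion can be drawn.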

I can send you the kernel config off-list if you're unable to reproduce
with yours.

> 
> If it can repro without KVM, 'qemu -d in_asm' can be extremely useful
> for this kind of thing btw.

I didn't repro the failure without KVM.
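
For reference, the `-d in_asm` suggestion only applies under TCG. That means dropping KVM acceleration and logging the guest code of each translated block, optionally to a file with `-D`. The sketch below shows one way to derive such a run from an existing KVM argv list. The helper name is hypothetical. It assumes the rest of the command line works unchanged under TCG, and it only handles the `-enable-kvm` spelling, not `-accel kvm` or `-machine accel=kvm`.

```python
from __future__ import annotations


def to_tcg_in_asm(argv: list[str], logfile: str = "qemu-in_asm.log") -> list[str]:
    """Hypothetical helper: KVM invocation -> TCG run tracing guest code."""
    # Drop KVM acceleration so QEMU falls back to TCG emulation,
    # where -d in_asm can log the guest code it translates.
    out = [a for a in argv if a != "-enable-kvm"]
    # Log the guest instructions of each translated block into `logfile`.
    return out + ["-d", "in_asm", "-D", logfile]
```

Since the failure was not reproduced without KVM, this trace would only help if the reset also shows up under TCG.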

Thanks,
Tom
