lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b3d9fbbf-e760-5d1d-9182-44c144abd1bf@amd.com>
Date:   Fri, 3 Feb 2023 13:48:39 -0600
From:   Kim Phillips <kim.phillips@....com>
To:     Usama Arif <usama.arif@...edance.com>, dwmw2@...radead.org,
        tglx@...utronix.de, arjan@...ux.intel.com
Cc:     mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, x86@...nel.org, pbonzini@...hat.com,
        paulmck@...nel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, rcu@...r.kernel.org, mimoja@...oja.de,
        hewenliang4@...wei.com, thomas.lendacky@....com, seanjc@...gle.com,
        pmenzel@...gen.mpg.de, fam.zheng@...edance.com,
        punit.agrawal@...edance.com, simon.evans@...edance.com,
        liangma@...ngbit.com, David Woodhouse <dwmw@...zon.co.uk>,
        Mario Limonciello <Mario.Limonciello@....com>
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

  - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
  - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

  - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
  - Zen 2,3,4-based servers boot fine
  - a Zen1-based server doesn't boot.

This is what's left on its serial port:

[    3.199633] smp: Bringing up secondary CPUs ...
[    3.200732] x86: Booting SMP configuration:
[    3.204242] .... node  #0, CPUs:          #1
[    3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[    3.204478] ------------[ cut here ]------------
[    3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.204490] Modules linked in:
[    3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[    3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.204512] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.204514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.204517] Call Trace:
[    3.204519] ---[ end trace 0000000000000000 ]---
[    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[    3.288686]    #2
[    3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[    3.288798] ------------[ cut here ]------------
[    3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.288815] Modules linked in:
[    3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.288850] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.288852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.288857] Call Trace:
[    3.288859] ---[ end trace 0000000000000000 ]---
[    3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[   [    3. 68 33]3 [ #3[  [ #
                               [    3.368623[    3
  [    3.368623]    #3
[    3.368662] ------------[ cut here ]------------
[    3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[    3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[    3.368708] 7 locks held by swapper/0/1:
[    3.368710]  #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[    3.368721]  #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[    3.368728]  #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[    3.368735]  #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[    3.368745]  #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[    3.368753]  #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[    3.368760]  #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[    3.368769] Modules linked in:
[    3.368770] Preemption disabled at:
[    3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[    3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.368782] Call Trace:
[    3.368783]  <TASK>
[    3.368789]  dump_stack_lvl+0x49/0x63
[    3.368795]  ? do_cpu_up+0x3e4/0x780
[    3.368799]  dump_stack+0x10/0x16
[    3.368802]  __schedule_bug+0xad/0xd0
[    3.368808]  __schedule+0x76/0x8a0
[    3.368812]  ? sched_clock+0x9/0x10
[    3.368817]  ? sched_clock_local+0x17/0x90
[    3.368826]  ? sort_range+0x30/0x30
[    3.368830]  schedule+0x88/0xd0
[    3.368833]  schedule_timeout+0x40/0x320
[    3.368840]  ? __this_cpu_preempt_check+0x13/0x20
[    3.368844]  ? lock_release+0x353/0x3c0
[    3.368852]  ? sort_range+0x30/0x30
[    3.368856]  wait_for_completion_killable+0xe0/0x1c0
[    3.368864]  __kthread_create_on_node+0xfe/0x1e0
[    3.368876]  ? wait_for_completion_killable+0x38/0x1c0
[    3.368884]  kthread_create_on_node+0x46/0x70
[    3.368894]  kthread_create_on_cpu+0x2c/0x90
[    3.368899]  __smpboot_create_thread+0x87/0x140
[    3.368905]  smpboot_create_threads+0x3f/0x80
[    3.368909]  ? idle_thread_get+0x40/0x40
[    3.368913]  cpuhp_invoke_callback+0x13c/0x5d0
[    3.368921]  __cpuhp_invoke_callback_range+0x69/0xf0
[    3.368929]  _cpu_up+0x12a/0x2f0
[    3.368937]  cpu_up+0x8f/0xd0
[    3.368942]  bringup_nonboot_cpus+0x7c/0x160
[    3.368950]  smp_init+0x2a/0x83
[    3.368957]  kernel_init_freeable+0x1a1/0x309
[    3.368961]  ? lock_release+0x353/0x3c0
[    3.368972]  ? rest_init+0x140/0x140
[    3.368977]  kernel_init+0x1a/0x130
[    3.368980]  ret_from_fork+0x22/0x30
[    3.368996]  </TASK>
[    3.369419]
[    3.369420] .... node  #1, CPUs:     #4
[    3.369466] ------------[ cut here ]------------
[    3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[    3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.369487] Modules linked in:
[    3.369491] ------------[ cut here ]------------
[    3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[    3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018


...which points to the WARN_ON here:

static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
         /*
          * wait for ACK from master CPU before continuing
          * with AP initialization
          */
         WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
         while (!cpumask_test_cpu(cpu, cpu_callout_mask))
                 cpu_relax();
#endif
}

Let me know if you'd like me to test any changes.

Thanks,

Kim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ