Message-ID: <aAmeJmL0hUx2kcXC@xsang-OptiPlex-9020>
Date: Thu, 24 Apr 2025 10:12:54 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Arnd Bergmann <arnd@...db.de>
CC: <oe-lkp@...ts.linux.dev>, kernel test robot <lkp@...el.com>,
	<linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...nel.org>, "Linus
 Torvalds" <torvalds@...ux-foundation.org>, <oliver.sang@...el.com>
Subject: Re: [linus:master] [x86/cpu]  f388f60ca9:
 BUG:soft_lockup-CPU##stuck_for#s![swapper:#]

hi, Arnd,

On Tue, Apr 22, 2025 at 12:16:33PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 21, 2025, at 10:12, kernel test robot wrote:
> > Hello,
> >
> > by this commit, we notice big config diff [1]
> >
> > then in this rcutorture tests, parent runs quite clean, f388f60ca9 shows
> > various random issues.
> 
> Thanks for the report!
> 
> From my initial reading, my patch most likely caught a preexisting bug,
> but my patch itself is correct. It's worth investigating regardless,
> at the minimum we should perhaps prevent an invalid configuration from
> building or from booting.
> 
> > config: i386-randconfig-r071-20250410
> 
> Generally, I would not expect 'randconfig' kernels to pass all tests,
> and what happened here is that removing the CONFIG_MK8 option made it
> pick some completely different CPU
> 
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> The most relevant options here are
> 
>  -# CONFIG_M486SX is not set
>  +CONFIG_M486SX=y
>   # CONFIG_SMP is not set
>   CONFIG_X86_GENERIC=y
> 
> In theory, setting X86_GENERIC should make a kernel built for an
> older CPU work on any newer one. In practice, I'm not surprised
> that this fails: While AMD K8 is ten years older than Intel Sandy
> Bridge, they are architecturally still very similar. The i486SX
> is another decade older, but its design is as far removed from
> both K8 and Sandy Bridge as it gets.
> 
> It would be nice to not have to support 486sx any more.
> We have discussed removing support for older CPUs without
> TSC, FPU and CX8 in the past, but so far always kept them
> around.
> 
> > [ 721.016779][ C0] hardirqs last disabled at (159506): 
> > sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049) 
> > [ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs 
> > (kernel/softirq.c:408 kernel/softirq.c:589) 
> > [ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq 
> > (kernel/softirq.c:596) 
> > [  721.016779][    C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 
> > 6.14.0-rc3-00037-gf388f60ca904 #1
> > [  721.016779][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 721.016779][ C0] EIP: timekeeping_notify 
> > (kernel/time/timekeeping.c:1522) 
> 
> Timekeeping code could be related, I see that CONFIG_X86_TSC
> is disabled for i486SX configurations, so even if a TSC is present
> in the emulated machine, it is not being used to measure time
> accurately.
> 
> > -CONFIG_X86_CMPXCHG64=y
> 
> This could be another issue, if there is code that relies on
> the cx8/cmpxchg8b feature to be used. Since this is a non-SMP
> kernel, this is less likely to be the cause of the problem.

thanks a lot for all these details!

> 
> Can you try what happens when you enable the two options, either
> by changing CONFIG_M486SX to CONFIG_M586TSC, or with a patch
> like the one below? Note that CONFIG_X86_CMPXCHG64 recently
> got renamed to CONFIG_X86_CX8, but they are the exact same thing.

I applied your patch directly on top of f388f60ca9 (changing X86_CMPXCHG64
instead of X86_CX8, as you mentioned). The resulting commit id is
c1f7ef63239411313163a7b1bff654236f48351c

after building, the config has the following diff against f388f60ca9

--- f388f60ca9041a95c9b3f157d316ed7c8f297e44/.config    2025-04-15 15:41:17.009901645 +0800
+++ c1f7ef63239411313163a7b1bff654236f48351c/.config    2025-04-23 09:36:43.718421931 +0800
@@ -351,7 +351,9 @@ CONFIG_X86_F00F_BUG=y
 CONFIG_X86_INVD_BUG=y
 CONFIG_X86_ALIGNMENT_16=y
 CONFIG_X86_INTEL_USERCOPY=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=4
+CONFIG_X86_TSC=y
+CONFIG_X86_CMPXCHG64=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=5
 CONFIG_IA32_FEAT_CTL=y
 CONFIG_X86_VMX_FEATURE_NAMES=y
 CONFIG_CPU_SUP_INTEL=y
@@ -5745,6 +5747,7 @@ CONFIG_GENERIC_NET_UTILS=y
 # CONFIG_PRIME_NUMBERS is not set
 CONFIG_RATIONAL=y
 CONFIG_GENERIC_IOMAP=y
+CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
 CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
 CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
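
For anyone who wants to reproduce this kind of comparison without a full
text diff, below is a minimal sketch of listing only the options that differ
between two .config files. It is an illustration, not the tooling we use;
the helper names (parse_config, diff_configs) are hypothetical, and the two
inline snippets are just excerpts taken from the diff above.

```python
def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(old, new):
    """Return {option: (old_value, new_value)} for options that differ.

    A value of None means the option does not appear in that config at all.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Excerpts from the two configs compared above (not the full files).
old = parse_config("""\
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_IA32_FEAT_CTL=y
""")
new = parse_config("""\
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_IA32_FEAT_CTL=y
""")

changes = diff_configs(old, new)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```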

running the same tests, the results are now back to the clean status of
fc2d5cbe541032e7 (parent of f388f60ca9)

(the statistics for fc2d5cbe541032e7 and f388f60ca9 differ somewhat from the
data we shared last time, because an auto-cleanup step in our service removed
some results that were suspected to be caused by problems in our environment)


fc2d5cbe541032e7 f388f60ca9041a95c9b3f157d31 c1f7ef63239411313163a7b1bff
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :496         29%         145:494          0%            :500   last_state.booting
           :496          7%          35:494          0%            :500   dmesg.BUG:kernel_hang_in_boot_stage
           :496          9%          45:494          0%            :500   dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
           :496          0%           1:494          0%            :500   dmesg.EIP:__timer_delete_sync
           :496          1%           5:494          0%            :500   dmesg.EIP:_raw_spin_unlock_irq
           :496          0%           2:494          0%            :500   dmesg.EIP:_raw_spin_unlock_irqrestore
           :496          0%           1:494          0%            :500   dmesg.EIP:console_emit_next_record
           :496          0%           1:494          0%            :500   dmesg.EIP:handle_softirqs
           :496          1%           3:494          0%            :500   dmesg.EIP:lock_acquire
           :496          0%           2:494          0%            :500   dmesg.EIP:lock_release
           :496          0%           1:494          0%            :500   dmesg.EIP:queue_delayed_work_on
           :496          9%          45:494          0%            :500   dmesg.EIP:timekeeping_notify
           :496          3%          14:494          0%            :500   dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
           :496          6%          32:494          0%            :500   dmesg.INFO:task_blocked_for_more_than#seconds
           :496          9%          45:494          0%            :500   dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
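
As a reading aid for the table: in these rows the %reproduction figures for
f388f60ca9 are consistent with the fail count divided by the run count,
rounded to a whole percent, with an empty fail count (":496", ":500") read
as zero failures. That interpretation is an assumption based on the numbers
shown here, not a description of how the lkp tooling computes the column.
A small sketch (the function name is hypothetical):

```python
def reproduction_percent(fails, runs):
    """Failure rate as a whole-number percentage (assumed interpretation)."""
    return round(100 * fails / runs)


# (fails, runs) for f388f60ca9, copied from a few rows of the table above.
rows = {
    "last_state.booting": (145, 494),
    "dmesg.BUG:kernel_hang_in_boot_stage": (35, 494),
    "dmesg.EIP:timekeeping_notify": (45, 494),
    "dmesg.EIP:lock_acquire": (3, 494),
    "dmesg.EIP:lock_release": (2, 494),
}

percents = {name: reproduction_percent(f, r) for name, (f, r) in rows.items()}
for name, pct in percents.items():
    print(f"{pct:3d}%  {name}")
```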

> 
> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index f928cf6e3252..ac6cc69060f1 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -317,7 +317,6 @@ config X86_USE_PPRO_CHECKSUM
>  
>  config X86_TSC
>         def_bool y
> -       depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MATOM) || X86_64
>  
>  config X86_HAVE_PAE
>         def_bool y
> @@ -325,7 +324,6 @@ config X86_HAVE_PAE
>  
>  config X86_CX8
>         def_bool y
> -       depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
>  
>  # this should be set for all -march=.. options where the compiler
>  # generates cmov.
> 
>       Arnd
