Message-ID: <CAF+s44SN6fe6PCdeN-e5LGbFOdadiDDWdAex6oRPT+e+uzfhSA@mail.gmail.com>
Date: Tue, 16 Dec 2025 20:12:53 +0800
From: Pingfan Liu <piliu@...hat.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, 
	Tejun Heo <tj@...nel.org>, Waiman Long <longman@...hat.com>, 
	Chen Ridong <chenridong@...weicloud.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Pierre Gondois <pierre.gondois@....com>, 
	Ingo Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: Re: [linus:master] [sched/deadline] 318e18ed22: BUG:soft_lockup-CPU##stuck_for#s![swapper:#]

On Tue, Dec 16, 2025 at 3:45 PM kernel test robot <oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![swapper:#]" on:
>
> commit: 318e18ed22e89397635e15095c014accaf47ed30 ("sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master      d358e5254674b70f34c847715ca509e46eb81e6f]
> [test failed on linux-next/master 5ce74bc1b7cb2732b22f9c93082545bc655d6547]
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> with following parameters:
>
>         runtime: 300s
>         group: group-03
>         nr_groups: 5
>
>
> config: i386-randconfig-r071-20250410

CONFIG_CPUSETS is not set in this config, so I don't think the lockup
is related to cpuset_cpus_allowed_locked().

The commit also introduces no extra lock in dl_add_task_root_domain(),
so it is odd that it would cause a soft lockup. I will look into it to
find the real root cause.
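
For anyone who wants to repeat the config check, here is a minimal
sketch of how it could be scripted. It is not part of the LKP tooling;
the helper name kconfig_state() and the SAMPLE text are made up for
illustration, and only the standard .config line formats ("CONFIG_X=y"
and "# CONFIG_X is not set") are assumed.

```python
import re


def kconfig_state(config_text, option):
    """Return the value assigned to `option` in a kernel .config text.

    Returns the string after '=' (for example 'y' or 'm'), or None when
    the option is marked "is not set" or does not appear at all.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for raw in config_text.splitlines():
        line = raw.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if line == unset_line:
            return None
    return None


# Hypothetical excerpt, NOT copied from the real randconfig in the report.
SAMPLE = """\
CONFIG_SMP=y
# CONFIG_CPUSETS is not set
CONFIG_HZ=250
"""

if __name__ == "__main__":
    for opt in ("CONFIG_SMP", "CONFIG_CPUSETS"):
        print(opt, "->", kconfig_state(SAMPLE, opt))
```

To use it on the real config, read the file from the download link in
the report and pass its contents in place of SAMPLE.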

Pingfan

> compiler: gcc-14
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we don't have enough knowledge to analyze the relation between the change
> and the issue, so we run tests up to 1000 times. the issue can be reproduced
> 65 times out of 1000 runs. while parent always keeps clean.
>
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/runtime/group/nr_groups:
>   vm-snb/trinity/openwrt-i386-generic-20190428.cgz/i386-randconfig-r071-20250410/gcc-14/300s/group-03/5
>
>
> 1f382215119a0bc1 318e18ed22e89397635e15095c0
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :1000         8%          82:1000  dmesg.BUG:kernel_hang_in_boot_stage
>            :1000         7%          69:1000  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]   <----
>            :1000         8%          82:1000  dmesg.BUG:workqueue_lockup-pool
>            :1000         7%          69:1000  dmesg.EIP:tick_clock_notify
>            :1000         2%          15:1000  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
>            :1000         5%          53:1000  dmesg.INFO:task_blocked_for_more_than#seconds
>            :1000         7%          69:1000  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202512161547.cd3a9187-lkp@intel.com
>
>
> [  699.774873][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper/0:1]
> [  699.775553][    C0] CPU#0 Utilization every 96000ms during lockup:
> [  699.775553][    C0]  #1:  26% system,          0% softirq,     0% hardirq,     0% idle
> [  699.775553][    C0]  #2:  25% system,          0% softirq,     0% hardirq,     0% idle
> [  699.775553][    C0]  #3:  25% system,          0% softirq,     0% hardirq,     0% idle
> [  699.775553][    C0]  #4:  34% system,          0% softirq,     0% hardirq,     0% idle
> [  699.775553][    C0]  #5: 100% system,          0% softirq,     0% hardirq,     0% idle
> [  699.775553][    C0] Modules linked in:
> [  699.775553][    C0] irq event stamp: 201566
> [  699.775553][    C0] hardirqs last  enabled at (201565): timekeeping_notify (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/include/asm/irqflags.h:159 include/linux/stop_machine.h:172 include/linux/stop_machine.h:179 kernel/time/timekeeping.c:1634)
> [  699.775553][    C0] hardirqs last disabled at (201566): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
> [  699.775553][    C0] softirqs last  enabled at (200324): handle_softirqs (kernel/softirq.c:469 (discriminator 2) kernel/softirq.c:650 (discriminator 2))
> [  699.775553][    C0] softirqs last disabled at (200309): __do_softirq (kernel/softirq.c:657)
> [  699.775553][    C0] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc2-00020-g318e18ed22e8 #1 PREEMPT(full)
> [  699.775553][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [  699.775553][    C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1633)
> [  699.775553][    C0] Code: 8b 45 e4 89 1d 24 d5 6a 83 a3 38 d5 6a 83 89 15 3c d5 6a 83 83 c4 10 5b 5e 5f 5d c3 2e 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <80> 0d 44 d5 6a 83 01 c3 2e 8d b4 26 00 00 00 00 80 0d 44 d5 6a 83
> All code
> ========
>    0:   8b 45 e4                mov    -0x1c(%rbp),%eax
>    3:   89 1d 24 d5 6a 83       mov    %ebx,-0x7c952adc(%rip)        # 0xffffffff836ad52d
>    9:   a3 38 d5 6a 83 89 15    movabs %eax,0xd53c1589836ad538
>   10:   3c d5
>   12:   6a 83                   push   $0xffffffffffffff83
>   14:   83 c4 10                add    $0x10,%esp
>   17:   5b                      pop    %rbx
>   18:   5e                      pop    %rsi
>   19:   5f                      pop    %rdi
>   1a:   5d                      pop    %rbp
>   1b:   c3                      ret
>   1c:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>   23:   00
>   24:   8d b6 00 00 00 00       lea    0x0(%rsi),%esi
>   2a:*  80 0d 44 d5 6a 83 01    orb    $0x1,-0x7c952abc(%rip)        # 0xffffffff836ad575               <-- trapping instruction
>   31:   c3                      ret
>   32:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>   39:   00
>   3a:   80                      .byte 0x80
>   3b:   0d 44 d5 6a 83          or     $0x836ad544,%eax
>
> Code starting with the faulting instruction
> ===========================================
>    0:   80 0d 44 d5 6a 83 01    orb    $0x1,-0x7c952abc(%rip)        # 0xffffffff836ad54b
>    7:   c3                      ret
>    8:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>    f:   00
>   10:   80                      .byte 0x80
>   11:   0d 44 d5 6a 83          or     $0x836ad544,%eax
> [  699.775553][    C0] EAX: 0003135d EBX: 8322ef00 ECX: 00000006 EDX: 82f6bcac
> [  699.775553][    C0] ESI: 00000200 EDI: 836ac3e0 EBP: 84c97ed8 ESP: 84c97ebc
> [  699.775553][    C0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000202
> [  699.775553][    C0] CR0: 80050033 CR2: ffdaa000 CR3: 03aeb000 CR4: 000406d0
> [  699.775553][    C0] Call Trace:
> [  699.775553][    C0]  ? timekeeping_notify (kernel/time/timekeeping.c:1636)
> [  699.775553][    C0]  __clocksource_select (kernel/time/clocksource.c:1069 (discriminator 1))
> [  699.775553][    C0]  ? boot_override_clock (kernel/time/clocksource.c:1101)
> [  699.775553][    C0]  clocksource_select (kernel/time/clocksource.c:1086)
> [  699.775553][    C0]  clocksource_done_booting (kernel/time/clocksource.c:1110)
> [  699.775553][    C0]  do_one_initcall (init/main.c:1283)
> [  699.775553][    C0]  ? rdinit_setup (init/main.c:1331)
> [  699.775553][    C0]  do_initcalls (init/main.c:1344 (discriminator 3) init/main.c:1361 (discriminator 3))
> [  699.775553][    C0]  kernel_init_freeable (init/main.c:1597)
> [  699.775553][    C0]  ? rest_init (init/main.c:1475)
> [  699.775553][    C0]  kernel_init (init/main.c:1485)
> [  699.775553][    C0]  ret_from_fork (arch/x86/kernel/process.c:164)
> [  699.775553][    C0]  ? rest_init (init/main.c:1475)
> [  699.775553][    C0]  ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
> [  699.775553][    C0]  entry_INT80_32 (arch/x86/entry/entry_32.S:945)
> [  699.775553][    C0] Kernel panic - not syncing: softlockup: hung tasks
> [  699.775553][    C0] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G             L      6.18.0-rc2-00020-g318e18ed22e8 #1 PREEMPT(full)
> [  699.775553][    C0] Tainted: [L]=SOFTLOCKUP
> [  699.775553][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [  699.775553][    C0] Call Trace:
> [  699.775553][    C0]  dump_stack_lvl (lib/dump_stack.c:122)
> [  699.775553][    C0]  dump_stack (lib/dump_stack.c:130)
> [  699.775553][    C0]  vpanic (kernel/panic.c:487)
> [  699.775553][    C0]  panic (kernel/panic.c:626)
> [  699.775553][    C0]  watchdog_timer_fn (kernel/watchdog.c:753)
> [  699.775553][    C0]  __hrtimer_run_queues+0x125/0x1e0
> [  699.775553][    C0]  ? schedule_work (drivers/usb/core/hub.c:925)
> [  699.775553][    C0]  hrtimer_run_queues (kernel/time/hrtimer.c:1999)
> [  699.775553][    C0]  update_process_times (kernel/time/timer.c:2416 kernel/time/timer.c:2472)
> [  699.775553][    C0]  tick_periodic (kernel/time/tick-common.c:103)
> [  699.775553][    C0]  tick_handle_periodic (kernel/time/tick-common.c:144)
> [  699.775553][    C0]  ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1052)
> [  699.775553][    C0]  __sysvec_apic_timer_interrupt (arch/x86/include/asm/trace/irq_vectors.h:40 (discriminator 4) arch/x86/include/asm/trace/irq_vectors.h:40 (discriminator 4) arch/x86/kernel/apic/apic.c:1059 (discriminator 4))
> [  699.775553][    C0]  sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052 (discriminator 2) arch/x86/kernel/apic/apic.c:1052 (discriminator 2))
> [  699.775553][    C0]  handle_exception (arch/x86/entry/entry_32.S:1055)
> [  699.775553][    C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1633)
> [  699.775553][    C0] Code: 8b 45 e4 89 1d 24 d5 6a 83 a3 38 d5 6a 83 89 15 3c d5 6a 83 83 c4 10 5b 5e 5f 5d c3 2e 8d b4 26 00 00 00 00 8d b6 00 00 00 00 <80> 0d 44 d5 6a 83 01 c3 2e 8d b4 26 00 00 00 00 80 0d 44 d5 6a 83
> All code
> ========
>    0:   8b 45 e4                mov    -0x1c(%rbp),%eax
>    3:   89 1d 24 d5 6a 83       mov    %ebx,-0x7c952adc(%rip)        # 0xffffffff836ad52d
>    9:   a3 38 d5 6a 83 89 15    movabs %eax,0xd53c1589836ad538
>   10:   3c d5
>   12:   6a 83                   push   $0xffffffffffffff83
>   14:   83 c4 10                add    $0x10,%esp
>   17:   5b                      pop    %rbx
>   18:   5e                      pop    %rsi
>   19:   5f                      pop    %rdi
>   1a:   5d                      pop    %rbp
>   1b:   c3                      ret
>   1c:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>   23:   00
>   24:   8d b6 00 00 00 00       lea    0x0(%rsi),%esi
>   2a:*  80 0d 44 d5 6a 83 01    orb    $0x1,-0x7c952abc(%rip)        # 0xffffffff836ad575               <-- trapping instruction
>   31:   c3                      ret
>   32:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>   39:   00
>   3a:   80                      .byte 0x80
>   3b:   0d 44 d5 6a 83          or     $0x836ad544,%eax
>
> Code starting with the faulting instruction
> ===========================================
>    0:   80 0d 44 d5 6a 83 01    orb    $0x1,-0x7c952abc(%rip)        # 0xffffffff836ad54b
>    7:   c3                      ret
>    8:   2e 8d b4 26 00 00 00    cs lea 0x0(%rsi,%riz,1),%esi
>    f:   00
>   10:   80                      .byte 0x80
>   11:   0d 44 d5 6a 83          or     $0x836ad544,%eax
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20251216/202512161547.cd3a9187-lkp@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

