[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMoCYelz87V8bSzA@uudg.org>
Date: Tue, 16 Sep 2025 21:35:45 -0300
From: "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Crystal Wood <crwood@...hat.com>,
Wander Lairson Costa <wander@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: Re: [linus:master] [sched] 8671bad873:
INFO:task_blocked_for_more_than#seconds
On Mon, Sep 08, 2025 at 02:06:37PM -0300, Luis Claudio R. Goncalves wrote:
> On Fri, Sep 05, 2025 at 10:49:35AM +0800, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "INFO:task_blocked_for_more_than#seconds" on:
> >
> > commit: 8671bad873ebeb082afcf7b4501395c374da6023 ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linus/master 08b06c30a44555a8b1d14950e4462a52bfa0758b]
> > [test failed on linux-next/master 5d50cf9f7cf20a17ac469c20a2e07c29c1f6aab7]
> >
> > in testcase: rcutorture
> > version:
> > with following parameters:
> >
> > runtime: 300s
> > test: cpuhotplug
> > torture_type: tasks-rude
>
> I ran tests with the boxes I had at hand, x86_64 and arm64, and was unable
> to reproduce the problem. Tomorrow, when I am back from a holiday,, I will
> try to reproduce the problem with x86 (32 bits) VMs and x86 baremetal as it
> seems to be the case on the report.
I have been trying to reproduce the problem for a week now, on both
baremetal and VMs, x86 32 bits, without success. I tried to limit the
amount of CPUs and memory, to mimic as well as possible the test
environment but that has not changed the test results at all.
Are there any other pointers to reproduce this problem? Other than what can
be extracted from the log excerpts available, I mean.
Best regards.
Luis
> In any case, I sent a follow-up patch that isolated those changes to
> kernels with PREEMPT_RT enabled, as initially intended. That should solve
> this case (if really caused by the commit in question). The patch I
> mentioned is:
>
> [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
> https://lore.kernel.org/all/aKxqGLNOp2sWJwnZ@uudg.org/
>
> Best regards,
> Luis
>
> >
> >
> > config: i386-randconfig-017-20250830
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > the issue happens randomly and we observed various issues in tests by this
> > commit, while parent keeps clean.
> >
> > =========================================================================================
> > tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
> > vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-017-20250830/gcc-12/300s/cpuhotplug/tasks-rude
> >
> > 7de9d4f946383f48 8671bad873ebeb082afcf7b4501
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :200 16% 32:200 dmesg.BUG:kernel_hang_in_boot_stage
> > :200 0% 1:200 dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
> > :200 1% 2:200 dmesg.BUG:workqueue_lockup-pool
> > :200 0% 1:200 dmesg.EIP:kthread_affine_preferred
> > :200 0% 1:200 dmesg.EIP:lock_release
> > :200 0% 1:200 dmesg.EIP:tick_clock_notify
> > :200 12% 23:200 dmesg.INFO:task_blocked_for_more_than#seconds
> > :200 12% 23:200 dmesg.Kernel_panic-not_syncing:hung_task:blocked_tasks
> > :200 0% 1:200 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> > :200 0% 1:200 dmesg.WARNING:at_kernel/kthread.c:#kthread_affine_preferred
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202509051010.e06823ab-lkp@intel.com
> >
> >
> > [ 994.935251][ T26] INFO: task swapper/0:1 blocked for more than 491 seconds.
> > [ 994.947414][ T26] Not tainted 6.16.0-rc6-00086-g8671bad873eb #1
> > [ 994.951523][ T26] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 994.960576][ T26] task:swapper/0 state:D stack:5016 pid:1 tgid:1 ppid:0 task_flags:0x0140 flags:0x00004000
> > [ 994.972581][ T26] Call Trace:
> > [ 994.998429][ T26] __schedule (kernel/sched/core.c:5354 kernel/sched/core.c:6954)
> > [ 995.035758][ T26] schedule (kernel/sched/core.c:7037 kernel/sched/core.c:7051)
> > [ 995.044863][ T26] async_synchronize_cookie_domain (kernel/async.c:317 kernel/async.c:310)
> > [ 995.050698][ T26] ? do_wait_intr (kernel/sched/wait.c:384)
> > [ 995.059798][ T26] wait_for_initramfs (init/initramfs.c:778)
> > [ 995.067798][ T26] populate_rootfs (init/initramfs.c:789)
> > [ 995.070767][ T26] do_one_initcall (init/main.c:1274)
> > [ 995.074441][ T26] ? initramfs_async_setup (init/initramfs.c:782)
> > [ 995.098384][ T26] do_initcalls (init/main.c:1335 init/main.c:1352)
> > [ 995.136744][ T26] kernel_init_freeable (init/main.c:1588)
> > [ 995.136744][ T26] ? rest_init (init/main.c:1466)
> > [ 995.158663][ T26] kernel_init (init/main.c:1476)
> > [ 995.177750][ T26] ret_from_fork (arch/x86/kernel/process.c:154)
> > [ 995.178146][ T26] ? rest_init (init/main.c:1466)
> > [ 995.230129][ T26] ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
> > [ 995.230129][ T26] entry_INT80_32 (arch/x86/entry/entry_32.S:945)
> > [ 995.268743][ T26]
> > [ 995.268743][ T26] Showing all locks held in the system:
> > [ 995.336987][ T26] 1 lock held by khungtaskd/26:
> > [ 995.446697][ T26] #0: 830cce10 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks (kernel/locking/lockdep.c:6768 (discriminator 1))
> > [ 995.464546][ T26] 4 locks held by kworker/u4:2/29:
> > [ 995.470758][ T26] 2 locks held by kworker/0:3/38:
> > [ 995.498593][ T26]
> > [ 995.599251][ T26] =============================================
> > [ 995.599251][ T26]
> > [ 995.729940][ T26] Kernel panic - not syncing: hung_task: blocked tasks
> > [ 995.729940][ T26] CPU: 0 UID: 0 PID: 26 Comm: khungtaskd Not tainted 6.16.0-rc6-00086-g8671bad873eb #1 PREEMPT(full)
> > [ 995.729940][ T26] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [ 995.729940][ T26] Call Trace:
> > [ 995.729940][ T26] dump_stack_lvl (lib/dump_stack.c:124)
> > [ 995.729940][ T26] dump_stack (lib/dump_stack.c:130)
> > [ 995.729940][ T26] panic (kernel/panic.c:382)
> > [ 995.729940][ T26] check_hung_uninterruptible_tasks (kernel/hung_task.c:311)
> > [ 995.729940][ T26] watchdog (kernel/hung_task.c:470)
> > [ 995.729940][ T26] kthread (kernel/kthread.c:464)
> > [ 995.729940][ T26] ? check_hung_uninterruptible_tasks (kernel/hung_task.c:453)
> > [ 995.729940][ T26] ? kthread_complete_and_exit (kernel/kthread.c:413)
> > [ 995.729940][ T26] ret_from_fork (arch/x86/kernel/process.c:154)
> > [ 995.729940][ T26] ? kthread_complete_and_exit (kernel/kthread.c:413)
> > [ 995.729940][ T26] ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
> > [ 995.729940][ T26] entry_INT80_32 (arch/x86/entry/entry_32.S:945)
> >
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250905/202509051010.e06823ab-lkp@intel.com
> >
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
> ---end quoted text---
---end quoted text---
Powered by blists - more mailing lists