lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBMdLRN8WVUmvXXZQK5fE_rZEFJenujrxpqWYAQ7b-jBw@mail.gmail.com>
Date: Tue, 9 Jul 2024 12:03:42 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, 
	Ingo Molnar <mingo@...nel.org>, Lukasz Luba <lukasz.luba@....com>, 
	Qais Yousef <qyousef@...alina.io>, linux-doc@...r.kernel.org, aubrey.li@...ux.intel.com, 
	yu.c.chen@...el.com
Subject: Re: [linus:master] [sched/pelt] 97450eb909: INFO:task_blocked_for_more_than#seconds

On Tue, 9 Jul 2024 at 09:22, kernel test robot <oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed "INFO:task_blocked_for_more_than#seconds" on:
>
> commit: 97450eb909658573dcacc1063b06d3d08642c0c1 ("sched/pelt: Remove shift of thermal clock")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

First, I'm surprised that an Intel platform is impacted by this patch
because Intel doesn't use it AFAIK.
Then, this patch mainly remove a right shift i.e.:
instead of:
  return rq_clock_task(rq) >> sched_hw_decay_shift
we are now doing:
  return rq_clock_task(rq)

Could it be a false positive ?


>
> [test failed on linus/master      22f902dfc51eb3602ff9b505ac3980f6ff77b1df]
> [test failed on linux-next/master 0b58e108042b0ed28a71cd7edf5175999955b233]
>
> in testcase: rcutorture
> version:
> with following parameters:
>
>         runtime: 300s
>         test: cpuhotplug
>         torture_type: tasks
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> we noticed the issue is random, 86 times out of 500 runs as below, while
> keeping clean on parent.
>
>
> d4dbc991714eefcb 97450eb909658573dcacc1063b0
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :500         17%          86:500   dmesg.INFO:task_blocked_for_more_than#seconds
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407091527.bb0be229-lkp@intel.com
>
>
> [  996.963402][   T17] INFO: task swapper:1 blocked for more than 491 seconds.
> [  996.973637][   T17]       Tainted: G        W          6.9.0-rc1-00051-g97450eb90965 #1
> [  996.983009][   T17] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  996.992369][   T17] task:swapper         state:D stack:5680  pid:1     tgid:1     ppid:0      flags:0x00004000
> [  997.003585][   T17] Call Trace:
> [ 997.022071][ T17] __schedule (kernel/sched/core.c:5412 kernel/sched/core.c:6746)
> [ 997.032795][ T17] ? schedule (kernel/sched/core.c:6804 kernel/sched/core.c:6837)
> [ 997.051350][ T17] schedule (arch/x86/include/asm/preempt.h:84 (discriminator 13) kernel/sched/core.c:6824 (discriminator 13) kernel/sched/core.c:6838 (discriminator 13))
> [ 997.061394][ T17] async_synchronize_cookie_domain (kernel/async.c:317 (discriminator 9) kernel/async.c:310 (discriminator 9))
> [ 997.071656][ T17] ? add_wait_queue (kernel/sched/wait.c:383)
> [ 997.084201][ T17] wait_for_initramfs (init/initramfs.c:757)
> [ 997.093158][ T17] ? do_header (init/initramfs.c:761)
> [ 997.104204][ T17] populate_rootfs (init/initramfs.c:768)
> [ 997.113736][ T17] do_one_initcall (init/main.c:1238)
> [ 997.122668][ T17] ? parameq (kernel/params.c:90 (discriminator 1) kernel/params.c:99 (discriminator 1))
> [ 997.133272][ T17] ? rdinit_setup (init/main.c:1286)
> [ 997.142176][ T17] ? parse_args (kernel/params.c:142 (discriminator 1) kernel/params.c:186 (discriminator 1))
> [ 997.172210][ T17] ? do_initcalls (init/main.c:1298 init/main.c:1316)
> [ 997.183570][ T17] do_initcalls (init/main.c:1299 (discriminator 1) init/main.c:1316 (discriminator 1))
> [ 997.194241][ T17] ? rest_init (init/main.c:1429)
> [ 997.204290][ T17] kernel_init_freeable (init/main.c:1552)
> [ 997.214432][ T17] kernel_init (init/main.c:1439)
> [ 997.223979][ T17] ret_from_fork (arch/x86/kernel/process.c:153)
> [ 997.232870][ T17] ? rest_init (init/main.c:1429)
> [ 997.244227][ T17] ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
> [ 997.254412][ T17] entry_INT80_32 (arch/x86/entry/entry_32.S:944)
> [  997.831274][   T17]
> [  997.831274][   T17] Showing all locks held in the system:
> [  997.840770][   T17] 4 locks held by kworker/u4:1/16:
> [  997.844292][   T17] 1 lock held by khungtaskd/17:
> [ 997.853205][ T17] #0: c26e50c8 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks (kernel/locking/lockdep.c:6612)
> [  997.880625][   T17]
> [  997.883881][   T17] =============================================
> [  997.883881][   T17]
> BUG: kernel hang in boot stage
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ