Message-ID: <69728632c464b_1d33100dd@dwillia2-mobl4.notmuch>
Date: Thu, 22 Jan 2026 12:18:58 -0800
From: <dan.j.williams@...el.com>
To: kernel test robot <oliver.sang@...el.com>, Dan Williams
<dan.j.williams@...el.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Alison Schofield
<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, "Ira
Weiny" <ira.weiny@...el.com>, Dan Williams <dan.j.williams@...el.com>,
<linux-cxl@...r.kernel.org>, Dave Jiang <dave.jiang@...el.com>, "Smita
Koralahalli" <Smita.KoralahalliChannabasappa@....com>,
<linux-kernel@...r.kernel.org>, <nvdimm@...ts.linux.dev>,
<oliver.sang@...el.com>
Subject: Re: [cxl:for-7.0/cxl-init] [dax/hmem, e820, resource] bc62f5b308:
BUG:soft_lockup-CPU##stuck_for#s![kworker:#:#]

kernel test robot wrote:
>
>
> Hello,
>
> FYI. we don't have enough knowledge to understand how the issues we found
> in the tests are related with the code. we just run the tests up to 200 times
> for both this commit and parent, noticed there are various random issues on
> this commit, but always clean on parent.
>
>
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/sleep:
> vm-snb/boot/debian-11.1-i386-20220923.cgz/i386-randconfig-141-20260117/gcc-14/1
>
> 29317f8dc6ed601e bc62f5b308cbdedf29132fe96e9
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>             :200             2%        5:200  dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker##:#]
>             :200             2%        5:200  dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker:#:#]
>             :200             8%       17:200  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
>             :200             2%        4:200  dmesg.BUG:workqueue_lockup-pool
>             :200             0%        1:200  dmesg.EIP:__schedule
>             :200             0%        1:200  dmesg.EIP:_raw_spin_unlock_irq
>             :200             2%        4:200  dmesg.EIP:_raw_spin_unlock_irqrestore
>             :200             6%       11:200  dmesg.EIP:console_emit_next_record
>             :200             0%        1:200  dmesg.EIP:finish_task_switch
>             :200             3%        6:200  dmesg.EIP:lock_acquire
>             :200             1%        2:200  dmesg.EIP:lock_release
>             :200             1%        2:200  dmesg.EIP:queue_work_on
>             :200             0%        1:200  dmesg.EIP:rcu_preempt_deferred_qs_irqrestore
>             :200             1%        2:200  dmesg.EIP:timekeeping_notify
>             :200             0%        1:200  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
>             :200             0%        1:200  dmesg.INFO:task_blocked_for_more_than#seconds
>             :200            14%       27:200  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> below is full report.

So this is good data, but I do not know what to do with it. The
RCU_STRICT_GRACE_PERIOD feature seems to want to make RCU usage bugs
more detectable, but at the risk of false positives. My concern is that
this patch disturbs 32-bit x86 builds just enough to make the softlockup
detector start getting upset about this rcu_gp::strict_work_handler
workqueue.

So unless this causes actual boot failures, all I can assume is that this
is a false positive report. Nothing in this patch touches workqueues
or object lifetimes. So I can only assume this is a side effect of
instruction cache layout, or similar.