Message-ID: <20250918074536.GE3289052@noisy.programming.kicks-ass.net>
Date: Thu, 18 Sep 2025 09:45:36 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Oliver Sang <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [peterz-queue:sched/hrtick] [entry,hrtimer,x86] ebf33ab570:
BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]
On Fri, Sep 12, 2025 at 10:03:39AM +0800, Oliver Sang wrote:
> hi, Peter Zijlstra,
>
> On Thu, Sep 11, 2025 at 09:33:04AM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 08, 2025 at 01:24:54PM +0800, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]" on:
> > >
> > > commit: ebf33ab5707c7c9ea25e3c03540b1329ad9aff1d ("entry,hrtimer,x86: Push reprogramming timers into the interrupt return path")
> > > https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/hrtick
> > >
> > > in testcase: will-it-scale
> > > version: will-it-scale-x86_64-75f66e4-1_20250906
> > > with following parameters:
> > >
> > > nr_task: 100%
> > > mode: thread
> > > test: pthread_mutex1
> > > cpufreq_governor: performance
> > >
> > >
> > >
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-13
> > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> >
> > Is this the same issue again as last time? Eg. disabling all the perf
> > monitors makes it go?
>
> yes, if disabling all monitors, the issue disappeared.
Could you try the below? I can't convince myself it can make a
difference, but while rebasing the patches I noted that we set the TIF
flag while holding cpu_base->lock, and clear it after dropping the lock.
Still, it's all on the local CPU with IRQs disabled, so it should not
matter.
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1989,8 +1989,8 @@ void _hrtimer_rearm(void)
 		now = hrtimer_update_base(cpu_base);
 		expires_next = hrtimer_update_next_event(cpu_base);
 		__hrtimer_rearm(cpu_base, now, expires_next);
+		clear_thread_flag(TIF_HRTIMER_REARM);
 	}
-	clear_thread_flag(TIF_HRTIMER_REARM);
 }
 #endif /* TIF_HRTIMER_REARM */
 #endif /* !CONFIG_HIGH_RES_TIMERS */
Anyway, I'll go post these patches, maybe someone else spots the fail.
I'll be sure to make a note this patch has issues.
Thanks!