Message-ID: <20250918074536.GE3289052@noisy.programming.kicks-ass.net>
Date: Thu, 18 Sep 2025 09:45:36 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Oliver Sang <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org
Subject: Re: [peterz-queue:sched/hrtick] [entry,hrtimer,x86] ebf33ab570:
 BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]

On Fri, Sep 12, 2025 at 10:03:39AM +0800, Oliver Sang wrote:
> hi, Peter Zijlstra,
> 
> On Thu, Sep 11, 2025 at 09:33:04AM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 08, 2025 at 01:24:54PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Hello,
> > > 
> > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]" on:
> > > 
> > > commit: ebf33ab5707c7c9ea25e3c03540b1329ad9aff1d ("entry,hrtimer,x86: Push reprogramming timers into the interrupt return path")
> > > https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/hrtick
> > > 
> > > in testcase: will-it-scale
> > > version: will-it-scale-x86_64-75f66e4-1_20250906
> > > with following parameters:
> > > 
> > > 	nr_task: 100%
> > > 	mode: thread
> > > 	test: pthread_mutex1
> > > 	cpufreq_governor: performance
> > > 
> > > 
> > > 
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-13
> > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > > 
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > 
> > 
> > Is this the same issue again as last time? Eg. disabling all the perf
> > monitors makes it go?
> 
> yes, if disabling all monitors, the issue disappeared.

Could you try the below? I can't convince myself it can make a
difference, but while rebasing the patches I noted that we set the TIF
flag while holding cpu_base->lock, and clear after dropping it.

Still, it's all on the local CPU with IRQs disabled, so it should not
matter.

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1989,8 +1989,8 @@ void _hrtimer_rearm(void)
 		now = hrtimer_update_base(cpu_base);
 		expires_next = hrtimer_update_next_event(cpu_base);
 		__hrtimer_rearm(cpu_base, now, expires_next);
+		clear_thread_flag(TIF_HRTIMER_REARM);
 	}
-	clear_thread_flag(TIF_HRTIMER_REARM);
 }
 #endif /* TIF_HRTIMER_REARM */
 #endif /* !CONFIG_HIGH_RES_TIMERS */


Anyway, I'll go post these patches; maybe someone else spots the failure.
I'll be sure to note that this patch has issues.

Thanks!
