lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <928b995c-92a0-434a-ac18-58fbd7f467ad@paulmck-laptop>
Date: Sun, 27 Apr 2025 08:51:53 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Oliver Sang <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
	Joel Fernandes <joelagnelf@...dia.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [linux-next:master] [rcutorture]  ddd062f753:
 WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_updown[rcutorture]

On Sun, Apr 27, 2025 at 01:31:11PM +0800, Oliver Sang wrote:
> hi, Paul,
> 
> On Thu, Apr 24, 2025 at 03:56:09PM -0700, Paul E. McKenney wrote:
> > On Wed, Apr 23, 2025 at 08:05:53PM -0700, Paul E. McKenney wrote:
> > > On Thu, Apr 24, 2025 at 09:50:04AM +0800, Oliver Sang wrote:
> > > > hi, Paul,
> > > > 
> > > > On Tue, Apr 22, 2025 at 10:54:10AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > > > 
> > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > > the same patch/commit), kindly add following tags
> > > > > > > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > > > > > > | Closes: https://lore.kernel.org/oe-lkp/202504211513.23f21a0-lkp@intel.com
> > > > > > > > 
> > > > > > > > 
> > > > > > > > The kernel config and materials to reproduce are available at:
> > > > > > > > https://download.01.org/0day-ci/archive/20250421/202504211513.23f21a0-lkp@intel.com
> > > > > > > 
> > > > > > > Good catch, and thank you for your testing efforts!
> > > > > > > 
> > > > > > > Does the patch at the end of this email help?
> > > > > > 
> > > > > > sorry but the patch does not help. one dmesg is attached.
> > > > > 
> > > > > And idiot here failed to check for the exact same problem at the point
> > > > > where the timer is queued, so thank you for bearing with me.
> > > > > 
> > > > > Does the patch at the end of this email (in addition to the previous
> > > > > patch) get the job done?
> > > > 
> > > > unfortunately, it still doesn't fix, one dmesg is attached. part of is as [2]
> > > > 
> > > > but I applied your two patches directly upon ddd062f753, like below:
> > > > 
> > > > * 1c91d0bd4809f (linux-devel/fixup-1539a7e7b61a9) further patch for ddd062f753 from Paul
> > > > * 1539a7e7b61a9 (linux-devel/fixup-ddd062f753) fix for ddd062f753 from Paul E. McKenney
> > > > * ddd062f7536cc rcutorture: Complain if an ->up_read() is delayed more than 10 seconds
> > > > * 1b983c34d5695 rcutorture: Comment invocations of tick_dep_set_task()
> > > > 
> > > > 
> > > > I noticed there are some conflicts while applying your second patch, the
> > > > 1c91d0bd4809f looks like [1]. there is no "int rtorsu_cpu;" before line:
> > > > +	ktime_t rtorsu_kt;
> > > > 
> > > > seems your patch has a different base? I worried if my applyment has
> > > > problems. if so, could you tell me the correct base? thanks!
> > > 
> > > It looks correct to me.  I will rebase at my end to make it apply cleanly
> > > by the end of my tomorrow at the latest.  Attempting it now would likely
> > > just make a mess.  :-/
> > > 
> > > And thank you very much for your help with this because...
> > > 
> 
> [...]
> 
> > > > 
> > > > [  168.048387][  T796] ------------[ cut here ]------------
> > > > [  168.049342][  T796] hrtimer queued at jiffies 4294952214 for 200000000 ns took 1502 jiffies
> > > 
> > > ... I am quite surprised by the 1502 jiffies.  On a HZ=1000 system,
> > > I would have expected this value to be at least 10,000.  I clearly need
> > > to dig into this code much more carefully.
> > 
> > And upon looking at the dmesg.xz that you attached, I see that HZ=100.
> > So there really is a 15-second delay, which is intended to trip this
> > 10-second timeout.
> > 
> > So I guess is it no more Mr. Nice Guy for hrtimers, and therefore
> > HRTIMER_MODE_HARD it is!  ;-)
> > 
> > Once again, thank you for your testing efforts!
> 
> you are welcome!
> 
> > I also rebased my fixup patches to the bottom of my development stack,
> > so the combined patch shown below should apply cleanly.  Here is hoping
> > that the third time is a charm.  ;-)
> 
> we confirm it fixes the WARNING we reported. thanks!
> 
> Tested-by: kernel test robot <oliver.sang@...el.com>

Thank you for bearing with me on this one!

							Thanx, Paul

> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 88d9f5298c3d8..1ebeef8019b86 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -2445,6 +2445,7 @@ rcu_torture_reader(void *arg)
> >  struct rcu_torture_one_read_state_updown {
> >  	struct hrtimer rtorsu_hrt;
> >  	bool rtorsu_inuse;
> > +	ktime_t rtorsu_kt;
> >  	unsigned long rtorsu_j;
> >  	unsigned long rtorsu_ndowns;
> >  	unsigned long rtorsu_nups;
> > @@ -2488,7 +2489,7 @@ static int rcu_torture_updown_init(void)
> >  	for (i = 0; i < n_up_down; i++) {
> >  		init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors, rand);
> >  		hrtimer_setup(&updownreaders[i].rtorsu_hrt, rcu_torture_updown_hrt, CLOCK_MONOTONIC,
> > -			      HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
> > +			      HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> >  		torture_random_init(&updownreaders[i].rtorsu_trs);
> >  		init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors,
> >  						&updownreaders[i].rtorsu_trs);
> > @@ -2539,12 +2540,14 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
> >  		schedule_timeout_idle(HZ);
> >  		return;
> >  	}
> > -	rtorsup->rtorsu_j = jiffies;
> >  	smp_store_release(&rtorsup->rtorsu_inuse, true);
> >  	t = torture_random(&rtorsup->rtorsu_trs) & 0xfffff; // One per million.
> >  	if (t < 10 * 1000)
> >  		t = 200 * 1000 * 1000;
> > -	hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
> > +	hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> > +	smp_mb(); // Sample jiffies after posting hrtimer.
> > +	rtorsup->rtorsu_j = jiffies;  // Not used by hrtimer handler.
> > +	rtorsup->rtorsu_kt = t;
> >  }
> >  
> >  /*
> > @@ -2555,6 +2558,7 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
> >  static int
> >  rcu_torture_updown(void *arg)
> >  {
> > +	unsigned long j;
> >  	struct rcu_torture_one_read_state_updown *rtorsup;
> >  
> >  	VERBOSE_TOROUT_STRING("rcu_torture_updown task started");
> > @@ -2562,8 +2566,10 @@ rcu_torture_updown(void *arg)
> >  		for (rtorsup = updownreaders; rtorsup < &updownreaders[n_up_down]; rtorsup++) {
> >  			if (torture_must_stop())
> >  				break;
> > +			j = smp_load_acquire(&jiffies); // Time before ->rtorsu_inuse.
> >  			if (smp_load_acquire(&rtorsup->rtorsu_inuse)) {
> > -				WARN_ON_ONCE(time_after(jiffies, rtorsup->rtorsu_j + 10 * HZ));
> > +				WARN_ONCE(time_after(j, rtorsup->rtorsu_j + 1 + HZ * 10),
> > +					  "hrtimer queued at jiffies %lu for %lld ns took %lu jiffies\n", rtorsup->rtorsu_j, rtorsup->rtorsu_kt, j - rtorsup->rtorsu_j);
> >  				continue;
> >  			}
> >  			rcu_torture_updown_one(rtorsup);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ