[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aA3BH+UzbypCXaDZ@xsang-OptiPlex-9020>
Date: Sun, 27 Apr 2025 13:31:11 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Joel Fernandes
<joelagnelf@...dia.com>, <linux-kernel@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: Re: [linux-next:master] [rcutorture] ddd062f753:
WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_updown[rcutorture]
hi, Paul,
On Thu, Apr 24, 2025 at 03:56:09PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 23, 2025 at 08:05:53PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 24, 2025 at 09:50:04AM +0800, Oliver Sang wrote:
> > > hi, Paul,
> > >
> > > On Tue, Apr 22, 2025 at 10:54:10AM -0700, Paul E. McKenney wrote:
> > >
> > > [...]
> > >
> > > > > > >
> > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > the same patch/commit), kindly add following tags
> > > > > > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > > > > > | Closes: https://lore.kernel.org/oe-lkp/202504211513.23f21a0-lkp@intel.com
> > > > > > >
> > > > > > >
> > > > > > > The kernel config and materials to reproduce are available at:
> > > > > > > https://download.01.org/0day-ci/archive/20250421/202504211513.23f21a0-lkp@intel.com
> > > > > >
> > > > > > Good catch, and thank you for your testing efforts!
> > > > > >
> > > > > > Does the patch at the end of this email help?
> > > > >
> > > > > sorry but the patch does not help. one dmesg is attached.
> > > >
> > > > And idiot here failed to check for the exact same problem at the point
> > > > where the timer is queued, so thank you for bearing with me.
> > > >
> > > > Does the patch at the end of this email (in addition to the previous
> > > > patch) get the job done?
> > >
> > > unfortunately, it still doesn't fix, one dmesg is attached. part of is as [2]
> > >
> > > but I applied your two patches directly upon ddd062f753, like below:
> > >
> > > * 1c91d0bd4809f (linux-devel/fixup-1539a7e7b61a9) further patch for ddd062f753 from Paul
> > > * 1539a7e7b61a9 (linux-devel/fixup-ddd062f753) fix for ddd062f753 from Paul E. McKenney
> > > * ddd062f7536cc rcutorture: Complain if an ->up_read() is delayed more than 10 seconds
> > > * 1b983c34d5695 rcutorture: Comment invocations of tick_dep_set_task()
> > >
> > >
> > > I noticed there are some conflicts while applying your second patch, the
> > > 1c91d0bd4809f looks like [1]. there is no "int rtorsu_cpu;" before line:
> > > + ktime_t rtorsu_kt;
> > >
> > > seems your patch has a different base? I worried if my applyment has
> > > problems. if so, could you tell me the correct base? thanks!
> >
> > It looks correct to me. I will rebase at my end to make it apply cleanly
> > by the end of my tomorrow at the latest. Attempting it now would likely
> > just make a mess. :-/
> >
> > And thank you very much for your help with this because...
> >
[...]
> > >
> > > [ 168.048387][ T796] ------------[ cut here ]------------
> > > [ 168.049342][ T796] hrtimer queued at jiffies 4294952214 for 200000000 ns took 1502 jiffies
> >
> > ... I am quite surprised by the 1502 jiffies. On a HZ=1000 system,
> > I would have expected this value to be at least 10,000. I clearly need
> > to dig into this code much more carefully.
>
> And upon looking at the dmesg.xz that you attached, I see that HZ=100.
> So there really is a 15-second delay, which is intended to trip this
> 10-second timeout.
>
> So I guess is it no more Mr. Nice Guy for hrtimers, and therefore
> HRTIMER_MODE_HARD it is! ;-)
>
> Once again, thank you for your testing efforts!
you are welcome!
>
> I also rebased my fixup patches to the bottom of my development stack,
> so the combined patch shown below should apply cleanly. Here is hoping
> that the third time is a charm. ;-)
we confirm it fixes the WARNING we reported. thanks!
Tested-by: kernel test robot <oliver.sang@...el.com>
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 88d9f5298c3d8..1ebeef8019b86 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -2445,6 +2445,7 @@ rcu_torture_reader(void *arg)
> struct rcu_torture_one_read_state_updown {
> struct hrtimer rtorsu_hrt;
> bool rtorsu_inuse;
> + ktime_t rtorsu_kt;
> unsigned long rtorsu_j;
> unsigned long rtorsu_ndowns;
> unsigned long rtorsu_nups;
> @@ -2488,7 +2489,7 @@ static int rcu_torture_updown_init(void)
> for (i = 0; i < n_up_down; i++) {
> init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors, rand);
> hrtimer_setup(&updownreaders[i].rtorsu_hrt, rcu_torture_updown_hrt, CLOCK_MONOTONIC,
> - HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
> + HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> torture_random_init(&updownreaders[i].rtorsu_trs);
> init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors,
> &updownreaders[i].rtorsu_trs);
> @@ -2539,12 +2540,14 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
> schedule_timeout_idle(HZ);
> return;
> }
> - rtorsup->rtorsu_j = jiffies;
> smp_store_release(&rtorsup->rtorsu_inuse, true);
> t = torture_random(&rtorsup->rtorsu_trs) & 0xfffff; // One per million.
> if (t < 10 * 1000)
> t = 200 * 1000 * 1000;
> - hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
> + hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> + smp_mb(); // Sample jiffies after posting hrtimer.
> + rtorsup->rtorsu_j = jiffies; // Not used by hrtimer handler.
> + rtorsup->rtorsu_kt = t;
> }
>
> /*
> @@ -2555,6 +2558,7 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
> static int
> rcu_torture_updown(void *arg)
> {
> + unsigned long j;
> struct rcu_torture_one_read_state_updown *rtorsup;
>
> VERBOSE_TOROUT_STRING("rcu_torture_updown task started");
> @@ -2562,8 +2566,10 @@ rcu_torture_updown(void *arg)
> for (rtorsup = updownreaders; rtorsup < &updownreaders[n_up_down]; rtorsup++) {
> if (torture_must_stop())
> break;
> + j = smp_load_acquire(&jiffies); // Time before ->rtorsu_inuse.
> if (smp_load_acquire(&rtorsup->rtorsu_inuse)) {
> - WARN_ON_ONCE(time_after(jiffies, rtorsup->rtorsu_j + 10 * HZ));
> + WARN_ONCE(time_after(j, rtorsup->rtorsu_j + 1 + HZ * 10),
> + "hrtimer queued at jiffies %lu for %lld ns took %lu jiffies\n", rtorsup->rtorsu_j, rtorsup->rtorsu_kt, j - rtorsup->rtorsu_j);
> continue;
> }
> rcu_torture_updown_one(rtorsup);
Powered by blists - more mailing lists