[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7f32b085-c9e9-4686-9a93-539080cee9ab@paulmck-laptop>
Date: Thu, 24 Apr 2025 15:56:09 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Oliver Sang <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com,
Joel Fernandes <joelagnelf@...dia.com>,
linux-kernel@...r.kernel.org
Subject: Re: [linux-next:master] [rcutorture] ddd062f753:
WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_updown[rcutorture]
On Wed, Apr 23, 2025 at 08:05:53PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 24, 2025 at 09:50:04AM +0800, Oliver Sang wrote:
> > hi, Paul,
> >
> > On Tue, Apr 22, 2025 at 10:54:10AM -0700, Paul E. McKenney wrote:
> >
> > [...]
> >
> > > > > >
> > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > the same patch/commit), kindly add following tags
> > > > > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > > > > | Closes: https://lore.kernel.org/oe-lkp/202504211513.23f21a0-lkp@intel.com
> > > > > >
> > > > > >
> > > > > > The kernel config and materials to reproduce are available at:
> > > > > > https://download.01.org/0day-ci/archive/20250421/202504211513.23f21a0-lkp@intel.com
> > > > >
> > > > > Good catch, and thank you for your testing efforts!
> > > > >
> > > > > Does the patch at the end of this email help?
> > > >
> > > > sorry but the patch does not help. one dmesg is attached.
> > >
> > > And idiot here failed to check for the exact same problem at the point
> > > where the timer is queued, so thank you for bearing with me.
> > >
> > > Does the patch at the end of this email (in addition to the previous
> > > patch) get the job done?
> >
> > unfortunately, it still doesn't fix, one dmesg is attached. part of is as [2]
> >
> > but I applied your two patches directly upon ddd062f753, like below:
> >
> > * 1c91d0bd4809f (linux-devel/fixup-1539a7e7b61a9) further patch for ddd062f753 from Paul
> > * 1539a7e7b61a9 (linux-devel/fixup-ddd062f753) fix for ddd062f753 from Paul E. McKenney
> > * ddd062f7536cc rcutorture: Complain if an ->up_read() is delayed more than 10 seconds
> > * 1b983c34d5695 rcutorture: Comment invocations of tick_dep_set_task()
> >
> >
> > I noticed there are some conflicts while applying your second patch, the
> > 1c91d0bd4809f looks like [1]. there is no "int rtorsu_cpu;" before line:
> > + ktime_t rtorsu_kt;
> >
> > seems your patch has a different base? I worried if my applyment has
> > problems. if so, could you tell me the correct base? thanks!
>
> It looks correct to me. I will rebase at my end to make it apply cleanly
> by the end of my tomorrow at the latest. Attempting it now would likely
> just make a mess. :-/
>
> And thank you very much for your help with this because...
>
> > [1]
> > commit 1c91d0bd4809f9f12e61f25d881a02f25c473702 (linux-devel/fixup-1539a7e7b61a9)
> > Author: 0day robot <lkp@...el.com>
> > Date: Wed Apr 23 10:27:51 2025 +0800
> >
> > further patch for ddd062f753 from Paul
> >
> > Signed-off-by: 0day robot <lkp@...el.com>
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index e7b5811e0e456..14cc67d436c97 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -2438,6 +2438,7 @@ rcu_torture_reader(void *arg)
> > struct rcu_torture_one_read_state_updown {
> > struct hrtimer rtorsu_hrt;
> > bool rtorsu_inuse;
> > + ktime_t rtorsu_kt;
> > unsigned long rtorsu_j;
> > struct torture_random_state rtorsu_trs;
> > struct rcu_torture_one_read_state rtorsu_rtors;
> > @@ -2522,12 +2523,14 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
> > schedule_timeout_idle(HZ);
> > return;
> > }
> > - rtorsup->rtorsu_j = jiffies;
> > smp_store_release(&rtorsup->rtorsu_inuse, true);
> > t = torture_random(&rtorsup->rtorsu_trs) & 0xfffff; // One per million.
> > if (t < 10 * 1000)
> > t = 200 * 1000 * 1000;
> > hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
> > + smp_mb(); // Sample jiffies after posting hrtimer.
> > + rtorsup->rtorsu_j = jiffies; // Not used by hrtimer handler.
> > + rtorsup->rtorsu_kt = t;
> > }
> >
> > /*
> > @@ -2548,7 +2551,9 @@ rcu_torture_updown(void *arg)
> > break;
> > j = smp_load_acquire(&jiffies); // Time before ->rtorsu_inuse.
> > if (smp_load_acquire(&rtorsup->rtorsu_inuse)) {
> > - WARN_ON_ONCE(time_after(j, rtorsup->rtorsu_j + 10 * HZ));
> > + WARN_ONCE(time_after(j, rtorsup->rtorsu_j + 1 + HZ * 10),
> > + "hrtimer queued at jiffies %lu for %lld ns took %lu jiffies\n", rtorsup->rtorsu_j, rtorsup->rtorsu_kt, j - rtorsup->rtorsu_j);
> > continue;
> > }
> > rcu_torture_updown_one(rtorsup);
> >
> >
> >
> > [2]
> >
> > [ 168.048387][ T796] ------------[ cut here ]------------
> > [ 168.049342][ T796] hrtimer queued at jiffies 4294952214 for 200000000 ns took 1502 jiffies
>
> ... I am quite surprised by the 1502 jiffies. On a HZ=1000 system,
> I would have expected this value to be at least 10,000. I clearly need
> to dig into this code much more carefully.
And upon looking at the dmesg.xz that you attached, I see that HZ=100.
So there really is a 15-second delay, which is intended to trip this
10-second timeout.
So I guess is it no more Mr. Nice Guy for hrtimers, and therefore
HRTIMER_MODE_HARD it is! ;-)
Once again, thank you for your testing efforts!
I also rebased my fixup patches to the bottom of my development stack,
so the combined patch shown below should apply cleanly. Here is hoping
that the third time is a charm. ;-)
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 88d9f5298c3d8..1ebeef8019b86 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2445,6 +2445,7 @@ rcu_torture_reader(void *arg)
struct rcu_torture_one_read_state_updown {
struct hrtimer rtorsu_hrt;
bool rtorsu_inuse;
+ ktime_t rtorsu_kt;
unsigned long rtorsu_j;
unsigned long rtorsu_ndowns;
unsigned long rtorsu_nups;
@@ -2488,7 +2489,7 @@ static int rcu_torture_updown_init(void)
for (i = 0; i < n_up_down; i++) {
init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors, rand);
hrtimer_setup(&updownreaders[i].rtorsu_hrt, rcu_torture_updown_hrt, CLOCK_MONOTONIC,
- HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
+ HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
torture_random_init(&updownreaders[i].rtorsu_trs);
init_rcu_torture_one_read_state(&updownreaders[i].rtorsu_rtors,
&updownreaders[i].rtorsu_trs);
@@ -2539,12 +2540,14 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
schedule_timeout_idle(HZ);
return;
}
- rtorsup->rtorsu_j = jiffies;
smp_store_release(&rtorsup->rtorsu_inuse, true);
t = torture_random(&rtorsup->rtorsu_trs) & 0xfffff; // One per million.
if (t < 10 * 1000)
t = 200 * 1000 * 1000;
- hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_SOFT);
+ hrtimer_start(&rtorsup->rtorsu_hrt, t, HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
+ smp_mb(); // Sample jiffies after posting hrtimer.
+ rtorsup->rtorsu_j = jiffies; // Not used by hrtimer handler.
+ rtorsup->rtorsu_kt = t;
}
/*
@@ -2555,6 +2558,7 @@ static void rcu_torture_updown_one(struct rcu_torture_one_read_state_updown *rto
static int
rcu_torture_updown(void *arg)
{
+ unsigned long j;
struct rcu_torture_one_read_state_updown *rtorsup;
VERBOSE_TOROUT_STRING("rcu_torture_updown task started");
@@ -2562,8 +2566,10 @@ rcu_torture_updown(void *arg)
for (rtorsup = updownreaders; rtorsup < &updownreaders[n_up_down]; rtorsup++) {
if (torture_must_stop())
break;
+ j = smp_load_acquire(&jiffies); // Time before ->rtorsu_inuse.
if (smp_load_acquire(&rtorsup->rtorsu_inuse)) {
- WARN_ON_ONCE(time_after(jiffies, rtorsup->rtorsu_j + 10 * HZ));
+ WARN_ONCE(time_after(j, rtorsup->rtorsu_j + 1 + HZ * 10),
+ "hrtimer queued at jiffies %lu for %lld ns took %lu jiffies\n", rtorsup->rtorsu_j, rtorsup->rtorsu_kt, j - rtorsup->rtorsu_j);
continue;
}
rcu_torture_updown_one(rtorsup);
Powered by blists - more mailing lists