[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <749b68ac-da4d-452e-883a-d7065b3b3c30@paulmck-laptop>
Date: Tue, 25 Jul 2023 09:33:46 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: linux-kernel@...r.kernel.org, stable@...r.kernel.org,
rcu@...r.kernel.org, Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [BUG] Re: Linux 6.4.4
On Tue, Jul 25, 2023 at 03:30:17PM +0000, Joel Fernandes wrote:
> On Mon, Jul 24, 2023 at 04:17:32PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 24, 2023 at 07:04:14PM -0400, Joel Fernandes wrote:
> > >
> > >
> > > > On Jul 24, 2023, at 12:00 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
> > > >
> > > > On Mon, Jul 24, 2023 at 09:36:02AM -0400, Joel Fernandes wrote:
> > > >>> On Sun, Jul 23, 2023 at 11:35 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> > > >>>
> > > >>> On Mon, Jul 24, 2023 at 12:32:57AM +0000, Joel Fernandes wrote:
> > > >>>> On Sun, Jul 23, 2023 at 10:19:27AM -0700, Paul E. McKenney wrote:
> > > >>>>> On Sun, Jul 23, 2023 at 10:50:26AM -0400, Joel Fernandes wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On 7/22/23 13:27, Paul E. McKenney wrote:
> > > >>>>>> [..]
> > > >>>>>>>
> > > >>>>>>> OK, if this kernel is non-preemptible, you are not running TREE03,
> > > >>>>>>> correct?
> > > >>>>>>>
> > > >>>>>>>> Next plan of action is to get sched_waking stack traces since I have a
> > > >>>>>>>> very reliable repro of this now.
> > > >>>>>>>
> > > >>>>>>> Too much fun! ;-)
> > > >>>>>>
> > > >>>>>> For TREE07 issue, it is actually the schedule_timeout_interruptible(1)
> > > >>>>>> in stutter_wait() that is beating up the CPU0 for 4 seconds.
> > > >>>>>>
> > > >>>>>> This is very similar to the issue I fixed in New year in d52d3a2bf408
> > > >>>>>> ("torture: Fix hang during kthread shutdown phase")
> > > >>>>>
> > > >>>>> Agreed, if there are enough kthreads, and all the kthreads are on a
> > > >>>>> single CPU, this could consume that CPU.
> > > >>>>>
> > > >>>>>> Adding a cond_resched() there also did not help.
> > > >>>>>>
> > > >>>>>> I think the issue is the stutter thread fails to move spt forward
> > > >>>>>> because it does not get CPU time. But spt == 1 should be very brief
> > > >>>>>> AFAIU. I was wondering if we could set that to RT.
> > > >>>>>
> > > >>>>> Or just use a single hrtimer-based wait for each kthread?
> > > >>>>
> > > >>>> [Joel]
> > > >>>> Yes this might be better, but there's still the issue that spt may not be set
> > > >>>> back to 0 in some future release where the thread gets starved.
> > > >>>
> > > >>> But if each thread knows the absolute time at which the current stutter
> > > >>> period is supposed to end, there should not be any need for the spt
> > > >>> variable, correct?
> > > >>
> > > >> Yes.
> > > >>
> > > >>>>>> But also maybe the following will cure it like it did for the shutdown
> > > >>>>>> issue, giving the stutter thread just enough CPU time to move spt forward.
> > > >>>>>>
> > > >>>>>> Now I am trying the following and will let it run while I go do other
> > > >>>>>> family related things. ;)
> > > >>>>>
> > > >>>>> Good point, if this avoids the problem, that gives a strong indication
> > > >>>>> that your hypothesis on the root cause is correct.
> > > >>>>
> > > >>>> [Joel]
> > > >>>> And the TREE07 issue is gone with that change!
> > > >> [...]
> > > >>>> Let me know what you think, thanks!
> > > >>>
> > > >>> If we can make the stutter kthread set an absolute time for the current
> > > >>> stutter period to end, then we should be able to simplify the code quite
> > > >>> a bit and get rid of the CPU consumption entirely. (Give or take the
> > > >>> possible need for a given thread to check whether it was erroneously
> > > >>> awakened early.)
> > > >>>
> > > >>> But what specifically did you have in mind?
> > > >>
> > > >> I was thinking of a 2 counter approach storing the absolute time. Use
> > > >> an alternative counter for different stuttering sessions. But yes,
> > > >> generally I agree with the absolute time idea. What do you think Paul?
> > > >>
> > > >> Do we want to just do the simpler schedule_timeout at HZ / 20 to keep stable
> > > >> green, and do the absolute-time approach for mainline? That might be better
> > > >> from a process PoV. But I think stable requires patches to be upstream. Greg?
> > > >>
> > > >> I will try to send out patches this week to discuss this, thanks,
> > > >
> > > > Heh!!!
> > > >
> > > > Me, I was just thinking of mainline. ;-)
> > >
> > > Turns out it is simple enough for both mainline and stable :-).
> > > Will test more and send it out soon.
> >
> > Woo-hoo!!! Some times you get lucky!
>
> Better luck today than yesterday since yesterday a Linux distro update
> stalled my development. :-). Thanks to backups that I got back up quickly..
>
> Here is the mostly tested patch as preview. I am running the full suite now
> after confirming it fixed TREE07. Looks OK? (I also confirmed with prints
> that stuttering timing is the same as before).
Looks quite plausible, thank you!
Thanx, Paul
> ---8<-----------------------
>
> From: Joel Fernandes <joelaf@...gle.com>
> Subject: [PATCH] rcutorture: Fix stuttering races and other issues
>
> The stuttering code isn't functioning as expected. Ideally, it should
> pause the torture threads for a designated period before resuming. Yet,
> it fails to halt the test for the correct duration. Additionally, a race
> condition exists, potentially causing the stuttering code to pause for
> an extended period if the 'spt' variable is non-zero due to the stutter
> orchestration thread's inadequate CPU time.
>
> Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels.
> This happens as the stuttering code may run within a softirq due to RCU
> callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds,
> thus obstructing RCU's progress. This situation triggers a warning
> message in the logs:
>
> [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
>
> This warning suggests that an RCU torture object, although invisible to
> RCU readers, couldn't make it past the pipe array and be freed -- a
> strong indication that there weren't enough grace periods during the
> stutter interval.
>
> To address these issues, this patch sets the "stutter end" time to an
> absolute point in the future set by the main stutter thread. This is
> then used for waiting in stutter_wait(). While the stutter thread still
> defines this absolute time, the waiters' waiting logic doesn't rely on
> the stutter thread receiving sufficient CPU time to halt the stuttering
> as the halting is now self-controlled.
>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
> kernel/torture.c | 46 +++++++++++++---------------------------------
> 1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/torture.c b/kernel/torture.c
> index 68dba4ecab5c..63f8f2a7d960 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -719,7 +719,7 @@ static void torture_shutdown_cleanup(void)
> * suddenly applied to or removed from the system.
> */
> static struct task_struct *stutter_task;
> -static int stutter_pause_test;
> +static ktime_t stutter_till_abs_time;
> static int stutter;
> static int stutter_gap;
>
> @@ -729,30 +729,17 @@ static int stutter_gap;
> */
> bool stutter_wait(const char *title)
> {
> - unsigned int i = 0;
> bool ret = false;
> - int spt;
> + ktime_t now_ns, till_ns;
>
> cond_resched_tasks_rcu_qs();
> - spt = READ_ONCE(stutter_pause_test);
> - for (; spt; spt = READ_ONCE(stutter_pause_test)) {
> - if (!ret && !rt_task(current)) {
> - sched_set_normal(current, MAX_NICE);
> - ret = true;
> - }
> - if (spt == 1) {
> - torture_hrtimeout_jiffies(1, NULL);
> - } else if (spt == 2) {
> - while (READ_ONCE(stutter_pause_test)) {
> - if (!(i++ & 0xffff))
> - torture_hrtimeout_us(10, 0, NULL);
> - cond_resched();
> - }
> - } else {
> - torture_hrtimeout_jiffies(round_jiffies_relative(HZ), NULL);
> - }
> - torture_shutdown_absorb(title);
> + now_ns = ktime_get();
> + till_ns = READ_ONCE(stutter_till_abs_time);
> + if (till_ns && ktime_before(now_ns, till_ns)) {
> + torture_hrtimeout_ns(ktime_sub(till_ns, now_ns), 0, NULL);
> + ret = true;
> }
> + torture_shutdown_absorb(title);
> return ret;
> }
> EXPORT_SYMBOL_GPL(stutter_wait);
> @@ -763,23 +750,16 @@ EXPORT_SYMBOL_GPL(stutter_wait);
> */
> static int torture_stutter(void *arg)
> {
> - DEFINE_TORTURE_RANDOM(rand);
> - int wtime;
> + ktime_t till_ns;
>
> VERBOSE_TOROUT_STRING("torture_stutter task started");
> do {
> if (!torture_must_stop() && stutter > 1) {
> - wtime = stutter;
> - if (stutter > 2) {
> - WRITE_ONCE(stutter_pause_test, 1);
> - wtime = stutter - 3;
> - torture_hrtimeout_jiffies(wtime, &rand);
> - wtime = 2;
> - }
> - WRITE_ONCE(stutter_pause_test, 2);
> - torture_hrtimeout_jiffies(wtime, NULL);
> + till_ns = ktime_add_ns(ktime_get(),
> + jiffies_to_nsecs(stutter));
> + WRITE_ONCE(stutter_till_abs_time, till_ns);
> + torture_hrtimeout_jiffies(stutter - 1, NULL);
> }
> - WRITE_ONCE(stutter_pause_test, 0);
> if (!torture_must_stop())
> torture_hrtimeout_jiffies(stutter_gap, NULL);
> torture_shutdown_absorb("torture_stutter");
> --
> 2.41.0.487.g6d72f3e995-goog
>
Powered by blists - more mailing lists