Message-ID: <20220803140653.GD2125313@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 3 Aug 2022 07:06:53 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Liu Song <liusong@...ux.alibaba.com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/debug: avoid executing show_state and causing rcu
 stall warning

On Wed, Aug 03, 2022 at 08:42:35AM -0400, Steven Rostedt wrote:
> 
> [ Adding Paul ]
> 
> On Wed,  3 Aug 2022 09:18:45 +0800
> Liu Song <liusong@...ux.alibaba.com> wrote:
> 
> > From: Liu Song <liusong@...ux.alibaba.com>
> > 
> > If the number of CPUs is large, "sysrq_sched_debug_show" will execute for
> > a long time. Every time I execute "echo t > /proc/sysrq-trigger" on my
> > 128-core machine, the rcu stall warning will be triggered. Moreover,
> > sysrq_sched_debug_show does not need to be protected by rcu_read_lock,
> > and no rcu stall warning will appear after adjustment.
> > 
> > Signed-off-by: Liu Song <liusong@...ux.alibaba.com>
> > ---
> >  kernel/sched/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5555e49..82c117e 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8879,11 +8879,11 @@ void show_state_filter(unsigned int state_filter)
> >  			sched_show_task(p);
> >  	}
> >  
> > +	rcu_read_unlock();
> >  #ifdef CONFIG_SCHED_DEBUG
> >  	if (!state_filter)
> >  		sysrq_sched_debug_show();
> 
> If this is just because sysrq_sched_debug_show() is very slow, does RCU
> have a way to "touch" it? Like the watchdogs have? That is, to tell RCU
> "Yes I know I'm taking a long time, but I'm still making forward progress,
> don't complain about me". Then the sysrq_sched_debug_show() could have:
> 
> 	for_each_online_cpu(cpu) {
> 		/*
> 		 * Need to reset softlockup watchdogs on all CPUs, because
> 		 * another CPU might be blocked waiting for us to process
> 		 * an IPI or stop_machine.
> 		 */
> 		touch_nmi_watchdog();
> 		touch_all_softlockup_watchdogs();
> +		touch_rcu();
> 		print_cpu(NULL, cpu);
> 	}
> 
> ??

There are rcu_sysrq_start() and rcu_sysrq_end() functions to suppress
this.  These are invoked by __handle_sysrq().  The value of rcu_cpu_stall_suppress
should be non-zero during the sysrq execution, and this should prevent
RCU CPU stall warnings from being printed.

That said, the code currently does not support overlapping calls to the
various functions that suppress RCU CPU stall warnings.  Except that
the only other use in current mainline is rcu_panic(), which never
unsuppresses.
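
For illustration only, here is a small userspace model of why overlapping
suppression is not supported.  It is a sketch, not the kernel code: the
model_* names are hypothetical, and the flag values (1 for panic, 2 for
sysrq) are assumptions about how rcu_cpu_stall_suppress is used.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's rcu_cpu_stall_suppress flag. */
int model_stall_suppress;

/* Return the model to the "warnings enabled" state. */
void model_reset(void)
{
	model_stall_suppress = 0;
}

/* Model of rcu_sysrq_start(): claim the flag only if it is not yet set. */
void model_sysrq_start(void)
{
	if (!model_stall_suppress)
		model_stall_suppress = 2;	/* assumed sysrq value */
}

/* Model of rcu_sysrq_end(): clear only the value that sysrq set. */
void model_sysrq_end(void)
{
	if (model_stall_suppress == 2)
		model_stall_suppress = 0;
}

/* Model of rcu_panic(): suppress and never unsuppress. */
void model_panic(void)
{
	model_stall_suppress = 1;	/* assumed panic value */
}

/* Would a stall warning be allowed to print right now? */
bool model_stall_warning_allowed(void)
{
	return model_stall_suppress == 0;
}

/*
 * Two overlapping sysrq-style sections.  The flag is a single value,
 * not a counter, so the inner "end" clears it while the outer section
 * is still running.  Returns true if warnings were re-enabled too early.
 */
bool model_overlap_loses_suppression(void)
{
	model_reset();
	model_sysrq_start();	/* outer section begins */
	model_sysrq_start();	/* inner section begins, flag already set */
	model_sysrq_end();	/* inner section ends and clears the flag */
	return model_stall_warning_allowed();
}
```

In this model a single start/end pair suppresses and then restores
warnings, and a start/end pair after model_panic() leaves suppression
in place, but a second overlapping user of the sysrq value ends
suppression for the first one.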

So could you please check the value of rcu_cpu_stall_suppress?
Just in case some other form of suppression was added somewhere
that I missed?
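
One way to look at the value from userspace is sketched below, assuming
the parameter is exposed at the sysfs path named in the macro (the path
and the read_int_param() helper are assumptions for illustration).  This
only shows the value at the time of the read, not the value while the
sysrq handler is running; checking that would need a printk() in the
handler path instead.

```c
#include <stdio.h>

/* Assumed sysfs location of the rcupdate.rcu_cpu_stall_suppress parameter. */
#define STALL_SUPPRESS_PATH \
	"/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress"

/*
 * Hypothetical helper: read one integer from the file at path.
 * Returns 0 on success and stores the result in *value,
 * or returns -1 if the file cannot be opened or parsed.
 */
int read_int_param(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_int_param(STALL_SUPPRESS_PATH, &v) and printing v would
then report whether some form of suppression is currently in effect
(non-zero) or not (zero).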

							Thanx, Paul

> -- Steve
> 
> >  #endif
> > -	rcu_read_unlock();
> >  	/*
> >  	 * Only show locks if all tasks are dumped:
> >  	 */
> 