lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20220615170825.GS1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 15 Jun 2022 10:08:25 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Zhouyi Zhou <zhouzhouyi@...il.com>, willy@...radead.org,
        urezki@...il.com, rcu@...r.kernel.org, frederic@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH rcu/dev] RFC: temporarily reduce rcu tasks kthread sleep
 time for PROVE_RCU

On Wed, Jun 15, 2022 at 09:51:18AM +0100, Mark Rutland wrote:
> Hi Paul,
> 
> On Tue, Jun 07, 2022 at 04:42:09PM -0700, Paul E. McKenney wrote:
> > On Fri, May 27, 2022 at 08:39:09AM +0800, Zhouyi Zhou wrote:
> > > Sometimes, the kernel will boot too fast for rcu_tasks_verify_self_tests
> > > to have all required grace periods.
> 
> As mentioned on IRC, with v5.19-rc2 I consistently see this on arm64 at boot
> time, e.g.
> 
> | call_rcu_tasks() has been failed.
> | ------------[ cut here ]------------
> | WARNING: CPU: 0 PID: 1 at kernel/rcu/tasks.h:1751 rcu_tasks_verify_self_tests+0x48/0x74
> | Modules linked in:
> | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc2 #2
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> | pc : rcu_tasks_verify_self_tests+0x48/0x74
> | lr : rcu_tasks_verify_self_tests+0x6c/0x74
> | sp : ffff80000800bd70
> | x29: ffff80000800bd70 x28: 0000000000000000 x27: ffffbd05c97904c8
> | x26: ffffbd05c98610a0 x25: 0000000000000007 x24: 0000000000000000
> | x23: ffffbd05c9fa0b38 x22: ffffbd05ca42f000 x21: ffffbd05c9514ac0
> | x20: ffffbd05ca038808 x19: ffffbd05ca038808 x18: 0000000000000030
> | x17: 00000000808cf272 x16: 00000000b4c3fe3b x15: fffffffffffe4328
> | x14: 0000000000000001 x13: 2e64656c69616620 x12: 6e65656220736168
> | x11: 202928736b736174 x10: 5f7563725f6c6c61 x9 : 64656c696166206e
> | x8 : 6565622073616820 x7 : 2928736b7361745f x6 : 7563725f6c6c6163
> | x5 : ffff3f48ffbc0c80 x4 : 0000000000000000 x3 : 0000000000000000
> | x2 : 0000000000000000 x1 : 0000000000000001 x0 : 00000000ffffffff
> | Call trace:
> |  rcu_tasks_verify_self_tests+0x48/0x74
> |  do_one_initcall+0x74/0x400
> |  kernel_init_freeable+0x2fc/0x384
> |  kernel_init+0x28/0x130
> |  ret_from_fork+0x10/0x20
> | irq event stamp: 188396
> | hardirqs last  enabled at (188395): [<ffffbd05c7d2b4f4>] vprintk_store+0x3e4/0x4b0
> | hardirqs last disabled at (188396): [<ffffbd05c8d9a494>] el1_dbg+0x24/0x90
> | softirqs last  enabled at (188302): [<ffffbd05c8b95910>] release_sock+0xac/0xd0
> | softirqs last disabled at (188300): [<ffffbd05c8b9588c>] release_sock+0x28/0xd0
> | ---[ end trace 0000000000000000 ]---
> | registered taskstats version 1
> 
> I gave the below diff a go with:
> 
> * 100 UP boots
> * 100 64-CPU SMP boots
> 
> ... in a KVM VM on a ThunderX2 host, and out of those 200 boots I never saw the
> warning.
> 
> So FWIW:
> 
>   Tested-by: Mark Rutland <mark.rutland@....com>

Thank you, and I will apply this on the next rebase.

> Thanks for this!

Glad it helped.  ;-)

> Just as an aside, I had to grab this thread from lore as I'm not subscribed to
> the RCU list, so there might be others who haven't seen this thread.

Good point!  I just now added linux-kernel on CC, which should allow
people to follow the lore bread crumbs.

							Thanx, Paul

> Mark.
> 
> > commit 3e95d4b287b37ee5f7f82e5ebd749ab89fd706c2
> > Author: Paul E. McKenney <paulmck@...nel.org>
> > Date:   Tue Jun 7 15:23:52 2022 -0700
> > 
> >     rcu-tasks: Be more patient for RCU Tasks boot-time testing
> >     
> >     The RCU-Tasks family of grace-period primitives can take some time to
> >     complete, and the amount of time can depend on the exact hardware and
> >     software configuration.  Some configurations boot up fast enough that the
> >     RCU-Tasks verification process gets false-positive failures.  This commit
> >     therefore allows up to 30 seconds for the grace periods to complete, with
> >     this value adjustable downwards using the rcupdate.rcu_task_stall_timeout
> >     kernel boot parameter.
> >     
> >     Reported-by: Matthew Wilcox <willy@...radead.org>
> >     Reported-by: Zhouyi Zhou <zhouzhouyi@...il.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > 
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index f1209ce621c51..1a4c3adc5c397 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -145,6 +145,7 @@ static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
> >  module_param(rcu_task_ipi_delay, int, 0644);
> >  
> >  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
> > +#define RCU_TASK_BOOT_STALL_TIMEOUT (HZ * 30)
> >  #define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
> >  static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> > @@ -1778,23 +1779,24 @@ struct rcu_tasks_test_desc {
> >  	struct rcu_head rh;
> >  	const char *name;
> >  	bool notrun;
> > +	unsigned long runstart;
> >  };
> >  
> >  static struct rcu_tasks_test_desc tests[] = {
> >  	{
> >  		.name = "call_rcu_tasks()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_RCU),
> >  	},
> >  	{
> >  		.name = "call_rcu_tasks_rude()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
> >  	},
> >  	{
> >  		.name = "call_rcu_tasks_trace()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
> >  	}
> >  };
> >  
> > @@ -1805,23 +1807,28 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
> >  
> >  	pr_info("Callback from %s invoked.\n", rttd->name);
> >  
> > -	rttd->notrun = true;
> > +	rttd->notrun = false;
> >  }
> >  
> >  static void rcu_tasks_initiate_self_tests(void)
> >  {
> > +	unsigned long j = jiffies;
> > +
> >  	pr_info("Running RCU-tasks wait API self tests\n");
> >  #ifdef CONFIG_TASKS_RCU
> > +	tests[0].runstart = j;
> >  	synchronize_rcu_tasks();
> >  	call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
> >  #endif
> >  
> >  #ifdef CONFIG_TASKS_RUDE_RCU
> > +	tests[1].runstart = j;
> >  	synchronize_rcu_tasks_rude();
> >  	call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
> >  #endif
> >  
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> > +	tests[2].runstart = j;
> >  	synchronize_rcu_tasks_trace();
> >  	call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
> >  #endif
> > @@ -1831,11 +1838,18 @@ static int rcu_tasks_verify_self_tests(void)
> >  {
> >  	int ret = 0;
> >  	int i;
> > +	unsigned long bst = rcu_task_stall_timeout;
> >  
> > +	if (bst <= 0 || bst > RCU_TASK_BOOT_STALL_TIMEOUT)
> > +		bst = RCU_TASK_BOOT_STALL_TIMEOUT;
> >  	for (i = 0; i < ARRAY_SIZE(tests); i++) {
> > -		if (!tests[i].notrun) {		// still hanging.
> > -			pr_err("%s has been failed.\n", tests[i].name);
> > -			ret = -1;
> > +		while (tests[i].notrun) {		// still hanging.
> > +			if (time_after(jiffies, tests[i].runstart + bst)) {
> > +				pr_err("%s has failed boot-time tests.\n", tests[i].name);
> > +				ret = -1;
> > +				break;
> > +			}
> > +			schedule_timeout_uninterruptible(1);
> >  		}
> >  	}
> >  
> > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ