linux-kernel - Re: [PATCH v4 4/5] rcutorture: Force synchronizing of RCU flavor from hotplug notifier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200810194131.GE2865655@google.com>
Date:   Mon, 10 Aug 2020 15:41:31 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     linux-kernel@...r.kernel.org, Davidlohr Bueso <dave@...olabs.net>,
        Jonathan Corbet <corbet@....net>,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        linux-doc@...r.kernel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        neeraju@...eaurora.org, peterz@...radead.org,
        Randy Dunlap <rdunlap@...radead.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>, tglx@...utronix.de,
        vineethrp@...il.com
Subject: Re: [PATCH v4 4/5] rcutorture: Force synchronizing of RCU flavor
 from hotplug notifier

On Mon, Aug 10, 2020 at 10:54:34AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 10, 2020 at 01:31:09PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Mon, Aug 10, 2020 at 09:19:45AM -0700, Paul E. McKenney wrote:
> > > On Fri, Aug 07, 2020 at 01:07:21PM -0400, Joel Fernandes (Google) wrote:
> > > > RCU has had deadlocks in the past related to synchronizing in a hotplug
> > > > notifier. Typically, this has occurred because timer callbacks did not get
> > > > migrated before the CPU hotplug notifier requesting RCU's services is
> > > > called. If RCU's grace period processing has a timer callback queued in
> > > > the meanwhile, it may never get called causing RCU stalls.
> > > > 
> > > > These issues have been fixed by removing such dependencies from grace
> > > > period processing, however there are no testing scenarios for such
> > > > cases.
> > > > 
> > > > This commit therefore reuses rcutorture's existing hotplug notifier to
> > > > invoke the flavor-specific synchronize callback. If anything locks up,
> > > > we expect stall warnings and/or other splats.
> > > > 
> > > > Obviously, we need not test for rcu_barrier from a notifier, since those
> > > > are not allowed from notifiers. This fact is already detailed in the
> > > > documentation as well.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > > 
> > > Given that rcutorture_booster_init() is invoked on the CPU in question
> > > only after it is up and running, and that (if I remember correctly)
> > > rcutorture_booster_cleanup() is invoked on the outgoing CPU before it
> > > has really started going away, would this code really have caught that
> > > timer/CPU-hotplug/RCU bug?
> > 
> > You are right, it would not have caught that particular one because the timer
> > callbacks would have been migrated by the time the rcutorture_booster_init()
> > is called.
> > 
> > I still thought it is a good idea anyway to test if the dynamic hotplug
> > notifiers don't have these issues.
> > 
> > Did you have a better idea on how to test the timer/hotplug/rcu bug?
> 
> My suggestion would be to place an rcutorture hook in all of the RCU
> notifiers that support blocking and that have some possibility of making
> this deadlock happen.  There are some similar hooks in other parts of RCU.

Sure that's a good idea, I will look into it. Thanks!

 - Joel

> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > 
> > > >  kernel/rcu/rcutorture.c | 81 +++++++++++++++++++++--------------------
> > > >  1 file changed, 42 insertions(+), 39 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > > index 92cb79620939..083b65e4877d 100644
> > > > --- a/kernel/rcu/rcutorture.c
> > > > +++ b/kernel/rcu/rcutorture.c
> > > > @@ -1645,12 +1645,37 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
> > > >  		 read_exit_delay, read_exit_burst);
> > > >  }
> > > >  
> > > > -static int rcutorture_booster_cleanup(unsigned int cpu)
> > > > +static bool rcu_torture_can_boost(void)
> > > > +{
> > > > +	static int boost_warn_once;
> > > > +	int prio;
> > > > +
> > > > +	if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > > +		return false;
> > > > +
> > > > +	prio = rcu_get_gp_kthreads_prio();
> > > > +	if (!prio)
> > > > +		return false;
> > > > +
> > > > +	if (prio < 2) {
> > > > +		if (boost_warn_once  == 1)
> > > > +			return false;
> > > > +
> > > > +		pr_alert("%s: WARN: RCU kthread priority too low to test boosting.  Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > > +		boost_warn_once = 1;
> > > > +		return false;
> > > > +	}
> > > > +
> > > > +	return true;
> > > > +}
> > > > +
> > > > +static int rcutorture_hp_cleanup(unsigned int cpu)
> > > >  {
> > > >  	struct task_struct *t;
> > > >  
> > > > -	if (boost_tasks[cpu] == NULL)
> > > > +	if (!rcu_torture_can_boost() || boost_tasks[cpu] == NULL)
> > > >  		return 0;
> > > > +
> > > >  	mutex_lock(&boost_mutex);
> > > >  	t = boost_tasks[cpu];
> > > >  	boost_tasks[cpu] = NULL;
> > > > @@ -1662,11 +1687,14 @@ static int rcutorture_booster_cleanup(unsigned int cpu)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -static int rcutorture_booster_init(unsigned int cpu)
> > > > +static int rcutorture_hp_init(unsigned int cpu)
> > > >  {
> > > >  	int retval;
> > > >  
> > > > -	if (boost_tasks[cpu] != NULL)
> > > > +	/* Force synchronizing from hotplug notifier to ensure it is safe. */
> > > > +	cur_ops->sync();
> > > > +
> > > > +	if (!rcu_torture_can_boost() || boost_tasks[cpu] != NULL)
> > > >  		return 0;  /* Already created, nothing more to do. */
> > > >  
> > > >  	/* Don't allow time recalculation while creating a new task. */
> > > > @@ -2336,30 +2364,6 @@ static void rcu_torture_barrier_cleanup(void)
> > > >  	}
> > > >  }
> > > >  
> > > > -static bool rcu_torture_can_boost(void)
> > > > -{
> > > > -	static int boost_warn_once;
> > > > -	int prio;
> > > > -
> > > > -	if (!(test_boost == 1 && cur_ops->can_boost) && test_boost != 2)
> > > > -		return false;
> > > > -
> > > > -	prio = rcu_get_gp_kthreads_prio();
> > > > -	if (!prio)
> > > > -		return false;
> > > > -
> > > > -	if (prio < 2) {
> > > > -		if (boost_warn_once  == 1)
> > > > -			return false;
> > > > -
> > > > -		pr_alert("%s: WARN: RCU kthread priority too low to test boosting.  Skipping RCU boost test. Try passing rcutree.kthread_prio > 1 on the kernel command line.\n", KBUILD_MODNAME);
> > > > -		boost_warn_once = 1;
> > > > -		return false;
> > > > -	}
> > > > -
> > > > -	return true;
> > > > -}
> > > > -
> > > >  static bool read_exit_child_stop;
> > > >  static bool read_exit_child_stopped;
> > > >  static wait_queue_head_t read_exit_wq;
> > > > @@ -2503,8 +2507,7 @@ rcu_torture_cleanup(void)
> > > >  		 rcutorture_seq_diff(gp_seq, start_gp_seq));
> > > >  	torture_stop_kthread(rcu_torture_stats, stats_task);
> > > >  	torture_stop_kthread(rcu_torture_fqs, fqs_task);
> > > > -	if (rcu_torture_can_boost())
> > > > -		cpuhp_remove_state(rcutor_hp);
> > > > +	cpuhp_remove_state(rcutor_hp);
> > > >  
> > > >  	/*
> > > >  	 * Wait for all RCU callbacks to fire, then do torture-type-specific
> > > > @@ -2773,21 +2776,21 @@ rcu_torture_init(void)
> > > >  		if (firsterr)
> > > >  			goto unwind;
> > > >  	}
> > > > +
> > > > +	firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > > +			rcutorture_hp_init,
> > > > +			rcutorture_hp_cleanup);
> > > > +	if (firsterr < 0)
> > > > +		goto unwind;
> > > > +	rcutor_hp = firsterr;
> > > > +
> > > >  	if (test_boost_interval < 1)
> > > >  		test_boost_interval = 1;
> > > >  	if (test_boost_duration < 2)
> > > >  		test_boost_duration = 2;
> > > > -	if (rcu_torture_can_boost()) {
> > > > -
> > > > +	if (rcu_torture_can_boost())
> > > >  		boost_starttime = jiffies + test_boost_interval * HZ;
> > > >  
> > > > -		firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
> > > > -					     rcutorture_booster_init,
> > > > -					     rcutorture_booster_cleanup);
> > > > -		if (firsterr < 0)
> > > > -			goto unwind;
> > > > -		rcutor_hp = firsterr;
> > > > -	}
> > > >  	shutdown_jiffies = jiffies + shutdown_secs * HZ;
> > > >  	firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
> > > >  	if (firsterr)
> > > > -- 
> > > > 2.28.0.236.gb10cc79966-goog
> > > >