linux-kernel - Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up

Message-ID: <20221215190158.GK4001@paulmck-ThinkPad-P17-Gen-1>
Date:   Thu, 15 Dec 2022 11:01:58 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Masami Hiramatsu <mhiramat@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on
 boot up

On Thu, Dec 15, 2022 at 01:51:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 09:02:56 -0800
> "Paul E. McKenney" <paulmck@...nel.org> wrote:
> 
> > On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > > On Wed, 14 Dec 2022 12:03:33 -0800
> > > "Paul E. McKenney" <paulmck@...nel.org> wrote:
> > >   
> > > > > > Avoid calling the synchronization function when system_state is
> > > > > > SYSTEM_BOOTING.    
> > > > > 
> > > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > > Then, it will prevent similar warnings if we expand boot time feature.    
> > > > 
> > > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > > Just in case there are ever additional issues of this sort?  
> > > 
> > > Adding more tracing command line parameters is triggering this more. I now
> > > hit:  
> > 
> > Fair point, and apologies for the hassle.
> > 
> > Any chance of getting an official "it is now late enough in boot to
> > safely invoke lockdep" API?  ;-)
> 
> lockdep API isn't the problem, it's that we are still in the early boot stage
> where interrupts are disabled, and you can't enable them. Lockdep works
> just fine there, and is reporting interrupts being disabled correctly. The
> backtrace reported *does* have interrupts disabled.
> 
> The thing is, because we are still running on a single CPU with interrupts
> disabled, there is no need for synchronization. Even grabbing a mutex is
> safe because there's guaranteed to be no contention (unless it's not
> released). This is why a lot of warnings are suppressed if system_state is
> SYSTEM_BOOTING.

Agreed, and my second attempt is a bit more surgical.  (Please see below
for a more official version of it.)

> > In the meantime, does the (untested and quite crude) patch at the end
> > of this message help?
> 
> I'll go and test it, but I'm guessing it will work fine. You could also test
> against system_state != SYSTEM_BOOTING, as that gets cleared just before
> kernel_init() can continue (it waits for the complete() that is called
> after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
> after rcu_scheduler_starting().
> 
> I wonder if you could even replace RCU_SCHEDULER_RUNNING with
> system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
> call.

In this particular case, agreed, I could use system_state.  But there are
other cases that must change behavior as soon as preemption can happen,
which is upon return from that call to user_mode_thread().  The update to
system_state doesn't happen until much later.  So I don't get to remove
that rcu_scheduler_starting() call.

What case?

Here is one:

o	The newly spawned init process does something that uses RCU,
	but is preempted while holding rcu_read_lock().

o	The boot thread, which did the preempting, waits for a grace
	period.  If we use rcu_scheduler_active, all is well because
	synchronize_rcu() will do a real run-time grace period, thus
	waiting for that reader.

	But system_state has not yet been updated, so if synchronize_rcu()
	were instead to pay attention to that one, there might be a
	tragically too-short RCU grace period.
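
To make the ordering in the scenario above concrete, here is a rough user-space model of it.  This is only a sketch, not kernel code: the model_* names and the three phases are hypothetical and exist only for illustration, and the sole assumption is the ordering described above (rcu_scheduler_active leaves RCU_SCHEDULER_INACTIVE before the init process can run, while system_state leaves SYSTEM_BOOTING only later).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boot phases for a user-space model; not kernel identifiers. */
enum model_phase {
	MODEL_EARLY_BOOT,	/* one CPU, interrupts disabled, no preemption */
	MODEL_INIT_SPAWNED,	/* rcu_scheduler_starting() done, init can run */
	MODEL_SCHEDULING,	/* system_state has left SYSTEM_BOOTING */
};

/* Models "rcu_scheduler_active != RCU_SCHEDULER_INACTIVE". */
static bool model_rcu_scheduler_active(enum model_phase p)
{
	return p >= MODEL_INIT_SPAWNED;
}

/* Models "system_state == SYSTEM_BOOTING". */
static bool model_system_booting(enum model_phase p)
{
	return p < MODEL_SCHEDULING;
}

/* Once init exists, a reader can be preempted inside rcu_read_lock(). */
static bool model_reader_can_be_preempted(enum model_phase p)
{
	return p >= MODEL_INIT_SPAWNED;
}

/* A skipped (no-op) grace period is too short if such a reader may exist. */
static bool model_gp_too_short(bool gp_skipped, enum model_phase p)
{
	return gp_skipped && model_reader_can_be_preempted(p);
}

/* Skip the grace period only while rcu_scheduler_active is inactive. */
static bool model_too_short_using_rcu_flag(enum model_phase p)
{
	return model_gp_too_short(!model_rcu_scheduler_active(p), p);
}

/* Skip the grace period for as long as system_state says "booting". */
static bool model_too_short_using_system_state(enum model_phase p)
{
	return model_gp_too_short(model_system_booting(p), p);
}

/* Walks all phases and checks the two claims made in the scenario. */
static void model_self_check(void)
{
	for (int p = MODEL_EARLY_BOOT; p <= MODEL_SCHEDULING; p++)
		assert(!model_too_short_using_rcu_flag((enum model_phase)p));
	assert(model_too_short_using_system_state(MODEL_INIT_SPAWNED));
}
```

In this model the only bad combination is the middle phase paired with the system_state test: init can already be preempted as a reader, yet the grace period would still be treated as a no-op.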

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 876c5ac113fa66a64fa241e69d9a2251b8daa5ee
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Thu Dec 15 09:26:09 2022 -0800

    rcu: Don't assert interrupts enabled too early in boot
    
    The rcu_poll_gp_seq_start_unlocked() and rcu_poll_gp_seq_end_unlocked()
    functions both check that interrupts are enabled, as they normally
    should be when waiting for
    an RCU grace period.  Except that it is legal to wait for grace periods
    during early boot, before interrupts have been enabled for the first time,
    and polling for grace periods is required to work during this time.
    This can result in false-positive lockdep splats in the presence of
    boot-time-initiated tracing.
    
    This commit therefore conditions those interrupts-enabled checks on
    rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
    which time interrupts have been enabled.
    
    Reported-by: Steven Rostedt <rostedt@...dmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);