Message-ID: <xhsmhh658jvq4.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 05 Feb 2025 15:45:55 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: paulmck@...nel.org
Cc: linux-kernel@...r.kernel.org, frederic@...nel.org, leitao@...ian.org
Subject: Re: [PATCH RFC context_tracking] Make RCU watch ct_kernel_exit_state() warning
On 05/02/25 04:16, Paul E. McKenney wrote:
> On Wed, Feb 05, 2025 at 12:17:06PM +0100, Valentin Schneider wrote:
>> On 01/02/25 10:44, Paul E. McKenney wrote:
>> > The WARN_ON_ONCE() in ct_kernel_exit_state() follows the call to
>> > ct_state_inc(), which means that RCU is not watching this WARN_ON_ONCE().
>> > This can (and does) result in extraneous lockdep warnings when this
>> > WARN_ON_ONCE() triggers. These extraneous warnings are the opposite
>> > of helpful.
>> >
>> > Therefore, invert the WARN_ON_ONCE() condition and move it before the
>> > call to ct_state_inc(). This does mean that the ct_state_inc() return
>> > value can no longer be used in the WARN_ON_ONCE() condition, so discard
>> > this return value and instead use a call to rcu_is_watching_curr_cpu().
>> > This call is executed only in CONFIG_RCU_EQS_DEBUG=y kernels, so there
>> > is no added overhead in production use.
>> >
>> > Reported-by: Breno Leitao <leitao@...ian.org>
>> > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>> > Cc: Frederic Weisbecker <frederic@...nel.org>
>> > Cc: Valentin Schneider <vschneid@...hat.com>
>> >
>> > diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
>> > index 938c48952d26..fb5be6e9b423 100644
>> > --- a/kernel/context_tracking.c
>> > +++ b/kernel/context_tracking.c
>> > @@ -80,17 +80,16 @@ static __always_inline void rcu_task_trace_heavyweight_exit(void)
>> > */
>> > static noinstr void ct_kernel_exit_state(int offset)
>> > {
>> > - int seq;
>> > -
>> > /*
>> > * CPUs seeing atomic_add_return() must see prior RCU read-side
>> > * critical sections, and we also must force ordering with the
>> > * next idle sojourn.
>> > */
>> > rcu_task_trace_heavyweight_enter(); // Before CT state update!
>> > - seq = ct_state_inc(offset);
>> > - // RCU is no longer watching. Better be in extended quiescent state!
>> > - WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & CT_RCU_WATCHING));
>> > + // RCU is still watching. Better not be in extended quiescent state!
>> > + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !rcu_is_watching_curr_cpu());
>>
>> Isn't this equivalent to the check done in ct_kernel_enter_state()? That
>> is, it operates on the same context_tracking.state value that the
>> ct_kernel_enter_state() WARN_ON_ONCE() sees, so if the warning is to fire
>> it will fire there first.
>
> In theory, yes. In practice, the bug we are trying to complain about
> might well be due to that call to ct_kernel_enter_state() having been
> left out completely. Or, more likely, the call to one of its callers
> having been left out completely. So we cannot rely on its WARN_ON_ONCE()
> to detect this sort of omitted-call bug.
>
> And these omitted-call bugs do happen when bringing up new hardware or
> implementing new exception paths for existing hardware.
>
Ah, quite so, it even says so on the tin for ct_nmi_enter() & co.
>> I don't have any better idea than something like the ugly:
>>
>> 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG)) {
>> 		unsigned int new_state, state = atomic_read(&ct->state);
>> 		bool ret;
>>
>> 		do {
>> 			new_state = state + offset;
>> 			// RCU will no longer be watching. Better be in extended quiescent state!
>> 			WARN_ON_ONCE(new_state & CT_RCU_WATCHING);
>>
>> 			ret = atomic_try_cmpxchg(&ct->state, &state, new_state);
>> 		} while (!ret);
>> 	} else {
>> 		(void)ct_state_inc(offset);
>> 	}
>
> This would make sense if we need to detect a bug in ct_state_inc() itself.
> But that function is a one-liner invoking raw_atomic_add_return(),
> and we have other tests to find bugs in atomics, correct?
>
> Or am I missing a trick here?
>
Not at all; consider my suggestion revoked and my questioning answered :-)
Reviewed-by: Valentin Schneider <vschneid@...hat.com>