[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c0dca589e8615e1e0105cf1ae20f3f613992d33d.camel@redhat.com>
Date: Wed, 30 Apr 2025 10:05:07 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: Steven Rostedt <rostedt@...dmis.org>,
linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
john.ogness@...utronix.de, Peter Zijlstra <peterz@...radead.org>, Ingo
Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>, Boqun Feng
<boqun.feng@...il.com>, Waiman Long <longman@...hat.com>
Subject: Re: [PATCH v5 21/23] rv: Add rtapp_sleep monitor
On Tue, 2025-04-29 at 19:20 +0200, Nam Cao wrote:
> On Tue, Apr 29, 2025 at 06:01:01PM +0200, Gabriele Monaco wrote:
> > On Tue, 2025-04-29 at 14:01 +0200, Nam Cao wrote:
> > > Add a monitor for checking that real-time tasks do not go to
> > > sleep in
> > > a
> > > manner that may cause undesirable latency.
> > >
> > > Also change
> > > RV depends on TRACING
> > > to
> > > RV select TRACING
> > > to avoid the following recursive dependency:
> > >
> > > error: recursive dependency detected!
> > > symbol TRACING is selected by PREEMPTIRQ_TRACEPOINTS
> > > symbol PREEMPTIRQ_TRACEPOINTS depends on TRACE_IRQFLAGS
> > > symbol TRACE_IRQFLAGS is selected by RV_MON_SLEEP
> > > symbol RV_MON_SLEEP depends on RV
> > > symbol RV depends on TRACING
> > >
> > > Reviewed-by: Gabriele Monaco <gmonaco@...hat.com>
> > > Signed-off-by: Nam Cao <namcao@...utronix.de>
> > > ---
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Cc: Ingo Molnar <mingo@...hat.com>
> > > Cc: Will Deacon <will@...nel.org>
> > > Cc: Boqun Feng <boqun.feng@...il.com>
> > > Cc: Waiman Long <longman@...hat.com>
> > > ---
> > >
> > > [...]
> > >
> > > +RULE = always ((RT and SLEEP) imply (RT_FRIENDLY_SLEEP or
> > > ALLOWLIST))
> > > +
> > > +RT_FRIENDLY_SLEEP = (RT_VALID_SLEEP_REASON or KERNEL_THREAD)
> > > + and ((not WAKE) until RT_FRIENDLY_WAKE)
> > > +
> > > +RT_VALID_SLEEP_REASON = PI_FUTEX
> > > + or RT_FRIENDLY_NANOSLEEP
> > > +
> > > +RT_FRIENDLY_NANOSLEEP = CLOCK_NANOSLEEP
> > > + and NANOSLEEP_TIMER_ABSTIME
> > > + and NANOSLEEP_CLOCK_MONOTONIC
> > > +
> > > +RT_FRIENDLY_WAKE = WOKEN_BY_EQUAL_OR_HIGHER_PRIO
> > > + or WOKEN_BY_HARDIRQ
> > > + or WOKEN_BY_NMI
> > > + or KTHREAD_SHOULD_STOP
> > > +
> > > +ALLOWLIST = BLOCK_ON_RT_MUTEX
> > > + or TASK_IS_RCU
> > > + or TASK_IS_MIGRATION
> >
> > So, just thinking out loud, PI_FUTEX is a valid sleep reason,
> > technically also BLOCK_ON_RT_MUTEX is something you are allowing.
> >
> > In my understanding, the contention tracepoints already in the
> > kernel
> > can track all contention by kernel code and are leaving aside the
> > PI
> > futexes, which use the untracked rt_mutex_wait_proxy_lock.
> >
> > In your case, you are tracking PI_FUTEX via the system call, which
> > should cover the above scenario.
> >
> > Do you really need extra tracepoints to track this too? Or is there
> > any
> > other use of start_proxy_lock/wait_proxy_lock I'm missing here?
> >
> > I see the only case in which rt_mutex_start_proxy_lock is called
> > with a
> > task different than current is via FUTEX_CMP_REQUEUE_PI, wouldn't
> > considering this one too make the new tracepoints superfluous
> > (assuming
> > this one is even needed to be tracked before
> > FUTEX_WAIT_REQUEUE_PI).
>
> The monitor allows PI_FUTEX and allows BLOCK_ON_RT_MUTEX in different
> manners.
>
> PI_FUTEX is only a valid sleep reason. If a task sleeps with
> PI_FUTEX=true,
> it still has to obey ((not WAKE) until RT_FRIENDLY_WAKE)
>
> On the other hand, BLOCK_ON_RT_MUTEX alone is good enough. Waker is
> not
> checked due to how rt_mutex is implemented: when a task unlocks an
> rt_mutex
> and wakes waiter, the task is priority-deboosted first before doing
> the
> wakeup, and we would see a false positive warning. See
> rt_mutex_slowunlock().
>
> In the case of futex_lock_pi(), both PI_FUTEX and BLOCK_ON_RT_MUTEX
> is
> true. Therefore we don't check the waker.
>
> However, in the case of futex_wait_requeue_pi(), PI_FUTEX is true but
> BLOCK_ON_RT_MUTEX is false. In this case, we check the waker.
>
> So, what happens if we don't have the tracepoint in *_proxy_lock()?
> The
> futex_lock_pi() may generate a false positive warning, because we
> check the
> waker and the waker may have lower priority.
>
> But now that you mention it, perhaps instead of PI_FUTEX, we could
> define
> FUTEX_LOCK_PI and FUTEX_WAIT_REQUEUE_PI separately. And we don't
> check the
> waker if FUTEX_LOCK_PI=true. Something like the diff below.
>
> Then we wouldn't need the block_on_rt_mutex tracepoints anymore. And
> the
> specification is a bit more obvious.
>
> Having a second pair of eyes is great, thanks!
>
> Nam
>
> diff --git a/tools/verification/models/rtapp/sleep.ltl
> b/tools/verification/models/rtapp/sleep.ltl
> index 6e2f1ff31163..1f26e58e72f8 100644
> --- a/tools/verification/models/rtapp/sleep.ltl
> +++ b/tools/verification/models/rtapp/sleep.ltl
> @@ -3,7 +3,7 @@ RULE = always ((RT and SLEEP) imply
> (RT_FRIENDLY_SLEEP or ALLOWLIST))
> RT_FRIENDLY_SLEEP = (RT_VALID_SLEEP_REASON or KERNEL_THREAD)
> and ((not WAKE) until RT_FRIENDLY_WAKE)
>
> -RT_VALID_SLEEP_REASON = PI_FUTEX
> +RT_VALID_SLEEP_REASON = FUTEX_WAIT_REQUEUE_PI
> or RT_FRIENDLY_NANOSLEEP
>
> RT_FRIENDLY_NANOSLEEP = CLOCK_NANOSLEEP
> @@ -16,5 +16,6 @@ RT_FRIENDLY_WAKE = WOKEN_BY_EQUAL_OR_HIGHER_PRIO
> or KTHREAD_SHOULD_STOP
>
> ALLOWLIST = BLOCK_ON_RT_MUTEX
> + or FUTEX_LOCK_PI
> or TASK_IS_RCU
> or TASK_IS_MIGRATION
Well, it was more complicated than I pictured it, glad my babbling
helped in finding another solution ;)
I've got one more small remark though, the name ALLOWLIST doesn't give
justice to what you explained above.
It suggests me something potentially wrong that we can't do much about,
while in case of RT mutexes and futex lock, we just don't want the
monitor to yell in a perfectly RT-compliant scenario.
What is happening here, from what I see, is that the kernel is handling
the RT behaviour and your monitor is just meant to tell when userspace
is doing something it could do better (unless we deal with kthreads,
there we are in fact whitelisting the ones we know are not complying).
What about calling it RT_KERNEL_MANAGED_SLEEP or something along the
line to say we just trust what the kernel is doing?
Does this make sense to you?
Thanks,
Gabriele
Powered by blists - more mailing lists