Message-ID: <eed6ff19-a944-4e4c-96e4-0f44e888c71d@kzalloc.com>
Date: Wed, 29 Oct 2025 07:53:20 +0900
From: Yunseong Kim <ysk@...lloc.com>
To: Gabriele Monaco <gmonaco@...hat.com>, Nam Cao <nam.cao@...aro.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Tomas Glozar <tglozar@...hat.com>, Shung-Hsi Yu <shung-hsi.yu@...e.com>,
 Byungchul Park <byungchul@...com>, syzkaller@...glegroups.com,
 linux-rt-devel@...ts.linux.dev, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Question] Detecting Sleep-in-Atomic Context in PREEMPT_RT via RV
 (Runtime Verification) monitor rtapp:sleep
Hi Gabriele,
On 10/27/25 9:20 PM, Gabriele Monaco wrote:
> On Mon, 2025-10-27 at 15:54 +0900, Yunseong Kim wrote:
>> Hi Nam,
>>
>> I've been very interested in RV (Runtime Verification) to proactively detect
>> "sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm looking
>> for ways to find cases where sleeping spinlocks or memory allocations are used
>> within preemption-disabled or irq-disabled contexts. While searching for
>> solutions, I discovered the RV subsystem.
>>
> 
> Hi Yunseong,
> 
> I'm sure Nam can be more specific on this, but let me add my 2 cents here.
Thank you so much for your detailed response! It cleared up many of the
questions I had.
> The sleep monitor doesn't really do what you want, its violations are real time
> tasks (typically userspace tasks with RR/FIFO policies) sleeping in a way that
> might incur latencies. For instance using non PI locks or imprecise sleep.
So that’s the role of the rtapp:sleep monitor you mentioned. Thank you
again for clarifying it.
> What you need here is to validate kernel code, RV was actually designed for
> that, but there's currently no monitor that does what you want.
This sounds like a valuable chance to contribute to RV!
> The closest thing I can think of is monitors like scpd and snep in the sched
> collection [1]. Those however won't catch what you need because they focus on
> the preemption tracepoints and schedule, which works fine also in your scenario.
> 
> We could add similar monitors to catch what you want though:
> 
>                      |
>                      |
>                      v
>                    +-----------------+
>                    |   cant_sleep    | <+
>                    +-----------------+  |
>                      |                  |
>                      | preempt_enable   | preempt_disable
>                      v                  |
>     kmalloc                             |
>     lock_acquire                        |
>   +---------------      can_sleep       |
>   |                                     |
>   +-------------->                     -+
> 
> which would become slightly more complicated if considering irq enable/disable
> too. This is a deterministic automaton representation (see [1] for examples),
> you could use an LTL like sleep as well, I assume (needs a per-CPU monitor which
> is not merged yet for LTL).
> 
> This is simplified but you can of course put conditions on what kind of
> allocations and locks you're interested in.
If the goal is to detect this state before __might_resched() reports it
under CONFIG_DEBUG_ATOMIC_SLEEP (i.e., before an actual context switch
occurs), I am considering whether a deterministic automaton (.dot/DA) or
Linear Temporal Logic (.ltl/LTL) would be more appropriate for modeling
this check. I'm also wondering whether I need to build a comprehensive
table of all sleepable functions on the PREEMPT_RT kernel for this
purpose.
If this check is necessary, I’m planning to try the following
verification rule:

  RULE = always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER)
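
For reference, my understanding is that the rule's antecedent maps to
roughly the following kernel C predicate (just a sketch; the helper name
rv_in_atomic_ctx() is made up here):

  #include <linux/preempt.h>   /* preempt_count() */
  #include <linux/irqflags.h>  /* irqs_disabled() */

  /*
   * Sketch of "IN_ATOMIC or IRQS_DISABLED": preempt_count() is non-zero
   * in preempt-disabled sections and in hard/soft interrupt contexts
   * (PREEMPT_RT always selects CONFIG_PREEMPT_COUNT), while
   * irqs_disabled() covers local_irq_save()/local_irq_disable() regions
   * that don't touch the preempt count.
   */
  static inline bool rv_in_atomic_ctx(void)
  {
          return preempt_count() != 0 || irqs_disabled();
  }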
I’m also planning to have the RV monitor kernel module track sleepable
functions, including the sleeping spinlocks and memory allocations that
must not be called from preempt/IRQ-disabled context on PREEMPT_RT.
As a result, I’m considering adding the following functions (a rough
monitor sketch follows this list):
 // Mutex & Semaphore (or Lockdep's 'lock_acquire' for lock cases)
 "mutex_lock",
 "mutex_lock_interruptible",
 "mutex_lock_killable",
 "down_interruptible",
 "down_killable",
 "rwsem_down_read_failed",
 "rwsem_down_write_failed",
 "ww_mutex_lock",
 "rt_spin_lock",
 "rt_read_lock",
 "rt_write_lock",
 // or just "lock_acquire" for LOCKDEP enabled kernel.
 // sleep & schedule
 "msleep",
 "ssleep",
 "usleep_range",
 "wait_for_completion",
 "schedule",
 "cond_resched",
 // User-space memory access
 "copy_from_user",
 "copy_to_user",
 "__get_user_asm",
 "__put_user_asm",
 // memory allocation
 "__vmalloc",
 "__kmalloc"
> Now this specific case would require lockdep for the definition of lock_acquire
> tracepoints. So I'm not sure how useful this monitor would be since lockdep is
> going to complain too. You could use contention tracepoints to catch exactly
> when sleep is going to occur and not /potential/ failures.
I’ll look into this lockdep-related part further as well.
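For the contention tracepoints you mentioned, I imagine a handler along
these lines could complement (or replace) the lock_acquire one in the
sketch above; the prototype follows contention_begin in
include/trace/events/lock.h, and contention_begin_sia would again be an
event from the hypothetical model:

  /*
   * contention_begin fires only when a task is actually about to block
   * or spin on a lock; flags encode the lock type (e.g. LCB_F_RT for
   * rtmutex-based locks on PREEMPT_RT).
   */
  static void handle_contention_begin(void *data, void *lock,
                                      unsigned int flags)
  {
          da_handle_event_sia(contention_begin_sia);
  }

That would catch the point where sleeping really starts rather than
every potential call site, as you suggested.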
> I only gave a quick thought on this, there may be better models/event fitting
> your usecase, but I hope you get the idea.
> 
> [1] - https://docs.kernel.org/trace/rv/monitor_sched.html#monitor-scpd
Thank you for providing a diagram and references that make it easier to
understand!
>> Here are my questions:
>>
>> 1. Does the rtapp:sleep monitor proactively detect scenarios that
>>    could lead to sleeping in atomic context, perhaps before
>>    CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point of
>>    sleeping?
> 
> I guess I answered this already, but TL;DR no, you'd need a dedicated monitor.
> 
>> 2. Is there a way to enable this monitor (e.g., rtapp:sleep)
>>    immediately as soon as the RV subsystem is loaded during boot time?
>>    (How to make this "default turn on"?)
> 
> Currently not, but you could probably use any sort of startup script to turn it
> on soon enough.
> 
>> 3. When a "violation detected" message occurs at runtime, is it
>>    possible to get a call stack of the location that triggered the
>>    violation? The panic reactor provides a full stack, but I'm
>>    wondering if this is also possible with the printk reactor.
> 
> You can use ftrace and rely on error tracepoints instead of reactors. Each RV
> violation triggers a tracepoint (e.g. error_sleep) and you can print a call
> stack there. E.g.:
> 
>   echo stacktrace > /sys/kernel/tracing/events/rv/error_sleep/trigger
> 
> Here I use sleep as an example, but all monitors have their own error events
> (e.g. error_wwnr, error_snep, etc.).
> 
> Does this all look useful in your scenario?
Thank you once again for your thorough explanation. Many of the questions
I initially had have now been resolved!
> Gabriele
Best regards,
Yunseong Kim