linux-kernel - Re: [Question] Detecting Sleep-in-Atomic Context in PREEMPT_RT via RV (Runtime Verification) monitor rtapp:sleep

Message-ID: <6a85223ba6022ed2183a522fcf3e7c8d00675672.camel@redhat.com>
Date: Wed, 29 Oct 2025 10:24:53 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: Yunseong Kim <ysk@...lloc.com>, Nam Cao <namcao@...utronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Tomas Glozar	
 <tglozar@...hat.com>, Shung-Hsi Yu <shung-hsi.yu@...e.com>, Byungchul Park	
 <byungchul@...com>, syzkaller@...glegroups.com,
 linux-rt-devel@...ts.linux.dev,  LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Question] Detecting Sleep-in-Atomic Context in PREEMPT_RT via
 RV (Runtime Verification) monitor rtapp:sleep

On Wed, 2025-10-29 at 07:53 +0900, Yunseong Kim wrote:
> > What you need here is to validate kernel code, RV was actually designed for
> > that, but there's currently no monitor that does what you want.
> 
> It’s a valuable chance to make a contribution to RV!

And could be quite a useful model!

> If the goal is to detect this state before the output from __might_resched()
> under CONFIG_DEBUG_ATOMIC_SLEEP (i.e., before an actual context switch
> occurs), I am considering whether Deterministic Automata (.dot/DA) or Linear
> Temporal Logic (.ltl/LTL) would be more appropriate for modeling this check.
> I'm also thinking about whether I need to create a comprehensive table of all
> sleepable functions for this purpose on the PREEMPT_RT kernel.
> 
> If this check is necessary, I’m planning to try the following verification:
> 
> RULE = always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER)

Yes, in this case the choice between DA and LTL is mostly down to preference.
One thing to keep in mind is that this is going to be a per-cpu monitor (i.e.
the rule holds for each CPU, as the irq/preemption state is per-cpu).
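
As a rough illustration of what "per-cpu" means for the rule above, here is a
user-space Python simulation. It is not RV code: the event names
("preempt_disable", "irq_disable", "rt_sleeper", ...) and the class names are
hypothetical stand-ins for whatever tracepoints a real monitor would attach to.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Hypothetical per-CPU state tracked by the simulated monitor."""
    preempt_count: int = 0
    irqs_disabled: bool = False

    @property
    def atomic(self) -> bool:
        # Mirrors (IN_ATOMIC or IRQS_DISABLED) from the proposed rule.
        return self.preempt_count > 0 or self.irqs_disabled


class SleepInAtomicMonitor:
    """Checks: always ((IN_ATOMIC or IRQS_DISABLED) imply not CALLS_RT_SLEEPER)."""

    def __init__(self, nr_cpus: int):
        # One independent state per CPU: the rule is evaluated per CPU.
        self.cpus = [CpuState() for _ in range(nr_cpus)]
        self.violations = []

    def handle(self, cpu: int, event: str, detail: str = "") -> None:
        st = self.cpus[cpu]
        if event == "preempt_disable":
            st.preempt_count += 1
        elif event == "preempt_enable":
            st.preempt_count = max(0, st.preempt_count - 1)
        elif event == "irq_disable":
            st.irqs_disabled = True
        elif event == "irq_enable":
            st.irqs_disabled = False
        elif event == "rt_sleeper":
            # A sleeping primitive was called: violation only if this CPU
            # is currently atomic.
            if st.atomic:
                self.violations.append((cpu, detail))
        else:
            raise ValueError(f"unknown event: {event}")


mon = SleepInAtomicMonitor(nr_cpus=2)
mon.handle(0, "preempt_disable")
mon.handle(1, "rt_sleeper", "rt_spin_lock")  # CPU 1 is not atomic: allowed
mon.handle(0, "rt_sleeper", "mutex_lock")    # CPU 0 is atomic: violation
mon.handle(0, "preempt_enable")
mon.handle(0, "rt_sleeper", "mutex_lock")    # CPU 0 no longer atomic: allowed
print(mon.violations)
```

The point of the sketch is only that the atomic state of CPU 0 must not
influence the verdict for CPU 1.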

Per-cpu support for LTL monitors is added in [1] (not merged yet), so you will
need to pull that in if you want to play with LTL.

[1] -
https://lore.kernel.org/lkml/e7fb580ca898c707573fe1dcf6312f0c2d7682c5.1754900299.git.namcao@linutronix.de

> I’m also planning to add sleepable functions, including sleepable spinlocks
> and memory allocations callable under PREEMPT_RT preempt/IRQ-disabled states,
> to the RV monitor kernel module.
> 
> I’m considering adding the following functions as a result:
> 
>  // Mutex & Semaphore (or Lockdep's 'lock_acquire' for lock cases)
>  "mutex_lock",
>  "mutex_lock_interruptible",
>  "mutex_lock_killable",
>  "down_interruptible",
>  "down_killable",
>  "rwsem_down_read_failed",
>  "rwsem_down_write_failed",
>  "ww_mutex_lock",
>  "rt_spin_lock",
>  "rt_read_lock",
>  "rt_write_lock",
>  // or just "lock_acquire" for LOCKDEP enabled kernel.
> 
>  // sleep & schedule
>  "msleep",
>  "ssleep",
>  "usleep_range",
>  "wait_for_completion",
>  "schedule",
>  "cond_resched",
> 
>  // User-space memory access
>  "copy_from_user",
>  "copy_to_user",
>  "__get_user_asm",
>  "__put_user_asm",
> 
>  // memory allocation
>  "__vmalloc",
>  "__kmalloc"
> 

Here you're talking about direct kernel functions, but RV currently relies on
tracepoints (that's why I mentioned those earlier). You have two routes:

1. use existing tracepoints and/or add new ones at strategic points
2. use kprobes and attach wherever you want

Route 1 is very easy in RV, and you may use tracepoint arguments to narrow down
the search (e.g. only transition state on certain locks or certain
allocations). You may need to discuss with various maintainers to add new
tracepoints, but that's usually alright; have a look at the V2 of the linked
thread for an example [2].

Route 2 is a bit more involved: you'd be able to access precisely the functions
you want (usually), but I'm not sure about the overhead of plugging 15 kprobes.
Also, RV doesn't support kprobes yet, although extending it is rather trivial.

You can mix both, of course. But yes, you'd need to identify all the "events"
you care about. I'd start simple with some of those (e.g. malloc and lock
contention tracepoints) and see if it satisfies your needs.

You may also be counting things twice (isn't malloc calling locks, which may end
up calling schedule?). Just an idea, but you may find common paths in the above
list.
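
To make the double-counting concern concrete, here is a small user-space Python
sketch (not RV code). It assumes a hypothetical stream of enter/exit events for
the monitored functions and keeps only the outermost one per CPU, so a nested
chain is reported once. The trace below is made up for illustration and does
not claim to be the real call chain.

```python
def outermost_sleepers(events):
    """Return (cpu, func) for each outermost monitored call.

    events: iterable of (cpu, kind, func), with kind "enter" or "exit".
    Nested monitored calls on the same CPU are folded into their root.
    """
    depth = {}   # per-CPU nesting depth of monitored functions
    roots = []
    for cpu, kind, func in events:
        d = depth.get(cpu, 0)
        if kind == "enter":
            if d == 0:
                # First monitored function on this CPU: this is the root.
                roots.append((cpu, func))
            depth[cpu] = d + 1
        elif kind == "exit":
            depth[cpu] = max(0, d - 1)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return roots


# Hypothetical trace: an allocation that takes a lock that ends up scheduling.
trace = [
    (0, "enter", "__kmalloc"),
    (0, "enter", "rt_spin_lock"),
    (0, "enter", "schedule"),
    (0, "exit", "schedule"),
    (0, "exit", "rt_spin_lock"),
    (0, "exit", "__kmalloc"),
    (1, "enter", "mutex_lock"),
    (1, "exit", "mutex_lock"),
]
print(outermost_sleepers(trace))
```

If most chains share the same inner function, monitoring only that common path
may be enough and the list of events gets shorter.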

Gabriele

[2] -
https://lore.kernel.org/lkml/f87ce0cb979daa3e8221c496de16883ca53f3950.1754466623.git.namcao@linutronix.de

> > Now this specific case would require lockdep for the definition of
> > lock_acquire tracepoints. So I'm not sure how useful this monitor would be
> > since lockdep is going to complain too. You could use contention tracepoints
> > to catch exactly when sleep is going to occur and not /potential/ failures.
> 
> I’ll look into this lockdep related part further as well.
> 
> > I only gave a quick thought on this, there may be better models/events
> > fitting your use case, but I hope you get the idea.
> > 
> > [1] - https://docs.kernel.org/trace/rv/monitor_sched.html#monitor-scpd
> 
> Thank you for providing a diagram and references that make it easier to
> understand!
> 
> > > Here are my questions:
> > > 
> > > 1. Does the rtapp:sleep monitor proactively detect scenarios that
> > >    could lead to sleeping in atomic context, perhaps before
> > >    CONFIG_DEBUG_ATOMIC_SLEEP (enabled) would trigger at the actual point
> > >    of sleeping?
> > 
> > I guess I answered this already, but TL;DR no, you'd need a dedicated
> > monitor.
> > 
> > > 2. Is there a way to enable this monitor (e.g., rtapp:sleep)
> > >    immediately as soon as the RV subsystem is loaded during boot time?
> > >    (How to make this "default turn on"?)
> > 
> > Currently not, but you could probably use any sort of startup script to
> > turn it on soon enough.
> > 
> > > 3. When a "violation detected" message occurs at runtime, is it
> > >    possible to get a call stack of the location that triggered the
> > >    violation? The panic reactor provides a full stack, but I'm
> > >    wondering if this is also possible with the printk reactor.
> > 
> > You can use ftrace and rely on error tracepoints instead of reactors. Each
> > RV violation triggers a tracepoint (e.g. error_sleep) and you can print a
> > call stack there. E.g.:
> > 
> >   echo stacktrace > /sys/kernel/tracing/events/rv/error_sleep/trigger
> > 
> > Here I use sleep as an example, but all monitors have their own error events
> > (e.g. error_wwnr, error_snep, etc.).
> > 
> > Does this all look useful in your scenario?
> 
> Thank you once again for your thorough explanation. Many of the questions
> I initially had have now been resolved!
> 
> > Gabriele
> 
> Best regards,
> Yunseong Kim