Message-ID: <CAP4=nvTOE9W+6UtVZ5-5gAoYeEQE8g4cgG602FJDPesNko-Bgw@mail.gmail.com>
Date: Tue, 26 Aug 2025 13:19:05 +0200
From: Tomas Glozar <tglozar@...hat.com>
To: Yunseong Kim <ysk@...lloc.com>
Cc: Julia Lawall <Julia.Lawall@...ia.fr>, Nicolas Palix <nicolas.palix@...g.fr>, 
	Easwar Hariharan <eahariha@...ux.microsoft.com>, Gal Pressman <gal@...dia.com>, 
	Hongbo Li <lihongbo22@...wei.com>, Kees Cook <kees@...nel.org>, cocci@...ia.fr, 
	linux-rt-devel@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [WIP] coccinelle: rt: Add coccicheck on sleep in atomic context
 on PREEMPT_RT

Hi Yunseong,

On Sat, 16 Aug 2025 at 6:56, Yunseong Kim <ysk@...lloc.com> wrote:
>
> I'm working on a new Coccinelle script to detect sleep-in-atomic bugs in
> PREEMPT_RT kernels. This script identifies calls to sleeping functions
> (e.g., mutex_lock, msleep, kmalloc with GFP_KERNEL, spin_lock which may
> sleep in PREEMPT_RT) within atomic contexts (e.g., raw_spin_lock,
> preempt_disable, bit_spin_lock).
>
> It supports both direct calls and indirect call chains through
> inter-procedural analysis using function call graphs. Memory allocations
> are handled including GFP_ATOMIC/NOWAIT. This is a WIP patch for early
> feedback. I've tested it with make coccicheck on various subsystems, but
> there are still issues with position variables sometimes being tuples,
> leading to "Invalid position info" warnings and incomplete data collection.

I can share some of my own experience. Two years ago I wrote a similar
tool for the same problem, called rtlockscope [1]. It uses ctags to get
a list of all functions and CScope to get a function call graph, and
then assigns a summary to each function based on its callees. The
results could use some improvement, since the tool reduces control flow
to an ordering of callees and assumes that all symbols are global (e.g.
an ARM-only function is seen as called from x86-only code).

[1] Repo: https://gitlab.com/tglozar/rtlockscope
    LPC talk slides:
    https://lpc.events/event/18/contributions/1735/attachments/1428/3051/lpc2024talk.pdf

Currently I'm focusing on getting more reliable results using automata
abstractions.
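
To make the summary idea concrete, here is a minimal sketch of
propagating a "may sleep" summary backwards over a call graph. It is
not rtlockscope's actual code: the function name may_sleep_summaries,
the tiny call graph and the "helper" entry are made up for
illustration, and spin_lock is treated as sleeping because that is
what it is on PREEMPT_RT.

```python
from collections import defaultdict

# Hypothetical call graph (caller -> callees), loosely modelled on the
# slab trace in this mail; a real one would come from CScope.
CALL_GRAPH = {
    "__slab_free": ["put_cpu_partial"],
    "put_cpu_partial": ["local_lock_irqsave"],
    "local_lock_irqsave": ["spin_lock"],
    "helper": ["udelay"],  # made-up function that never sleeps
}

# Primitives assumed to sleep on PREEMPT_RT.
SLEEPING_PRIMITIVES = {"spin_lock", "mutex_lock", "msleep"}


def may_sleep_summaries(call_graph, primitives):
    """Return every function that can reach a sleeping primitive."""
    # Invert the graph so we can walk from callees to callers.
    callers = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            callers[callee].add(caller)

    may_sleep = set(primitives)
    worklist = list(primitives)
    while worklist:
        fn = worklist.pop()
        for caller in callers[fn]:
            if caller not in may_sleep:
                may_sleep.add(caller)
                worklist.append(caller)
    return may_sleep


if __name__ == "__main__":
    print(sorted(may_sleep_summaries(CALL_GRAPH, SLEEPING_PRIMITIVES)))
```

Because every symbol is just a global name here, the sketch has the
same weakness mentioned above: it cannot tell that two functions never
end up in the same build.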

>
> The script includes defensive checks, but indirect bugs are not always
> detected. I'd appreciate any suggestions on improving the Python handling
> of position variables or the SmPL rules for better matching in complex code
> (e.g., macros, inlines). The script is added to scripts/coccinelle/rt/.
>

My tool captures macros, but it reports a lot of false positives via
various KASAN and printing routines. For example:

Sleeping lock called at:
__cache_free at mm/slab.c:3617
___cache_free at mm/slab.c:3378
do_slab_free at mm/slub.c:3816
__slab_free at mm/slub.c:3796
put_cpu_partial at mm/slub.c:3679
local_lock_irqsave at mm/slub.c:2703
__local_lock_irqsave at include/linux/local_lock.h:31
__local_lock at include/linux/local_lock_internal.h:128
spin_lock at include/linux/local_lock_internal.h:119

preemption disabled at:
__cache_free at mm/slab.c:3617
kasan_slab_free at mm/slab.c:3370
__kasan_slab_free at include/linux/kasan.h:164
____kasan_slab_free at mm/kasan/common.c:244
kasan_quarantine_put at mm/kasan/common.c:238
raw_spin_lock at mm/kasan/quarantine.c:224
_raw_spin_lock at include/linux/spinlock.h:217
__raw_spin_lock at kernel/locking/spinlock.c:154
preempt_disable at include/linux/spinlock_api_smp.h:132

But that might be just because I'm also tracking indirect
preempt_disable (see below). Unfortunately, I'm not familiar with
Coccinelle; I considered it for a while, but opted for a different
approach.

> Detects sleep-in-atomic bugs in PREEMPT_RT kernels by identifying improper
> calls to functions that may sleep, such as mutex locks, explicit sleep
> functions (e.g., msleep), memory allocations and sleepable spinlocks,
> within atomic contexts created by preempt_disable, raw_spin_lock,
> irq_disable (e.g. bit_spin_lock).
>
> 1. Detection of direct calls to sleeping functions in atomic scopes.
> 2. Analysis of inter-procedural call chains to uncover indirect calls to
>    sleeping functions via function call graphs.
> 3. Handling of memory allocation functions that may sleep.
>    (including GFP_ATOMIC).
>

If I understand your code properly, you only match on a specific case
of sleeping in atomic context, where the offending call is directly in
between "preempt disable" and "preempt enable".

That means that your script only takes indirection into account for
sleeping functions, not for disabling preemption/entering atomic
context. There are some places in the kernel where custom "lock"
functions call preempt_disable, so tracking that indirection is needed
in order not to miss those. But it might be better to skip them anyway
to avoid flooding the output with false positives, since one unmatched
preempt_disable will pollute the rest of the function (and every
function that calls it).
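
A small sketch of the difference, assuming each function is reduced to
an ordered list of callees (as rtlockscope does) and the call graph has
no recursion. All function names in BODIES (my_lock, my_unlock,
direct_bug, indirect_bug, leaky, uses_leaky) are hypothetical, and this
is neither your script's logic nor rtlockscope's code:

```python
# Effect of the primitives on the preemption-disable depth.
PRIMITIVE_DELTA = {"preempt_disable": +1, "preempt_enable": -1}
MAY_SLEEP = {"mutex_lock"}

# Hypothetical functions, each reduced to its ordered list of callees.
BODIES = {
    "my_lock": ["preempt_disable"],    # custom lock wrapper
    "my_unlock": ["preempt_enable"],
    "direct_bug": ["preempt_disable", "mutex_lock", "preempt_enable"],
    "indirect_bug": ["my_lock", "mutex_lock", "my_unlock"],
    "leaky": ["my_lock"],              # never re-enables preemption
    "uses_leaky": ["leaky", "mutex_lock"],
}


def net_delta(fn, bodies):
    """Net depth change caused by calling fn (assumes no recursion)."""
    if fn in PRIMITIVE_DELTA:
        return PRIMITIVE_DELTA[fn]
    return sum(net_delta(callee, bodies) for callee in bodies.get(fn, []))


def find_violations(fn, bodies, follow_indirect):
    """List sleeping calls in fn made while preemption is disabled."""
    depth = 0
    hits = []
    for callee in bodies[fn]:
        if callee in MAY_SLEEP and depth > 0:
            hits.append((fn, callee))
        if callee in PRIMITIVE_DELTA:
            depth += PRIMITIVE_DELTA[callee]
        elif follow_indirect:
            depth += net_delta(callee, bodies)
    return hits


if __name__ == "__main__":
    for name in ("direct_bug", "indirect_bug", "uses_leaky"):
        print(name,
              find_violations(name, BODIES, follow_indirect=False),
              find_violations(name, BODIES, follow_indirect=True))
```

With follow_indirect=False only direct_bug is reported. With
follow_indirect=True, indirect_bug is caught as well, but so is
uses_leaky: the unmatched disable in leaky carries over into its
caller, which is the pollution effect described above.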

> This cocci script should identify direct and indirect sleep-in-atomic
> violations, improving PREEMPT_RT compatibility across kernel code.
> For example:
> Link: https://lore.kernel.org/linux-rt-devel/7a68c944-0199-468e-a0f2-ae2a9f21225b@kzalloc.com/t/#u
>

There are likely still tens of these bugs across different subsystems;
I remember fixing one in nvdimm and one in BPF.

There is also a 2018 paper, Effective Detection of
Sleep-in-Atomic-Context Bugs in the Linux Kernel [2], which covers
this problem without taking PREEMPT_RT into account. They identify
three challenges: accurately processing control flow, handling
function pointers, and handling different code paths. Notably, they
also use summaries, and handle sleeping in atomic context in interrupt
handlers. Overall, the paper appears to use the same general approach
as rtlockscope and your Coccinelle script, just more polished, so you
might want to have a look at it (if you have not seen it yet). Of
course, on PREEMPT_RT, there is an additional challenge in
distinguishing between RT and non-RT paths (like code that sleeps only
on RT and disables preemption only on non-RT).

[2] https://hal.science/hal-03032244
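
To illustrate the RT/non-RT distinction, here is a deliberately
simplified sketch. The BEHAVIOR table and the sleeping_in_atomic helper
are my own illustration, not part of any of the tools mentioned; the
table only records the well-known behaviour of a few primitives
(spin_lock sleeps only on RT and disables preemption only on non-RT),
and unlocks are not modelled at all.

```python
# BEHAVIOR[primitive][config] = (may_sleep, disables_preemption)
# Simplified: ignores unlocks, IRQ state and migrate_disable.
BEHAVIOR = {
    "mutex_lock":      {"rt": (True, False), "non_rt": (True, False)},
    "spin_lock":       {"rt": (True, False), "non_rt": (False, True)},
    "raw_spin_lock":   {"rt": (False, True), "non_rt": (False, True)},
    "preempt_disable": {"rt": (False, True), "non_rt": (False, True)},
}


def sleeping_in_atomic(calls, config):
    """Return the calls that may sleep after preemption was disabled.

    `calls` is an ordered list of primitive names and `config` is
    either "rt" or "non_rt".
    """
    atomic = False
    hits = []
    for name in calls:
        may_sleep, disables_preemption = BEHAVIOR[name][config]
        if may_sleep and atomic:
            hits.append(name)
        atomic = atomic or disables_preemption
    return hits


if __name__ == "__main__":
    nested = ["raw_spin_lock", "spin_lock"]
    print("rt:", sleeping_in_atomic(nested, "rt"))
    print("non_rt:", sleeping_in_atomic(nested, "non_rt"))
```

The same call sequence is a bug only under "rt", so a checker that
merges both configurations into one path would either miss it or
report it for the wrong reason.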



Tomas