[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ddd14f62-b6c9-4984-84be-6c999ea92e30@kzalloc.com>
Date: Wed, 6 Aug 2025 00:27:01 +0900
From: Yunseong Kim <ysk@...lloc.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Dmitry Vyukov <dvyukov@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Byungchul Park <byungchul@...com>, max.byungchul.park@...il.com,
Yeoreum Yun <yeoreum.yun@....com>, ppbuk5246@...il.com,
linux-usb@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
syzkaller@...glegroups.com, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v3 0/4] kcov, usb: Fix invalid context sleep in softirq
path on PREEMPT_RT
Hi Steve,
You're absolutely right to ask for clarification, and I now realize that
I didn’t explain the background clearly enough in my cover letter.
On 8/5/25 1:24 오전, Steven Rostedt wrote:
> On Sun, 3 Aug 2025 07:20:41 +0000
> Yunseong Kim <ysk@...lloc.com> wrote:
>
>> This patch series resolves a sleeping function called from invalid context
>> bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.
>>
>> The regression was introduced by the interaction of two separate patches:
>> one that made kcov's internal locks sleep on PREEMPT_RT for better latency
>
> Just so I fully understand this change. It is basically reverting the
> "better latency" changes? That is, with KCOV anyone running with PREEMPT_RT
> can expect non deterministic latency behavior?
The regression results from the interaction of two changes — and in my original
description, I inaccurately characterized one of them as being
"for better latency." That was misleading.
The first change d5d2c51 replaced spin_lock_irqsave() with local_lock_irqsave()
in KCOV to ensure compatibility with PREEMPT_RT. This avoided using a
potentially sleeping lock with interrupts disabled.
At the time, as Sebastian noted:
"There is no compelling reason to change the lock type to raw_spin_lock_t...
Changing it would require to move memory allocation and deallocation outside
of the locked section."
However, the situation changed after another patch 8fea0c8 converted the USB
HCD tasklet to a BH workqueue. As a result, usb_giveback_urb_bh() began running
with interrupts enabled, and the KCOV remote coverage collection section in
this path became re-entrant. To prevent nested coverage sections — which KCOV
doesn’t support — kcov_remote_start_usb_softirq() was updated to explicitly
disable interrupts during coverage collection f85d39d.
This combination — using a local_lock (which can sleep on RT) alongside
local_irq_save() — inadvertently created a scenario where a sleeping lock was
acquired in atomic context, triggering a kernel BUG on PREEMPT_RT.
So while the original KCOV locking change didn't require raw spinlocks at
the time, it became effectively incompatible with the USB softirq use case once
that path began relying on interrupt disabling for correctness. In this sense,
the "no compelling reason" eventually turned into a "necessary compromise."
To clarify: this patch series doesn't revert the previous change entirely.
It keeps the local_lock behavior for task context (where it's safe and
appropriate), but ensures atomic safety in interrupt/softirq contexts by
using raw spinlocks selectively where needed.
> This should be fully documented. I assume this will not be a problem as
> kcov is more for debugging and should not be enabled in production.
>
> -- Steve
>
Thanks again for raising this — I’ll make sure the changelog documents this
interaction more clearly.
Best regards,
Yunseong Kim
Powered by blists - more mailing lists