[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4834c0cf-b0e8-49c8-a13b-27c80921a03d@kzalloc.com>
Date: Sat, 2 Aug 2025 07:06:39 +0900
From: Yunseong Kim <ysk@...lloc.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc: Dmitry Vyukov <dvyukov@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>, Byungchul Park <byungchul@...com>,
max.byungchul.park@...il.com, Yeoreum Yun <yeoreum.yun@....com>,
Michelle Jin <shjy180909@...il.com>, linux-kernel@...r.kernel.org,
Alan Stern <stern@...land.harvard.edu>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, stable@...r.kernel.org,
kasan-dev@...glegroups.com, syzkaller@...glegroups.com,
linux-usb@...r.kernel.org, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on
PREEMPT_RT
Huge thanks to everyone for the feedback!
While working on earlier patches, running syzkaller on PREEMPT_RT uncovered
numerous sleep-in-atomic-context bugs and other synchronization issues unique to
that environment. This highlighted the need to address these problems.
On 7/26/25 8:59 오후, Thomas Gleixner wrote:
> On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
>> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>>> static void __usb_hcd_giveback_urb(struct urb *urb)
>>> {
>>> (...snipped...)
>>> kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>>> if (in_serving_softirq()) {
>>> local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>>> kcov_remote_start_usb(id) {
>>> kcov_remote_start(id) {
>>> kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>>> (...snipped...)
>>> local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>>> __local_lock_irqsave(lock, flags) {
>>> #ifndef CONFIG_PREEMPT_RT
>>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
>>> #else
>>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
>>> #endif
>
> Right, it does not invoke local_irq_save(flags), but it takes the
> underlying lock, which means it prevents reentrance.
>
>> Ok, but then how does the big comment section for
>> kcov_remote_start_usb_softirq() work, where it explicitly states:
>>
>> * 2. Disables interrupts for the duration of the coverage collection section.
>> * This allows avoiding nested remote coverage collection sections in the
>> * softirq context (a softirq might occur during the execution of a work in
>> * the BH workqueue, which runs with in_serving_softirq() > 0).
>> * For example, usb_giveback_urb_bh() runs in the BH workqueue with
>> * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
>> * the middle of its remote coverage collection section, and the interrupt
>> * handler might invoke __usb_hcd_giveback_urb() again.
>>
>>
>> You are removing half of this function entirely, which feels very wrong
>> to me as any sort of solution, as you have just said that all of that
>> documentation entry is now not needed.
>
> I'm not so sure because kcov_percpu_data.lock is only held within
> kcov_remote_start() and kcov_remote_stop(), but the above comment
> suggests that the whole section needs to be serialized.
>
> Though I'm not a KCOV wizard and might be completely wrong here.
>
> If the whole section is required to be serialized, then this need
> another local lock in kcov_percpu_data to work correctly on RT.
>
> Thanks,
>
> tglx
After receiving comments from maintainers, I realized that my initial patch set
wasn't heading in the right direction.
It seems that the following two patches conflict on PREEMPT_RT kernels:
1. kcov: replace local_irq_save() with a local_lock_t
Link: https://github.com/torvalds/linux/commit/d5d2c51f1e5f
2. kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
Link: https://github.com/torvalds/linux/commit/f85d39dd7ed8
My current approach involves:
* Removing the existing 'kcov_percpu_data.lock'
* Converting 'kcov->lock' and 'kcov_remote_lock' to raw spinlocks
* Relocating the kmalloc call for kcov_remote_add() outside kcov_ioctl_locked(),
as GFP_ATOMIC allocations can potentially sleep under PREEMPT_RT.
: As expected from further testing, keeping the GFP_ATOMIC allocation inside
kcov_remote_add() still leads to sleep in atomic context.
This approach allows us to keep Andrey’s patch d5d2c51f1e5f while making
modifications as Sebastian suggested in his commit f85d39dd7ed8 message,
which I found particularly insightful and full of helpful hints.
The work I'm doing on PATCH v2 involves a number of changes, and I would truly
appreciate any critical feedback. I'm always happy to hear insights!
Best regards,
Yunseong Kim
Powered by blists - more mailing lists