Message-ID: <001d822f-2f78-4ba5-b29f-23ec1813d3d4@gmail.com>
Date: Thu, 14 Aug 2025 16:45:19 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Breno Leitao <leitao@...ian.org>, Mike Galbraith <efault@....de>,
paulmck@...nel.org, kuba@...nel.org
Cc: LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On 8/14/25 11:16, Breno Leitao wrote:
> Hello Mike,
>
> On Wed, Aug 13, 2025 at 06:14:36AM +0200, Mike Galbraith wrote:
>> [ 107.984942] Chain exists of:
>>                 console_owner --> target_list_lock --> &fq->lock
>>
>> [ 107.984947]  Possible interrupt unsafe locking scenario:
>>
>> [ 107.984948]        CPU0                    CPU1
>> [ 107.984949]        ----                    ----
>> [ 107.984950]   lock(&fq->lock);
>> [ 107.984952]                                local_irq_disable();
>> [ 107.984952]                                lock(console_owner);
>> [ 107.984954]                                lock(target_list_lock);
>
> Thanks for the report. I _think_ I understand the problem; it is
> easier to see when thinking about a single CPU:
>
> 1) lock(&fq->lock); // This is not a hard-IRQ-safe lock
> 2) IRQ // An IRQ hits while the lock is held
> 2.1) printk() // WARNs and printk can in fact happen during IRQs
> 2.2) netconsole subsystem // target_list_lock is not important and can be ignored
> 2.3) netpoll // netpoll will call the network subsystem to send the packet
> 2.4) lock(&fq->lock); // Try to take the lock while it is already held
> 3) Deadlock!
>
> Given that fq->lock is not IRQ safe, this is a possible deadlock.
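To make the single-CPU argument above concrete, here is a rough userspace model in C. It is not kernel code and every name in it is hypothetical: a boolean stands in for fq->lock, a flag stands in for the local IRQ-enable state, and irq_handler() stands in for the printk() -> netconsole -> netpoll path (console_owner and target_list_lock are left out). Instead of spinning forever, the handler just records that it would have deadlocked. The irq_safe variant masks "IRQs" around the critical section, roughly what spin_lock_irqsave() would do if fq->lock were made IRQ safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one CPU, one non-recursive lock standing in for fq->lock. */
struct model_lock {
	bool held;
};

struct model_cpu {
	bool irqs_enabled;  /* models local_irq_enable()/local_irq_disable() */
	bool irq_pending;   /* an interrupt that arrived while IRQs were masked */
	bool deadlocked;    /* set when the IRQ handler would spin forever */
};

static struct model_lock fq_lock;

/* Models the IRQ path: printk() -> netconsole -> netpoll -> lock(&fq->lock). */
static void irq_handler(struct model_cpu *cpu)
{
	if (fq_lock.held) {
		/* A real spinlock would spin here forever: the holder is the
		 * interrupted code on this same CPU and cannot run again. */
		cpu->deadlocked = true;
		return;
	}
	fq_lock.held = true;
	/* ... transmit the console message ... */
	fq_lock.held = false;
}

/* Deliver an interrupt now, or latch it if IRQs are masked. */
static void raise_irq(struct model_cpu *cpu)
{
	if (cpu->irqs_enabled)
		irq_handler(cpu);
	else
		cpu->irq_pending = true;
}

/* Task-context path holding fq->lock when an IRQ arrives. irq_safe selects
 * the variant that masks IRQs while the lock is held. Returns true if the
 * model hit the self-deadlock. */
static bool run_scenario(bool irq_safe)
{
	struct model_cpu cpu = { .irqs_enabled = true };

	fq_lock.held = false;
	if (irq_safe)
		cpu.irqs_enabled = false;   /* like local_irq_save() */
	fq_lock.held = true;            /* 1) lock(&fq->lock) */
	raise_irq(&cpu);                /* 2) an IRQ arrives and calls printk() */
	fq_lock.held = false;           /* unlock */
	if (irq_safe) {
		cpu.irqs_enabled = true;    /* like local_irq_restore() */
		if (cpu.irq_pending) {
			cpu.irq_pending = false;
			irq_handler(&cpu);      /* the latched IRQ runs now; lock is free */
		}
	}
	assert(!cpu.irq_pending);
	return cpu.deadlocked;
}
```

In this model run_scenario(false) reports the deadlock and run_scenario(true) does not: the IRQ is simply delayed until the lock is dropped, which is also where the latency and performance cost of IRQ-safe locking comes from.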
>
> In fact, I would say that fq->lock is not the only lock that might be
> involved in this deadlock.
>
> Possible solutions that come to my mind:
>
> 1) Make those locks (fq->lock and the TX locks) IRQ safe
And I'm pretty sure the list is not exhaustive.
> * Cons: this has network performance penalties and is very intrusive.
> 2) Make printk from IRQs deferred, e.g. by calling `printk_deferred_enter()`
> in IRQ handlers?!
It'd only help if the deferred printk doesn't need the
console_lock and doesn't disable IRQs.
> * Cons: This will add latency to printk() inside IRQs.
> 3) Create a deferred mechanism inside netconsole that would buffer the
> packet and defer its TX to outside of IRQ context.
> a) Basically, in netconsole, check whether it is being invoked inside an
> IRQ; if so, buffer the message and send it from softirq/task context.
> * Cons: this would use extra memory for printk()s inside IRQs and also
> add latency (netconsole only).
That should work. We basically need to pull xmit out of the
console_lock-protected section, and deferring is not a bad option.
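A rough userspace sketch of the deferral idea in 3a), assuming a fixed-size ring of messages that netconsole would fill when called from IRQ context and drain later from softirq/task context (for example from a work item; that choice is an assumption). struct defer_ring, netcon_write(), netcon_flush(), xmit(), DEFER_SLOTS and MSG_MAX are all hypothetical names, and xmit() only appends to a buffer instead of calling netpoll. The direct path drains the backlog first so messages stay in order, and the ring is bounded, so overflow is counted and dropped rather than allocating more memory. Pulling all xmit out of the console_lock section would correspond to always queueing, regardless of in_irq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEFER_SLOTS 4   /* hypothetical ring capacity */
#define MSG_MAX     64  /* hypothetical per-message size limit */

/* Bounded FIFO of console messages written from IRQ context. */
struct defer_ring {
	char msgs[DEFER_SLOTS][MSG_MAX];
	size_t head;     /* next slot to fill */
	size_t tail;     /* next slot to transmit */
	size_t count;    /* messages currently queued */
	size_t dropped;  /* messages lost because the ring was full */
};

/* Stand-in for the netpoll transmit path: records what "hit the wire". */
static char wire[256];
static size_t sent;

static void xmit(const char *msg)
{
	strncat(wire, msg, sizeof(wire) - strlen(wire) - 1);
	sent++;
}

/* Drain queued messages in FIFO order; meant to run later from
 * softirq/task context rather than inside the IRQ. */
static void netcon_flush(struct defer_ring *r)
{
	while (r->count > 0) {
		xmit(r->msgs[r->tail]);
		r->tail = (r->tail + 1) % DEFER_SLOTS;
		r->count--;
	}
}

/* Console write hook: queue when called from IRQ context, otherwise
 * drain the backlog first (to keep ordering) and transmit directly. */
static void netcon_write(struct defer_ring *r, const char *msg, bool in_irq)
{
	if (!in_irq) {
		netcon_flush(r);
		xmit(msg);
		return;
	}
	if (r->count == DEFER_SLOTS) {
		r->dropped++;  /* memory use stays bounded; overflow is lost */
		return;
	}
	strncpy(r->msgs[r->head], msg, MSG_MAX - 1);
	r->msgs[r->head][MSG_MAX - 1] = '\0';
	r->head = (r->head + 1) % DEFER_SLOTS;
	r->count++;
	assert(r->count <= DEFER_SLOTS);
}
```

For example, two IRQ-context writes of "a" and "b" followed by a task-context write of "c" put "abc" on the wire in that order.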
> Let me add some other developers who might have other opinions and can
> help decide on the best approach.
--
Pavel Begunkov