linux-kernel - Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

Message-ID: <001d822f-2f78-4ba5-b29f-23ec1813d3d4@gmail.com>
Date: Thu, 14 Aug 2025 16:45:19 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Breno Leitao <leitao@...ian.org>, Mike Galbraith <efault@....de>,
 paulmck@...nel.org, kuba@...nel.org
Cc: LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
 boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

On 8/14/25 11:16, Breno Leitao wrote:
> Hello Mike,
> 
> On Wed, Aug 13, 2025 at 06:14:36AM +0200, Mike Galbraith wrote:
>> [  107.984942] Chain exists of:
>>                   console_owner --> target_list_lock --> &fq->lock
>>
>> [  107.984947]  Possible interrupt unsafe locking scenario:
>>
>> [  107.984948]        CPU0                    CPU1
>> [  107.984949]        ----                    ----
>> [  107.984950]   lock(&fq->lock);
>> [  107.984952]                                local_irq_disable();
>> [  107.984952]                                lock(console_owner);
>> [  107.984954]                                lock(target_list_lock);
> 
> Thanks for the report. I _think_ I understand the problem; it is easier
> to see when thinking about a single CPU:
> 
>   1) lock(&fq->lock);          // This is not a hard-IRQ-safe lock
>   2) IRQ                       // An IRQ hits while the lock is held
>   2.1) printk()                // WARN()s and printk() can in fact happen during IRQs
>   2.2) netconsole subsystem    // target_list_lock is not important and can be ignored
>   2.3) netpoll                 // netpoll calls into the network subsystem to send the packet
>   2.4) lock(&fq->lock);        // Tries to take the lock that is already held
>   3) Deadlock!
> 
> Given that fq->lock is not IRQ-safe, this is a possible deadlock.
> 
> In fact, I would say that FQ is not the only lock that might get into
> this deadlock.
> 
> Possible solutions that come to my mind:
> 
> 1) Make those locks (fq->lock and the TX locks) IRQ-safe

And I'm pretty sure the list is not exhaustive.

>   * Cons: this has network performance penalties and is very intrusive.
> 2) Make printk from IRQs deferred, e.g. by calling `printk_deferred_enter` in
>     IRQ handlers?!

It'd only help if the deferred printk doesn't need the
console_lock / doesn't disable irqs.

>   * Cons: This will add latency to printk() inside IRQs.
> 3) Create a deferred mechanism inside netconsole that would buffer the
>     message and defer the packet TX to outside of IRQ context.
>     a) Basically, netconsole checks whether it is being invoked inside an
>     IRQ; if so, it buffers the message and sends it from softirq/task context.
>   * Cons: this would use extra memory for printk()s inside IRQs and also
>     add latency (netconsole only).

That should work: we basically need to pull xmit out of the
console_lock-protected section, and deferring is not a bad option.

> Let me add some other developers who might have other opinions and can
> help decide on the best approach.

-- 
Pavel Begunkov