Message-ID: <hyc64wbklq2mv77ydzfxcqdigsl33leyvebvf264n42m2f3iq5@qgn5lljc4m5y>
Date: Mon, 18 Aug 2025 05:10:24 -0700
From: Breno Leitao <leitao@...ian.org>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
Johannes Berg <johannes@...solutions.net>, Mike Galbraith <efault@....de>, paulmck@...nel.org,
LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org, boqun.feng@...il.com
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On Fri, Aug 15, 2025 at 09:02:27PM +0100, Pavel Begunkov wrote:
> On 8/15/25 18:29, Breno Leitao wrote:
> > On Fri, Aug 15, 2025 at 09:42:17AM -0700, Jakub Kicinski wrote:
> > > On Fri, 15 Aug 2025 11:44:45 +0100 Pavel Begunkov wrote:
> > > > On 8/15/25 01:23, Jakub Kicinski wrote:
> > >
> > > I suspect disabling netconsole over WiFi may be the most sensible way out.
> >
> > I believe we might be facing a similar issue with virtio-net.
> > Specifically, any network adapter whose TX path is not safe to run in
> > IRQ context hits this problem.
> >
> > If we want to keep netconsole enabled on all TX paths, a possible
> > solution is to defer the transmission work when netconsole is called
> > inside an IRQ.
> >
> > The idea is that netconsole first checks if it is running in an IRQ
> > context using in_irq(). If so, it queues the skb without transmitting it
> > immediately and schedules deferred work to handle the transmission
> > later.
> >
> > A rough implementation could be:
> >
> > static void send_udp(struct netconsole_target *nt, const char *msg, int len) {
> >
> > /* get the SKB that is already populated, with all the headers
> > * and ready to be sent
> > */
> > struct sk_buff *skb = netpoll_get_skb(&nt->np, msg, len);
> >
> > if (in_irq()) {
>
> It's not just irq handlers but any context that has irqs disabled, and
> since it's nested under irq-safe console_owner it'd need to always be
> deferred or somehow moved out of the console_owner critical section.
Agree. An IRQ-unsafe lock (the fq lock) should not be reachable from an
IRQ-disabled code path. So, one solution might be to always send TX
packets from a workqueue (unless we are in a panic, as suggested by Calvin).
I've created a quick PoC to see what it looks like to always transmit
from a workqueue:
commit 02d0f38c3e435e4349de2fa3ce50fa8841aa0df9
Author: Breno Leitao <leitao@...ian.org>
Date: Mon Aug 18 04:09:44 2025 -0700
netpoll: move packet transmission to workqueue for non-blocking output
This patch modifies the netpoll subsystem to perform packet transmission
asynchronously, always sending packets from a dedicated workqueue. This
fixes a potential deadlock when the network locks are not HARDIRQ-safe.
Packets generated for transmission are queued on tx_queue, and the
workqueue is scheduled to send them from process context, which avoids
sending packets directly from atomic context except during oops handling.
This affects only netconsole, given that no other netpoll user calls
netpoll_send_udp().
Signed-off-by: Breno Leitao <leitao@...ian.org>
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index b5ea9882eda8b..bac9dec8bd3bc 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -35,8 +35,8 @@ struct netpoll {
bool ipv6;
u16 local_port, remote_port;
u8 remote_mac[ETH_ALEN];
- struct sk_buff_head skb_pool;
- struct work_struct refill_wq;
+ struct sk_buff_head skb_pool, tx_queue;
+ struct work_struct refill_wq, tx_wq;
};
#define np_info(np, fmt, ...) \
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 5f65b62346d4e..92a4186cebb83 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -496,6 +496,16 @@ static void push_eth(struct netpoll *np, struct sk_buff *skb)
eth->h_proto = htons(ETH_P_IP);
}
+static int netpoll_queue_skb(struct netpoll *np, struct sk_buff *skb)
+{
+ /* Queue at the tail and TX from the head */
+ skb_queue_tail(&np->tx_queue, skb);
+ schedule_work(&np->tx_wq);
+
+ /* TODO: this will need some refactoring to update netconsole stats properly */
+ return NET_XMIT_SUCCESS;
+}
+
int netpoll_send_udp(struct netpoll *np, const char *msg, int len)
{
int total_len, ip_len, udp_len;
@@ -528,7 +538,10 @@ int netpoll_send_udp(struct netpoll *np, const char *msg, int len)
push_eth(np, skb);
skb->dev = np->dev;
- return (int)netpoll_send_skb(np, skb);
+ if (oops_in_progress)
+ return (int)netpoll_send_skb(np, skb);
+
+ return netpoll_queue_skb(np, skb);
}
EXPORT_SYMBOL(netpoll_send_udp);
@@ -540,6 +553,23 @@ static void skb_pool_flush(struct netpoll *np)
cancel_work_sync(&np->refill_wq);
skb_pool = &np->skb_pool;
skb_queue_purge_reason(skb_pool, SKB_CONSUMED);
+
+ /* tx_queue must be empty here, given the packets were flushed
+ * after the users got disabled.
+ */
+ if (WARN_ON_ONCE(skb_queue_len(&np->tx_queue)))
+ skb_queue_purge(&np->tx_queue);
+}
+
+static void netpoll_flush_tx_poll(struct work_struct *work)
+{
+ struct sk_buff *skb;
+ struct netpoll *np;
+
+ np = container_of(work, struct netpoll, tx_wq);
+
+ while ((skb = skb_dequeue(&np->tx_queue)))
+ netpoll_send_skb(np, skb);
}
static void refill_skbs_work_handler(struct work_struct *work)
@@ -557,6 +587,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
int err;
skb_queue_head_init(&np->skb_pool);
+ skb_queue_head_init(&np->tx_queue);
if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
np_err(np, "%s doesn't support polling, aborting\n",
@@ -596,6 +627,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
/* fill up the skb queue */
refill_skbs(np);
INIT_WORK(&np->refill_wq, refill_skbs_work_handler);
+ INIT_WORK(&np->tx_wq, netpoll_flush_tx_poll);
/* last thing to do is link it to the net device structure */
rcu_assign_pointer(ndev->npinfo, npinfo);
@@ -815,6 +847,8 @@ static void __netpoll_cleanup(struct netpoll *np)
if (!npinfo)
return;
+ cancel_work_sync(&np->tx_wq);
+
if (refcount_dec_and_test(&npinfo->refcnt)) {
const struct net_device_ops *ops;
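For reference, the caller side stays unchanged: netconsole still invokes
netpoll_send_udp() with interrupts disabled (under target_list_lock, and
nested inside the console_owner critical section when coming from printk).
With this PoC, that context only builds and queues the skb; the IRQ-unsafe
qdisc/fq lock is only taken later from tx_wq in process context. Below is a
simplified, illustrative sketch of that call site (not the actual
drivers/net/netconsole.c code, and not meant to be applied as a patch):

/* Illustrative sketch only: a simplified netconsole-style caller.
 * Names loosely mirror drivers/net/netconsole.c.
 */
static void write_msg_sketch(const char *msg, unsigned int len)
{
	struct netconsole_target *nt;
	unsigned long flags;

	/* Typically reached from printk with interrupts disabled. */
	spin_lock_irqsave(&target_list_lock, flags);
	list_for_each_entry(nt, &target_list, list) {
		if (!nt->enabled || !netif_running(nt->np.dev))
			continue;
		/* With the PoC, this only queues the pre-built skb on
		 * np->tx_queue and schedules np->tx_wq; no qdisc/fq lock
		 * is taken here, so the HARDIRQ-safe -> HARDIRQ-unsafe
		 * ordering never occurs (except during an oops, where it
		 * still transmits synchronously).
		 */
		netpoll_send_udp(&nt->np, msg, len);
	}
	spin_unlock_irqrestore(&target_list_lock, flags);
}

The only difference visible to the caller is that the return value now
reflects queueing rather than the actual transmission, hence the stats
TODO in netpoll_queue_skb() above.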