Date:	Tue, 07 Jun 2011 08:50:03 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	heiko.carstens@...ibm.com, blaschka@...ux.vnet.ibm.com,
	netdev@...r.kernel.org, linux-s390@...r.kernel.org
Subject: [PATCH] net: cpu offline cause napi stall

From: Heiko Carstens <heiko.carstens@...ibm.com>

Frank Blaschka reported:
<quote>
  During heavy network load we turn off/on cpus.
  Sometimes this causes a stall on the network device.
  Digging into the dump I found out the following:

  napi is scheduled but does not run. From the I/O buffers
  and the napi state I see napi/rx_softirq processing has stopped
  because the budget was reached. napi stays in the
  softnet_data poll_list and the rx_softirq was raised again.

  I assume at this time the cpu offline comes in,
  the rx softirq is raised/moved to another cpu but napi stays in the
  poll_list of the softnet_data of the now offline cpu.

  Reviewing dev_cpu_callback (net/core/dev.c), I did not find that the
  poll_list is transferred to the new cpu.
</quote>

This patch is a straightforward implementation of Frank's suggestion:

Transfer poll_list and trigger NET_RX_SOFTIRQ on new cpu.

Reported-by: Frank Blaschka <blaschka@...ux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@...ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
Tested-by: Eric Dumazet <eric.dumazet@...il.com>
---
While running my tests on a bnx2x adapter, I found the patch works fine,
but /proc/interrupts still increments the interrupt count on my offlined
cpu... go figure...
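
For readers less familiar with the list helpers, the hand-over of pending
NAPI entries can be modeled in userspace. This is only a rough sketch, not
kernel code: struct softnet_model, migrate_poll_list(), list_count() and the
rx_softirq_pending flag are hypothetical names made up for illustration, and
the list helpers are simplified stand-ins for the ones in <linux/list.h>.
As far as I can tell, list_splice_init() inserts the moved entries right
after the destination head, and the sketch mirrors that.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Move all entries of 'list' right after 'head', then reinitialise 'list'. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	struct list_head *first, *last;

	if (list_empty(list))
		return;
	first = list->next;
	last = list->prev;
	first->prev = head;
	last->next = head->next;
	head->next->prev = last;
	head->next = first;
	init_list_head(list);
}

/* Hypothetical per-cpu state: only the fields this sketch needs. */
struct softnet_model {
	struct list_head poll_list;
	int rx_softirq_pending;	/* stands in for a raised NET_RX_SOFTIRQ */
};

/* Hand pending entries from the offline cpu to the current one.
 * Returns 1 if anything was moved, 0 if the old list was empty. */
static int migrate_poll_list(struct softnet_model *oldsd,
			     struct softnet_model *sd)
{
	assert(oldsd != sd);
	if (list_empty(&oldsd->poll_list))
		return 0;
	list_splice_init(&oldsd->poll_list, &sd->poll_list);
	sd->rx_softirq_pending = 1;
	return 1;
}

/* Count entries on a list (illustration helper only). */
static int list_count(const struct list_head *h)
{
	const struct list_head *p;
	int n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

Without the migrate step, entries left on the old list would never be
visited again, which matches the stall described in the report above.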

 net/core/dev.c |    5 +++++
 1 files changed, 5 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9393078..095909c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6178,6 +6178,11 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 		oldsd->output_queue = NULL;
 		oldsd->output_queue_tailp = &oldsd->output_queue;
 	}
+	/* Append NAPI poll list from offline CPU. */
+	if (!list_empty(&oldsd->poll_list)) {
+		list_splice_init(&oldsd->poll_list, &sd->poll_list);
+		raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	}
 
 	raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_enable();

