Open Source and information security mailing list archives
Date: Fri, 05 Mar 2010 11:34:59 -0800
From: Daniel Walker <dwalker@...eaurora.org>
To: "David S. Miller" <davem@...emloft.net>
Cc: Jim Harford <c_jharfo@...cinc.com>, netdev@...r.kernel.org
Subject: [PATCH] net: Fix race condition on receive path.

From: Jim Harford <c_jharfo@...cinc.com>

Fixes a race condition on the networking receive path that causes all received packets to be dropped after 15-60 minutes of heavy network usage.

process_backlog() empties the receive queue, re-enables interrupts, then "completes" the softIRQ. This leaves a window in which netif_rx() can execute (in IRQ context) and enqueue a received packet without re-scheduling the softIRQ. After that, the receive queue is never processed again and the system eventually begins to drop all received packets.

The fix has netif_rx() call napi_schedule() whether or not the receive queue is empty. Two questions must be addressed:

1) Is the fix correct? That is, is it safe to always call napi_schedule()?
2) Does the fix result in a performance degradation?

With respect to correctness: the NAPI softIRQ routine should NOT be scheduled if it is already running. This is controlled by the NAPI_STATE_SCHED bit in the "state" field of struct napi_struct. While this bit is set, the softIRQ routine is either already pending or currently running. When the softIRQ finishes, the bit is cleared via __napi_complete(), which is called by napi_complete(), which process_backlog() calls after re-enabling interrupts and just prior to returning. When netif_rx() calls napi_schedule(), that routine checks the NAPI_STATE_SCHED bit via napi_schedule_prep(), which calls test_and_set_bit(); if the bit is already set, napi_schedule() returns without doing anything. Note that test_and_set_bit() implements an atomic test-and-set, the operation commonly used as the foundation of mutual-exclusion mechanisms.
So it is safe for netif_rx() to call napi_schedule(), whether or not the receive queue is empty.

With respect to performance: the following behavior was observed on an ARM CPU, cross-compiling for ARM with gcc. Removing the test of whether the receive queue is empty saves 2 assembly instructions. Calling napi_schedule() when that routine does nothing (because the NAPI_STATE_SCHED flag is set) adds 11 assembly instructions, for a net addition of 9. Furthermore, we observe in our testing that even under heavy network load, the receive queue is empty on at least 95% of netif_rx() invocations. The 9 assembly instructions, amortized over all netif_rx() invocations, yield an average performance penalty of less than 1 assembly instruction per invocation. Finally, the test-and-set operation does not touch memory shared by multiple CPU cores (which can be costly), because the data structures in question are per-CPU by design. So calling napi_schedule() at every invocation of netif_rx() does not impair performance.
CRs-fixed: 184599
Signed-off-by: Jim Harford <c_jharfo@...cinc.com>
Signed-off-by: Daniel Walker <dwalker@...eaurora.org>
---
 net/core/dev.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index be9924f..43161c6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2150,15 +2150,10 @@ int netif_rx(struct sk_buff *skb)
 	__get_cpu_var(netdev_rx_stat).total++;
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (queue->input_pkt_queue.qlen) {
-enqueue:
-			__skb_queue_tail(&queue->input_pkt_queue, skb);
-			local_irq_restore(flags);
-			return NET_RX_SUCCESS;
-		}
-
+		__skb_queue_tail(&queue->input_pkt_queue, skb);
 		napi_schedule(&queue->backlog);
-		goto enqueue;
+		local_irq_restore(flags);
+		return NET_RX_SUCCESS;
 	}
 	__get_cpu_var(netdev_rx_stat).dropped++;
--
1.6.3.3