netdev - [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt

Message-Id: <1389209061-29494-1-git-send-email-jieyue.majy@alibaba-inc.com>
Date:	Thu,  9 Jan 2014 03:24:21 +0800
From:	Ma JieYue <majieyue@...il.com>
To:	netdev@...r.kernel.org, xen-devel@...ts.xen.org
Cc:	Ma JieYue <jieyue.majy@...baba-inc.com>,
	Wang Yingbin <yingbin.wangyb@...baba-inc.com>,
	Fu Tienan <tienan.ftn@...baba-inc.com>,
	Wei Liu <wei.liu2@...rix.com>,
	Ian Campbell <ian.campbell@...rix.com>,
	David Vrabel <david.vrabel@...rix.com>
Subject: [PATCH net] xen-netback: fix vif tx queue race in xenvif_rx_interrupt

From: Ma JieYue <jieyue.majy@...baba-inc.com>

There is a race between waking and stopping the xenvif tx queue, and it leads to
unnecessary packet drops: xenvif_start_xmit can be entered while the rx ring is
still full. In xenvif_rx_interrupt, the ring check (xenvif_rx_schedulable) and
the subsequent netif_wake_queue call are not atomic. The wake may therefore
happen some time after the check found the ring not full, by which point the
ring may have filled up again. Here is part of the debug log captured when the
race occurred:

wake_queue: req_cons_peek 2679757 req_cons 2679586 req_prod 2679841
stop_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped true]
wake_queue: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841
[tx_queue_stopped false]
drop packet: req_cons_peek 2679837 req_cons 2679757 req_prod 2679841

A debug line was written right after each netif_wake_queue call in
xenvif_rx_interrupt, right after each netif_stop_queue call in
xenvif_start_xmit, and whenever a packet was dropped in xenvif_start_xmit.
As the log shows, the second wake_queue should not have happened. We believe
the ring was checked before the stop_queue, but the wake did not follow the
check immediately and instead took place after the stop_queue. As a result,
xenvif_start_xmit was entered with the ring full but the queue not stopped,
and the packet was dropped.

This patch fixes the race by checking whether the tx queue is stopped before
trying to wake it in xenvif_rx_interrupt. The queue is now woken only when it
is stopped and xenvif_rx_schedulable() reports that the vif is schedulable and
its rx ring is not full.

Signed-off-by: Ma JieYue <jieyue.majy@...baba-inc.com>
Signed-off-by: Wang Yingbin <yingbin.wangyb@...baba-inc.com>
Signed-off-by: Fu Tienan <tienan.ftn@...baba-inc.com>
Cc: Wei Liu <wei.liu2@...rix.com>
Cc: Ian Campbell <ian.campbell@...rix.com>
Cc: David Vrabel <david.vrabel@...rix.com>
---
 drivers/net/xen-netback/interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fff8cdd..e099f62 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -105,7 +105,7 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;

-	if (xenvif_rx_schedulable(vif))
+	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
 		netif_wake_queue(vif->dev);

 	return IRQ_HANDLED;
-- 
1.8.4