lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Oct 2016 00:49:39 +0200
From:   Willy Tarreau <w@....eu>
To:     linux-kernel@...r.kernel.org, stable@...r.kernel.org
Cc:     Wei Liu <wei.liu2@...rix.com>, Philipp Hahn <hahn@...vention.de>,
        David Vrabel <david.vrabel@...rix.com>,
        Willy Tarreau <w@....eu>
Subject: [PATCH 3.10 15/16] xen-netback: ref count shared rings

From: Wei Liu <wei.liu2@...rix.com>

... so that we can make sure the rings are not freed until all SKBs in
internal queues are consumed.

1. The VM is receiving packets through bonding + bridge + netback +
   netfront.
2. For some unknown reason at least one packet remains in the rx queue
   and is not delivered to the domU immediately by netback.
3. The VM finishes shutting down.
4. The shared ring between dom0 and domU is freed.
5. then xen-netback continues processing the pending requests and tries
   to put the packet into the now already released shared ring.

> XXXlan0: port 9(vif26.0) entered disabled state
> BUG: unable to handle kernel paging request at ffffc900108641d8
> IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> PGD 57e20067 PUD 57e21067 PMD 571a7067 PTE 0
> Oops: 0000 [#1] SMP
> ...
> CPU: 0 PID: 12587 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Debian 3.10.11-1.58.201405060908
> Hardware name: FUJITSU PRIMERGY BX620 S6/D3051, BIOS 080015 Rev.3C78.3051 07/22/2011
> task: ffff880004b067c0 ti: ffff8800561ec000 task.ti: ffff8800561ec000
> RIP: e030:[<ffffffffa04147dc>]  [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> RSP: e02b:ffff8800561edce8  EFLAGS: 00010202
> RAX: ffffc900104adac0 RBX: ffff8800541e95c0 RCX: ffffc90010864000
> RDX: 000000000000003b RSI: 0000000000000000 RDI: ffff880040014380
> RBP: ffff8800570e6800 R08: 0000000000000000 R09: ffff880004799800
> R10: ffffffff813ca115 R11: ffff88005e4fdb08 R12: ffff880054e6f800
> R13: ffff8800561edd58 R14: ffffc900104a1000 R15: 0000000000000000
> FS:  00007f19a54a8700(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffc900108641d8 CR3: 0000000054cb3000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
>  ffff880004b06ba0 0000000000000000 ffff88005da13ec0 ffff88005da13ec0
>  0000000004b067c0 ffffc900104a8ac0 ffffc900104a1020 000000005da13ec0
>  0000000000000000 0000000000000001 ffffc900104a8ac0 ffffc900104adac0
> Call Trace:
>  [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
>  [<ffffffffa0416033>] ? xen_netbk_kthread+0x174/0x841 [xen_netback]
>  [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
>  [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
>  [<ffffffffa0415ebf>] ? xen_netbk_tx_build_gops+0xce8/0xce8 [xen_netback]
>  [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
>  [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
>  [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
>  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> Code: 8b b3 d0 00 00 00 48 8b bb d8 00 00 00 0f b7 74 37 02 89 70 08 eb 07 c7 40 08 00 00 00 00 89 d2 c7 40 04 00 00 00 00 48 83 c2 08 <0f> b7 34 d1 89 30 c7 44 24 60 00 00 00 00 8b 44 d1 04 89 44 24
> RIP  [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
>  RSP <ffff8800561edce8>
> CR2: ffffc900108641d8

Track the shared ring buffer being unmapped and drop those packets.

Ref-count the rings as followed:
  map         -> set to 1
   start_xmit -> inc when queueing SKB to internal queue
   rx_action  -> dec after finishing processing a SKB
  unmap       -> dec and wait to be 0

Note that this is different from ref counting the vif structure itself.
Currently only guest Rx path is taken care of because that's where the
bug surfaced.

This bug doesn't exist in kernel >=3.12 as multi-queue support was added
there.

Link: <https://lists.xenproject.org/archives/html/xen-devel/2014-06/msg00818.html>
Signed-off-by: Wei Liu <wei.liu2@...rix.com>
Signed-off-by: Philipp Hahn <hahn@...vention.de>
Cc: David Vrabel <david.vrabel@...rix.com>
Tested-by: Philipp Hahn <hahn@...vention.de>
Signed-off-by: Willy Tarreau <w@....eu>
---
 drivers/net/xen-netback/common.h    |  4 ++++
 drivers/net/xen-netback/interface.c | 17 +++++++++++++++--
 drivers/net/xen-netback/netback.c   |  6 ++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index f2faa77..dd6b7c3 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -66,6 +66,8 @@ struct xenvif {
 	/* The shared rings and indexes. */
 	struct xen_netif_tx_back_ring tx;
 	struct xen_netif_rx_back_ring rx;
+	atomic_t ring_refcnt;
+	wait_queue_head_t waiting_to_unmap;
 
 	/* Frontend feature information. */
 	u8 can_sg:1;
@@ -120,6 +122,8 @@ void xenvif_free(struct xenvif *vif);
 
 void xenvif_get(struct xenvif *vif);
 void xenvif_put(struct xenvif *vif);
+void xenvif_get_rings(struct xenvif *vif);
+void xenvif_put_rings(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 540a796..7e3817a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -44,12 +44,23 @@ void xenvif_get(struct xenvif *vif)
 	atomic_inc(&vif->refcnt);
 }
 
+void xenvif_get_rings(struct xenvif *vif)
+{
+	atomic_inc(&vif->ring_refcnt);
+}
+
 void xenvif_put(struct xenvif *vif)
 {
 	if (atomic_dec_and_test(&vif->refcnt))
 		wake_up(&vif->waiting_to_free);
 }
 
+void xenvif_put_rings(struct xenvif *vif)
+{
+	if (atomic_dec_and_test(&vif->ring_refcnt))
+		wake_up(&vif->waiting_to_unmap);
+}
+
 int xenvif_schedulable(struct xenvif *vif)
 {
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@@ -91,6 +102,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Reserve ring slots for the worst-case number of fragments. */
 	vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb);
 	xenvif_get(vif);
+	xenvif_get_rings(vif);
 
 	if (vif->can_queue && xen_netbk_must_stop_queue(vif))
 		netif_stop_queue(dev);
@@ -271,6 +283,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->dev = dev;
 	INIT_LIST_HEAD(&vif->schedule_list);
 	INIT_LIST_HEAD(&vif->notify_list);
+	init_waitqueue_head(&vif->waiting_to_unmap);
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
@@ -365,12 +378,12 @@ void xenvif_disconnect(struct xenvif *vif)
 	if (netif_carrier_ok(vif->dev))
 		xenvif_carrier_off(vif);
 
+	disable_irq(vif->irq);
+	xen_netbk_unmap_frontend_rings(vif);
 	if (vif->irq) {
 		unbind_from_irqhandler(vif->irq, vif);
 		vif->irq = 0;
 	}
-
-	xen_netbk_unmap_frontend_rings(vif);
 }
 
 void xenvif_free(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 70b830f..1595f81 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -814,6 +814,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 			xenvif_put(vif);
 		npo.meta_cons += sco->meta_slots_used;
 		dev_kfree_skb(skb);
+		xenvif_put_rings(vif);
 	}
 
 	list_for_each_entry_safe(vif, tmp, &notify, notify_list) {
@@ -1864,6 +1865,9 @@ static int xen_netbk_kthread(void *data)
 
 void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
 {
+	atomic_dec(&vif->ring_refcnt);
+	wait_event(vif->waiting_to_unmap, atomic_read(&vif->ring_refcnt) == 0);
+
 	if (vif->tx.sring)
 		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
 					vif->tx.sring);
@@ -1882,6 +1886,8 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 
 	int err = -ENOMEM;
 
+	atomic_set(&vif->ring_refcnt, 1);
+
 	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
 				     tx_ring_ref, &addr);
 	if (err)
-- 
2.8.0.rc2.1.gbe9624a

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ