netdev - [PATCH 2/2] netxen: Add sanity checks for Rx buffers returning from hardware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1387257753-18676-3-git-send-email-david@gibson.dropbear.id.au>
Date:	Tue, 17 Dec 2013 16:22:33 +1100
From:	David Gibson <david@...son.dropbear.id.au>
To:	manish.chopra@...gic.com, sony.chacko@...gic.com,
	rajesh.borundia@...gic.com
Cc:	netdev@...r.kernel.org, snagarka@...hat.com, tcamuso@...hat.com,
	vdasgupt@...hat.com, David Gibson <david@...son.dropbear.id.au>
Subject: [PATCH 2/2] netxen: Add sanity checks for Rx buffers returning from hardware

At Red Hat we've seen a couple of customer cases where the kernel has
crashed due to corruption of the Rx buffer free lists in the netxen driver.

>From the information we've had so far, we haven't been able to locate the
bug.  Our working theory is that due to either a driver, hardware or
firmware bug, we very occasionally process Rx descriptors which have the
wrong buffer ref_handle in them.  This is consistent with all the details
we've seen of the two crashes we've seen.

More specifically here's what we suspect happened:

Case 1:
  netxen_process_rcv_ring()
    netxen_process_rcv() on buffer X (correctly)
      list_add_tail(buffer X, sds_ring free list)

    netxen_process_rcv() on buffer Y (correctly)
      list_add_tail(buffer Y, sds_ring free list)

    netxen_process_rcv() on buffer X            <- wrong index
      list_add_tail(buffer X, sds_ring free list) <- list corrupted

    netxen_merge_rx_buffers()
      list_splice() sds_ring free list to rds_ring free_list

    netxen_post_rx_buffers_nodb()
      BUG on list_del() due to bad prev pointer in buffer X

Case 2:
  CPU0                      CPU1
  netxen_process_rcv_list() on sds_ring A
    netxen_process_rcv() on buffer X (correctly)
    list_add_tail(buffer X, sds_ring A free_list)

                            netxen_process_rcv_list() on sds_ring B
                              netxen_process_rcv() buffer X   <- wrong
                              list_add_tail(buffer X, sds_ring B list)
                            [...]
                              netxen_post_rx_buffers_nodb()
                                list_del(buffer X)   <- buf X poisoned

    for_each on sds_ring A free list <- crash due to dereferencing
                                        poisoned pointers in buffer X

Because the bug is triggered very rarely, and we have no known
reproducer, it's difficult to pin down exactly what's going wrong.
This patch tries to improve the situation by adding some (fairly cheap)
sanity checks when we process an Rx buffer to make sure it's not
already on someone's free list.

Signed-off-by: David Gibson <david@...son.dropbear.id.au>
---
 .../net/ethernet/qlogic/netxen/netxen_nic_init.c   | 37 ++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
index 68658b8..edbdf4b 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
@@ -1531,6 +1531,39 @@ no_skb:
 	return skb;
 }
 
+static int netxen_check_rx_buf(int index, struct netxen_rx_buffer *rxbuf)
+{
+	int ret = 0;
+
+	if ((rxbuf->list.next != LIST_POISON1)
+	    || (rxbuf->list.prev != LIST_POISON2)) {
+		printk(KERN_ERR "netxen: Rx buffer is already on free list "
+		       "(next=%p, prev=%p)\n", rxbuf->list.next,
+		       rxbuf->list.prev);
+		ret = -1;
+	}
+
+	if (rxbuf->ref_handle != index) {
+		printk(KERN_ERR "netxen: Rx buffer has bad ref_handle "
+		       "(%d instead of %d)\n", rxbuf->ref_handle, index);
+		ret = -1;
+	}
+
+	if (rxbuf->state != NETXEN_BUFFER_BUSY) {
+		printk(KERN_ERR "netxen: Rx buffer has bad state %d\n",
+		       rxbuf->state);
+		ret = -1;
+	}
+
+	if (!rxbuf->skb) {
+		printk(KERN_ERR "netxen: Rx buffer has no skb\n",
+		       rxbuf->skb);
+		ret = -1;
+	}
+
+	return ret;
+}
+
 static struct netxen_rx_buffer *
 netxen_process_rcv(struct netxen_adapter *adapter,
 		struct nx_host_sds_ring *sds_ring,
@@ -1553,6 +1586,8 @@ netxen_process_rcv(struct netxen_adapter *adapter,
 		return NULL;
 
 	buffer = &rds_ring->rx_buf_arr[index];
+	if (WARN_ON(netxen_check_rx_buf(index, buffer) != 0))
+		return NULL;
 
 	length = netxen_get_sts_totallength(sts_data0);
 	cksum  = netxen_get_sts_status(sts_data0);
@@ -1614,6 +1649,8 @@ netxen_process_lro(struct netxen_adapter *adapter,
 		return NULL;
 
 	buffer = &rds_ring->rx_buf_arr[index];
+	if (WARN_ON(netxen_check_rx_buf(index, buffer) != 0))
+		return NULL;
 
 	timestamp = netxen_get_lro_sts_timestamp(sts_data0);
 	lro_length = netxen_get_lro_sts_length(sts_data0);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html