Message-ID: <7866DA1F8D2D4541B87FEE88E633ABAA2B72081FB6@MNEXMB1.qlogic.org>
Date:	Thu, 5 Aug 2010 09:20:03 -0500
From:	Usha Srinivasan <usha.srinivasan@...gic.com>
To:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Receive processing stops when dev->poll returns 1

Hello,
I have run into an interesting and frustrating problem which I've not been able to resolve. I am hoping someone can help me.  

I have a network driver that sets its dev->weight to 100 (as ipoib does). When it has processed 100 received packets, it follows the rules: it decrements dev->quota and *budget by the number of packets processed and returns 1 without calling netif_rx_complete. As soon as my driver does that, processing of incoming packets stops on all interfaces.
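The old NAPI contract described above can be modelled in user space to make the bookkeeping concrete. This is a minimal sketch, not kernel code: struct fake_dev, fake_rx_complete() and old_style_poll() are hypothetical stand-ins for the relevant struct net_device fields, netif_rx_complete() and a driver's dev->poll routine, and the RX ring is reduced to a packet counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct net_device fields used by the
 * old (pre-netif_napi_add) NAPI contract. */
struct fake_dev {
    int weight;      /* per-poll share, e.g. 100 as in this report */
    int quota;       /* remaining share; net_rx_action refills it */
    int rx_pending;  /* packets waiting in the simulated RX ring */
    bool scheduled;  /* stands in for membership of the poll list */
};

/* Hypothetical stand-in for netif_rx_complete(): leave the poll list.
 * A real driver would also re-enable its RX interrupt here. */
static void fake_rx_complete(struct fake_dev *dev)
{
    dev->scheduled = false;
}

/* Old-style poll: handle at most min(*budget, dev->quota) packets and
 * charge the work done to both counters. If the whole allowance was
 * used, return 1 and stay scheduled; otherwise complete and return 0. */
static int old_style_poll(struct fake_dev *dev, int *budget)
{
    int limit = *budget < dev->quota ? *budget : dev->quota;
    int work_done = 0;

    while (work_done < limit && dev->rx_pending > 0) {
        dev->rx_pending--;   /* "receive" one packet */
        work_done++;
    }
    assert(work_done <= limit);

    *budget -= work_done;
    dev->quota -= work_done;

    if (work_done < limit) {
        fake_rx_complete(dev);   /* ring drained: leave the poll list */
        return 0;
    }
    return 1;   /* allowance used up: do not call netif_rx_complete */
}
```

For example, with a weight of 100, a budget of 300 and 250 packets pending, the first call handles 100 packets, leaves quota at 0 and budget at 200, and returns 1 with the device still scheduled; once net_rx_action has refilled the quota, later calls drain the rest and the last one completes.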

How do I know this? As soon as my driver returns 1 from dev->poll, my PuTTY session drops and eth0 stops working: the eth0 counters show that it no longer receives packets, although it can still transmit. My own device stops receiving packets as well. I have scoured the code for ipoib and other network drivers and see no difference from what my driver does. I have also tried lowering the weight for ipoib and eth0, hoping to reproduce the problem with those devices, but with no luck.

One guess is that net_rx_action spends more than one jiffy processing the incoming packets for all the interfaces it polls (I verified that my driver by itself does not take that long). When that happens, net_rx_action leaves its loop and re-raises NET_RX_SOFTIRQ before returning (the same path is taken when netdev_budget runs out), so one would expect it to run again later. My guess is that this does not happen, which would stop all incoming packets. Is that possible and, if so, what is the fix?

static void net_rx_action(struct softirq_action *h)
{
        struct softnet_data *queue = &__get_cpu_var(softnet_data);
        unsigned long start_time = jiffies;
        int budget = netdev_budget;
        void *have;

        local_irq_disable();

        while (!list_empty(&queue->poll_list)) {
                struct net_device *dev;

                /* Out of budget, or more than one jiffy elapsed: defer. */
                if (budget <= 0 || jiffies - start_time > 1)
                        goto softnet_break;

                local_irq_enable();

                dev = list_entry(queue->poll_list.next,
                                 struct net_device, poll_list);
                have = netpoll_poll_lock(dev);

                /* Quota used up, or poll() returned 1 (more work):
                 * rotate the device to the tail and refill its quota. */
                if (dev->quota <= 0 || dev->poll(dev, &budget)) {
                        netpoll_poll_unlock(have);
                        local_irq_disable();
                        list_move_tail(&dev->poll_list, &queue->poll_list);
                        if (dev->quota < 0)
                                dev->quota += dev->weight;
                        else
                                dev->quota = dev->weight;
                } else {
                        /* poll() returned 0 after netif_rx_complete()
                         * removed the device from the list: drop the
                         * reference taken when it was scheduled. */
                        netpoll_poll_unlock(have);
                        dev_put(dev);
                        local_irq_disable();
                }
        }
out:
        local_irq_enable();
        return;

softnet_break:
        /* Count the squeeze and re-raise NET_RX_SOFTIRQ so the
         * remaining devices are polled on a later run. */
        __get_cpu_var(netdev_rx_stat).time_squeeze++;
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
        goto out;
}
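The rotation and early-exit logic of the quoted net_rx_action can be exercised the same way. This is a hedged user-space model, not the kernel function: struct sim_dev, struct sim_queue, sim_poll() and sim_net_rx_action() are hypothetical, the poll list is a small ring buffer, and jiffies is a plain counter advanced by an assumed fixed cost per poll. It shows that a device returning 1 is simply moved to the tail with its quota refilled, and that after an early exit the device is still queued, so further progress depends on the re-raised softirq running again.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for a device using the old NAPI contract. */
struct sim_dev {
    int weight;
    int quota;
    int rx_pending;   /* packets waiting in the simulated RX ring */
};

/* Ring buffer standing in for queue->poll_list. */
struct sim_queue {
    struct sim_dev *devs[MAX_DEVS];
    int head;
    int count;
};

static void q_push_tail(struct sim_queue *q, struct sim_dev *d)
{
    assert(q->count < MAX_DEVS);
    q->devs[(q->head + q->count) % MAX_DEVS] = d;
    q->count++;
}

static void q_drop_head(struct sim_queue *q)
{
    assert(q->count > 0);
    q->head = (q->head + 1) % MAX_DEVS;
    q->count--;
}

/* Old-style poll: returns 1 if it used its whole allowance. */
static int sim_poll(struct sim_dev *dev, int *budget)
{
    int limit = *budget < dev->quota ? *budget : dev->quota;
    int work = dev->rx_pending < limit ? dev->rx_pending : limit;

    dev->rx_pending -= work;
    *budget -= work;
    dev->quota -= work;
    return work == limit;
}

/* Model of the quoted loop. Returns true if it took the softnet_break
 * path (where the kernel re-raises NET_RX_SOFTIRQ), false if the poll
 * list drained. ticks_per_poll is an assumed, fixed cost per poll. */
static bool sim_net_rx_action(struct sim_queue *q, int netdev_budget,
                              unsigned long *jiffies,
                              unsigned long ticks_per_poll)
{
    unsigned long start_time = *jiffies;
    int budget = netdev_budget;

    while (q->count > 0) {
        if (budget <= 0 || *jiffies - start_time > 1)
            return true;            /* softnet_break */

        struct sim_dev *dev = q->devs[q->head];
        bool more = dev->quota <= 0 || sim_poll(dev, &budget);
        *jiffies += ticks_per_poll;

        q_drop_head(q);
        if (more) {
            q_push_tail(q, dev);    /* list_move_tail() */
            if (dev->quota < 0)
                dev->quota += dev->weight;
            else
                dev->quota = dev->weight;
        }
        /* else: the driver completed; the kernel calls dev_put() here */
    }
    return false;
}
```

With ticks_per_poll set to 1, a single device with 1000 packets pending is polled twice before the time check fires; it is still on the queue afterwards, which mirrors the case described above where the next run depends on NET_RX_SOFTIRQ being serviced again.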

I have run into this problem on four systems running RHEL5, SLES10 or SLES11. The above describes what happens on RHEL5/SLES10. SLES11 is different: dev->poll has been replaced by a poll function registered with netif_napi_add, which simply returns the number of packets it processed without any quota/budget manipulation. Yet I see the same behavior there.
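For the SLES11-style interface, the contract moves into the return value. Below is a hedged user-space sketch of that contract, assuming the rule stated in a comment in net_rx_action of that era: a poll may handle at most budget packets, and only when it handled fewer may it call the completion routine (netif_rx_complete() in kernels of that period, napi_complete() later); a poll that consumes the whole budget must leave the NAPI state alone, because the core still owns the instance and moves it on the list. struct sim_napi, sim_napi_complete(), sim_napi_poll() and sim_drain() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct napi_struct. */
struct sim_napi {
    int weight;        /* value passed to netif_napi_add(), e.g. 100 */
    int rx_pending;    /* packets waiting in the simulated RX ring */
    bool scheduled;    /* stands in for NAPI_STATE_SCHED / list membership */
};

/* Hypothetical stand-in for the completion call. A real driver would
 * re-enable its RX interrupt afterwards. */
static void sim_napi_complete(struct sim_napi *n)
{
    n->scheduled = false;
}

/* New-style poll: process at most 'budget' packets and return the
 * count. Complete only when less than the full budget was used. */
static int sim_napi_poll(struct sim_napi *n, int budget)
{
    int work_done = 0;

    while (work_done < budget && n->rx_pending > 0) {
        n->rx_pending--;
        work_done++;
    }

    if (work_done < budget)
        sim_napi_complete(n);

    /* Assumed core-side invariants: never exceed the budget, and a poll
     * that consumed the whole budget must still be scheduled. */
    assert(work_done <= budget);
    assert(work_done < budget || n->scheduled);
    return work_done;
}

/* Caller side, loosely modelled on the core loop: an instance that used
 * its whole weight stays scheduled and is simply polled again. */
static int sim_drain(struct sim_napi *n, int max_rounds)
{
    int rounds = 0;

    while (n->scheduled && rounds < max_rounds) {
        sim_napi_poll(n, n->weight);
        rounds++;
    }
    return rounds;
}
```

With a weight of 100 and 250 packets pending, the first poll returns 100 and the instance stays scheduled; two more rounds drain the remaining 150 packets, and only the last one completes.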

Any help appreciated! Thanks in advance!

Usha

___________________
Usha Srinivasan
Software Engineer
QLogic Corporation
780 5th Ave, Suite A
King of Prussia, PA 19406
(610) 233-4844
(610) 233-4777 (Fax)
(610) 233-4838 (Main Desk)
