netdev - NAPI poll behavior in various Intel drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20080104.034036.160194618.davem@davemloft.net>
Date:	Fri, 04 Jan 2008 03:40:36 -0800 (PST)
From:	David Miller <davem@...emloft.net>
To:	netdev@...r.kernel.org
CC:	auke-jan.h.kok@...el.com
Subject: NAPI poll behavior in various Intel drivers

Several Intel networking drivers such as e1000, e1000e
and e100 all do this to exit NAPI polling:

	if ((!tx_cleaned && (work_done == 0)) ||
 	   !netif_running(poll_dev)) {

I tried to make this use in the NAPI rework:

	if ((!tx_cleaned && (work_done < budget)) ||
 	   !netif_running(poll_dev)) {

But that got reverted by:

	commit f7bbb9098315d712351aba7861a8c9fcf6bf0213

	e1000: Fix NAPI state bug when Rx complete

	Don't exit polling when we have not yet used our budget, this causes
	the NAPI system to end up with a messed up poll list.

	Signed-off-by: Auke Kok <auke-jan.h.kok@...el.com>
	Signed-off-by: Jeff Garzik <jeff@...zik.org>

I definitely would not have signed off on that :-)

That "tx_cleaned" thing clouds the logic in all of these driver's
poll routines.

The one necessary precondition is that when work_done < budget
we exit polling and return a value less than budget.

If the ->poll() returns a value less than budget, net_rx_action()
assumes that the device has been removed from the poll() list.

		/* Drivers must not modify the NAPI state if they
		 * consume the entire weight.  In such cases this code
		 * still "owns" the NAPI instance and therefore can
		 * move the instance around on the list at-will.
		 */
		if (unlikely(work == weight))
			list_move_tail(&n->poll_list, list);

This "work_done == 0" test in these drivers, is thus, wrong.  It
should be "work_done < budget" and the whole tx_cleaned thing needs to
be removed.

It happens to work, because what happens is that we loop again and
process the same NAPI struct again.

As a result, E1000 devices get polled TWICE every time they
process at least one RX packet, but do not consume the whole
quota.

I smell a performance hack, and if so this is wrong and against
all of the principles of NAPI.  Either that or it's a workaround
for the "!netif_running()" case.

I noticed this while trying to work on a generic fix for the
"->poll() does not exit when device is brought down while being
bombed with packets" bug.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html