Date:	Wed, 19 Sep 2007 10:35:08 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	netdev@...r.kernel.org
Cc:	shemminger@...ux-foundation.org
Subject: Re: new NAPI quota synchronization issue

From: David Miller <davem@...emloft.net>
Date: Wed, 19 Sep 2007 09:58:25 -0700 (PDT)

> Probably a good way to deal with this is to simply make the quota
> handling stateless.
> 
> The only reason we handle partial quota usage is to handle the
> global netdev_budget.  But we can simply "round up" and let
> netdev_budget get oversubscribed by one napi->quota's worth
> if necessary.
> 
> At that point, n->quota only holds two states when used, full
> and empty.  And at that point there is no reason to maintain
> its value at all.  Only the weight is necessary.
> 
> I'll try to post a patch which implements this later today.

Ok, here is the patch and I've checked it into net-2.6.24 as well.

There really shouldn't be any fundamental synchronization issues
in the new NAPI stuff any longer.  I'm pretty sure any problems
remaining can only be caused by driver bugs, but we'll see :-)

I went over the list handling several times and it looks bulletproof.
Only the thread of control that atomically sets the NAPI_STATE_SCHED
bit does list modifications, and it retains that ownership until the
point where it decides unconditionally to clear the bit, doing a
list_del() immediately before clearing it.

commit d97459caa5dc97b5da0b9be1ec3f107f3c58d7f9
Author: David S. Miller <davem@...set.davemloft.net>
Date:   Wed Sep 19 10:31:58 2007 -0700

    [NAPI]: Make quota management stateless.
    
    Because we update the napi->quota after returning from
    napi->poll() we have races which can, among other things,
    allow napi->quota to go negative.
    
    For example, if the driver uses the NAPI resched mechanism
    it typically does a completion like this:
    
    	netif_rx_complete(dev, napi);
    	if (unlikely(more_work_showed_up(priv))) {
    		if (netif_rx_reschedule(dev, napi))
    			goto poll_more;
    	}
    	return work_done;
    
    Between the netif_rx_complete() and the netif_rx_reschedule()
    an interrupt on another cpu can schedule the NAPI.  That is
    fine, and it is handled by checking the netif_rx_reschedule()
    return value, but when it happens:
    
    1) The other cpu can do a full ->poll() run, and update
       napi->quota
    2) The current thread of execution returns and updates
       napi->quota too
    
    So #1 uses a stale napi->quota value, and #2 over-subtracts
    from napi->quota.
    
    In short napi->quota access is not synchronized enough.
    
    The good news is that we don't really need it in the first
    place.  The only time we can have partial napi->quota
    updates is when we are trying to adhere to netdev_budget
    in the polling loop of net_rx_action().
    
    We can allow a slight oversubscription of netdev_budget,
    but at most one napi->weight, and that is harmless.
    
    Given that, napi->quota takes on only two values when used,
    the full napi->weight and zero.  Therefore it is entirely
    superfluous and we can always pass napi->weight down to
    the ->poll() routine, and kill off napi->quota and all the
    synchronization problems.
    
    Signed-off-by: David S. Miller <davem@...emloft.net>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bc88e4c..cf89ce6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -294,7 +294,6 @@ struct napi_struct {
 
 	unsigned long		state;
 	int			weight;
-	int			quota;
 	int			(*poll)(struct napi_struct *, int);
 #ifdef CONFIG_NETPOLL
 	spinlock_t		poll_lock;
diff --git a/net/core/dev.c b/net/core/dev.c
index 471e59d..0b9f82e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2107,13 +2107,6 @@ static int process_backlog(struct napi_struct *napi, int quota)
 	return work;
 }
 
-static void napi_check_quota_bug(struct napi_struct *n)
-{
-	/* It is illegal to consume more than the given quota.  */
-	if (WARN_ON_ONCE(n->quota < 0))
-		n->quota = 0;
-}
-
 /**
  * __napi_schedule - schedule for receive
  * @napi: entry to schedule
@@ -2124,9 +2117,6 @@ void fastcall __napi_schedule(struct napi_struct *n)
 {
 	unsigned long flags;
 
-	napi_check_quota_bug(n);
-	n->quota = n->weight;
-
 	local_irq_save(flags);
 	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
@@ -2146,6 +2136,7 @@ static void net_rx_action(struct softirq_action *h)
 
 	while (!list_empty(list)) {
 		struct napi_struct *n;
+		int work;
 
 		/* If softirq window is exhuasted then punt.
 		 *
@@ -2168,27 +2159,21 @@ static void net_rx_action(struct softirq_action *h)
 
 		have = netpoll_poll_lock(n);
 
-		/* if quota not exhausted process work */
-		if (likely(n->quota > 0)) {
-			int work = n->poll(n, min(budget, n->quota));
+		work = n->poll(n, n->weight);
 
-			budget -= work;
-			n->quota -= work;
-		}
+		WARN_ON_ONCE(work > n->weight);
 
-		local_irq_disable();
+		budget -= work;
 
-		napi_check_quota_bug(n);
+		local_irq_disable();
 
 		/* Drivers must not modify the NAPI state if they
-		 * consume the entire quota.  In such cases this code
+		 * consume the entire weight.  In such cases this code
 		 * still "owns" the NAPI instance and therefore can
 		 * move the instance around on the list at-will.
 		 */
-		if (unlikely(!n->quota)) {
-			n->quota = n->weight;
+		if (unlikely(work == n->weight))
 			list_move_tail(&n->poll_list, list);
-		}
 
 		netpoll_poll_unlock(have);
 	}
