netdev - [PATCH] net: use rcu_barrier() in rollback_registered


Date:	Tue, 14 Sep 2010 00:24:54 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	nicolas.dichtel@...nd.com, David Miller <davem@...emloft.net>
Cc:	netdev <netdev@...r.kernel.org>,
	Octavian Purdila <opurdila@...acom.com>,
	Benjamin LaHaise <bcrl@...ck.org>
Subject: [PATCH] net: use rcu_barrier() in rollback_registered_many

On Friday 10 September 2010 at 16:24 +0200, Eric Dumazet wrote:
> On Friday 10 September 2010 at 15:35 +0200, Nicolas Dichtel wrote:
> > Hi all,
> > 
> > We hit a scalability problem when trying to remove a lot of virtual interfaces.
> > After analysis, we found that a refcnt on a device was released too late.
> > Here is a proposed patch. Unless we are missing something, the refcnt can be
> > released before call_rcu(). In IPv6, this is already the case.
> > 
> > Comments are welcome.
> > 
> > 
> > Regards,
> > Nicolas
> > attachment: file differences
> > (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
> > From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
> > From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
> > Date: Fri, 10 Sep 2010 14:52:15 +0200
> > Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
> > 
> > When a virtual device is removed, the refcnt on dev is released
> > after the rcu barrier, hence we always fall into the msleep(250)
> > of netdev_wait_allrefs(). This causes a long delay when
> > a lot of interfaces are removed.
> > The refcnt can be released before this rcu barrier, which
> > speeds up the removal of virtual interfaces.
> > 
> > Test of removing 50 ipip tunnel interfaces:
> >  Before the patch:
> >   real    0m12.804s
> >   user    0m0.020s
> >   sys     0m0.000s
> > 
> >  After the patch:
> >   real    0m0.988s
> >   user    0m0.004s
> >   sys     0m0.016s
> > 
> > Signed-off-by: Wang Xuefu <xuefu.wang@...nd.com>
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
> > ---
> 
> This is a well-known problem (many patches were sent some months ago),
> but your patch is not the right solution.
> 
> As long as the idev is not yet freed, it can be used, and we need to
> access idev->dev.
> 
> 

I believe I have understood one problem.

In rollback_registered_many(), we call inetdev_event() (and hence
inetdev_destroy()) at line 4844:

call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

Then, we call synchronize_net() at line 4870.

So by the time netdev_wait_allrefs() is called, we should have called
in_dev_finish_destroy().

But using synchronize_net() is a bit wrong here:

	"It waits until all pre-existing rcu readers have completed."

We have no guarantee that all the call_rcu() callbacks we posted to
dismantle the device have completed:

- If the number of online CPUs is 1, synchronize_net() is a no-op.
- If our thread migrates to another CPU, synchronize_net() can return
  while old callbacks have not yet been processed.

We should probably use rcu_barrier() instead, to wait for all
outstanding RCU callbacks to complete.
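To make the ordering argument concrete, here is a single-threaded toy model in C. Every name in it (toy_call_rcu(), toy_synchronize(), toy_rcu_barrier(), struct toy_dev, the starting refcount of 2) is hypothetical and only mimics the kernel call named in its comment. It assumes the one-CPU case described above, where waiting for a grace period returns at once and never invokes queued callbacks itself. It is a sketch of the reasoning, not kernel code.

```c
#include <assert.h>

/* Single-threaded toy model; all names are hypothetical and only mimic
 * the ordering of the kernel calls named in the comments. */
#define TOY_MAX_CBS 16

struct toy_dev {
	int refcnt;
};

typedef void (*toy_cb_fn)(struct toy_dev *);

static struct {
	toy_cb_fn fn;
	struct toy_dev *arg;
} toy_cbs[TOY_MAX_CBS];
static int toy_cb_head, toy_cb_tail;

/* Stands in for call_rcu(): queue a callback to be invoked later. */
static void toy_call_rcu(toy_cb_fn fn, struct toy_dev *arg)
{
	assert(toy_cb_tail < TOY_MAX_CBS);
	toy_cbs[toy_cb_tail].fn = fn;
	toy_cbs[toy_cb_tail].arg = arg;
	toy_cb_tail++;
}

/* Stands in for synchronize_net() with one online CPU: it returns at
 * once and never runs queued callbacks itself. */
static void toy_synchronize(void)
{
}

/* Stands in for rcu_barrier(): returns only once every queued callback ran. */
static void toy_rcu_barrier(void)
{
	while (toy_cb_head < toy_cb_tail) {
		toy_cbs[toy_cb_head].fn(toy_cbs[toy_cb_head].arg);
		toy_cb_head++;
	}
}

/* Stands in for in_dev_finish_destroy() dropping the idev's device ref. */
static void toy_finish_destroy(struct toy_dev *dev)
{
	dev->refcnt--;
}

/* Returns the refcount netdev_wait_allrefs() would see on its first check;
 * a nonzero value means it would go to sleep for 250 ms. */
static int toy_teardown(int use_barrier)
{
	struct toy_dev dev = { .refcnt = 2 }; /* idev ref + ref dropped by dev_put() */
	int seen;

	toy_cb_head = toy_cb_tail = 0;
	toy_call_rcu(toy_finish_destroy, &dev); /* posted via NETDEV_UNREGISTER */
	if (use_barrier)
		toy_rcu_barrier();
	else
		toy_synchronize();
	dev.refcnt--;                           /* the dev_put() loop */
	seen = dev.refcnt;
	toy_rcu_barrier(); /* the callback does run eventually, just too late */
	return seen;
}
```

In this model toy_teardown(0) returns 1: the idev's reference is still held when netdev_wait_allrefs() would first look, so it would sleep. toy_teardown(1) returns 0, because the barrier drained the queued callback before the dev_put() loop.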

I also believe the order of netdevice notifiers is wrong (we don't set
a priority), and that we should call fib_netdev_event() _before_
dst_dev_event(). This needs another patch.
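The effect of not setting a priority can be sketched with a toy notifier chain. It assumes the kernel's ordering rule for notifier chains (blocks sorted by descending priority, with a new block placed after existing blocks of equal priority). struct toy_nb, toy_chain_register(), toy_chain_order() and the priority values used in the usage note are hypothetical, and nothing here claims which of fib or dst actually registers first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy notifier block; hypothetical, modelled on struct notifier_block. */
struct toy_nb {
	const char *name;
	int priority;
	struct toy_nb *next;
};

/* Keep the chain sorted by descending priority; a new block goes after
 * existing blocks with the same priority (i.e. registration order). */
static void toy_chain_register(struct toy_nb **head, struct toy_nb *n)
{
	while (*head != NULL && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Write the call order, comma-separated, into buf (len bytes). */
static void toy_chain_order(const struct toy_nb *head, char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	for (; head != NULL; head = head->next) {
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, head->name, len - strlen(buf) - 1);
	}
}
```

With both priorities left at 0, registering "dst" and then "fib" yields the call order "dst,fib", i.e. plain registration order; giving the fib block priority 1 puts it first ("fib,dst") regardless of when it registered.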

Thanks

[PATCH] net: use rcu_barrier() in rollback_registered_many

netdev_wait_allrefs() waits until all references to a device have vanished.

It currently uses a _very_ pessimistic 250 ms delay between each probe.
Some users reported that no more than 4 devices can be dismantled per
second; this is a pretty serious problem for some setups.

Most of the time, a refcount is about to be released by an RCU callback
that is still in flight, because rollback_registered_many() uses
synchronize_net() (that is, a synchronize_rcu() call) instead of
rcu_barrier(). The problem is visible if the number of online CPUs is
one, because synchronize_rcu() is then a no-op.

Time to remove 50 ipip tunnels on a UP machine:

before patch: real 11.910s
after patch:  real 1.250s
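The 250 ms probe interval alone roughly accounts for these numbers. The sketch below assumes, as a simplification, that devices are waited on one after another and that each device whose last reference is still held costs exactly one pass through the msleep(250) in netdev_wait_allrefs(); WAIT_ALLREFS_SLEEP_MS, wait_model_ms() and wait_model_rate() are hypothetical names.

```c
#include <assert.h>

/* Simplified model (an assumption, not a measurement): devices are waited
 * on sequentially, and each still-referenced device costs `sleeps` passes
 * through the 250 ms sleep in netdev_wait_allrefs(). */
#define WAIT_ALLREFS_SLEEP_MS 250u

/* Time spent sleeping, in milliseconds, to dismantle ndevs devices. */
static unsigned int wait_model_ms(unsigned int ndevs, unsigned int sleeps)
{
	return ndevs * sleeps * WAIT_ALLREFS_SLEEP_MS;
}

/* Maximum devices dismantled per second when each one sleeps. */
static unsigned int wait_model_rate(unsigned int sleeps)
{
	assert(sleeps > 0);
	return 1000u / (sleeps * WAIT_ALLREFS_SLEEP_MS);
}
```

With one sleep per device, wait_model_ms(50, 1) gives 12500 ms, in line with the 11.9 s and 12.8 s measured for 50 ipip tunnels, and wait_model_rate(1) gives the ceiling of 4 devices per second that users reported.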

Reported-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
Reported-by: Octavian Purdila <opurdila@...acom.com>
Reported-by: Benjamin LaHaise <bcrl@...ck.org>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc2dc93..6de5a82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4867,7 +4867,7 @@ static void rollback_registered_many(struct list_head *head)
 	dev = list_first_entry(head, struct net_device, unreg_list);
 	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
-	synchronize_net();
+	rcu_barrier();
 
 	list_for_each_entry(dev, head, unreg_list)
 		dev_put(dev);

