Message-ID: <1284416694.2627.89.camel@edumazet-laptop>
Date: Tue, 14 Sep 2010 00:24:54 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: nicolas.dichtel@...nd.com, David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>,
Octavian Purdila <opurdila@...acom.com>,
Benjamin LaHaise <bcrl@...ck.org>
Subject: [PATCH] net: use rcu_barrier() in rollback_registered_many
On Friday, 10 September 2010 at 16:24 +0200, Eric Dumazet wrote:
> On Friday, 10 September 2010 at 15:35 +0200, Nicolas Dichtel wrote:
> > Hi all,
> >
> > We got a scalability problem when we try to remove a lot of virtual interfaces.
> > After analysis, we found that a refcnt on a device was released too late.
> > Here is a proposed patch. If we are not missing something, the refcnt can be
> > released before call_rcu(). In IPv6, this is already the case.
> >
> > Comments are welcome.
> >
> >
> > Regards,
> > Nicolas
> > attachment: diff between files
> > (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
> > From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
> > From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
> > Date: Fri, 10 Sep 2010 14:52:15 +0200
> > Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
> >
> > When a virtual device is removed, the refcnt on dev is released
> > after the rcu barrier, hence we always fall into the msleep(250)
> > of netdev_wait_allrefs(). This causes a long delay when
> > a lot of interfaces are removed.
> > The refcnt can be released before this rcu barrier, which
> > speeds up the removal of virtual interfaces.
> >
> > Test of removing 50 ipip tunnel interfaces:
> > Before the patch:
> > real 0m12.804s
> > user 0m0.020s
> > sys 0m0.000s
> >
> > After the patch:
> > real 0m0.988s
> > user 0m0.004s
> > sys 0m0.016s
> >
> > Signed-off-by: Wang Xuefu <xuefu.wang@...nd.com>
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
> > ---
>
> This is a well-known problem (many patches were sent some months ago),
> but your patch is not the right solution.
>
> As long as the idev is not yet freed, it can be used, and we need to
> access idev->dev.
>
>

I believe I understand one of the problems.

In rollback_registered_many(), we call inetdev_event() (and
inetdev_destroy()) at line 4844:

	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

Then we call synchronize_net() at line 4870.

So by the time netdev_wait_allrefs() is called, we should already have
called in_dev_finish_destroy().

But using synchronize_net() is a bit wrong here:
"It waits until all pre-existing rcu readers have completed."

We have no guarantee that all the call_rcu() callbacks we posted to
dismantle the device have completed:

- If the number of online cpus is 1, synchronize_net() is a no-op.
- If our thread migrates to another cpu, synchronize_net() can return
  while old callbacks are not yet processed.

We should probably use rcu_barrier() instead, to wait for all
outstanding RCU callbacks to complete.
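
To make the distinction concrete, here is a minimal userspace sketch. It is a
single-threaded toy model, not kernel code: every toy_* name is hypothetical,
and the "wait for readers" step is modelled as a no-op, which is how the
single-cpu case is described above. It only illustrates why a refcount that is
dropped from a deferred callback can still be held after a grace-period wait,
but not after a callback barrier.

```c
/* Toy, single-threaded model of deferred callbacks. NOT kernel code:
 * all toy_* names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 16

typedef void (*toy_cb_t)(void *arg);

struct toy_pending {
	toy_cb_t func;
	void *arg;
};

static struct toy_pending toy_queue[TOY_MAX_PENDING];
static size_t toy_queued;

/* Stands in for call_rcu(): the callback is queued, not run. */
static void toy_call_deferred(toy_cb_t func, void *arg)
{
	assert(toy_queued < TOY_MAX_PENDING);
	toy_queue[toy_queued].func = func;
	toy_queue[toy_queued].arg = arg;
	toy_queued++;
}

/* Stands in for a grace-period wait with no readers in flight:
 * it returns at once and leaves queued callbacks untouched. */
static void toy_wait_for_readers(void)
{
}

/* Stands in for rcu_barrier(): returns only after every callback
 * queued so far has been invoked. */
static void toy_barrier(void)
{
	size_t i;

	for (i = 0; i < toy_queued; i++)
		toy_queue[i].func(toy_queue[i].arg);
	toy_queued = 0;
}

struct toy_dev {
	int refcnt;
};

/* Deferred callback that drops the last reference on the device. */
static void toy_put_ref(void *arg)
{
	struct toy_dev *dev = arg;

	dev->refcnt--;
}

/* Returns the refcount a waiter would observe right after the chosen
 * wait primitive: non-zero means it would have to sleep and retry. */
static int toy_unregister(int use_barrier)
{
	struct toy_dev dev = { .refcnt = 1 };
	int seen;

	toy_call_deferred(toy_put_ref, &dev);
	if (use_barrier)
		toy_barrier();
	else
		toy_wait_for_readers();
	seen = dev.refcnt;

	/* Drain so the queue never keeps a pointer to a dead stack object. */
	toy_barrier();
	return seen;
}
```

In this model toy_unregister(0) reports a reference still held, while
toy_unregister(1) reports zero, which mirrors the difference described above.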
I also believe the order of netdevice notifiers is wrong (we don't set a
priority), and that we should call fib_netdev_event() _before_
dst_dev_event(). This needs another patch.

Thanks

[PATCH] net: use rcu_barrier() in rollback_registered_many

netdev_wait_allrefs() waits until all references to a device vanish.

It currently uses a _very_ pessimistic 250 ms delay between each probe.
Some users reported that no more than 4 devices can be dismantled per
second; this is a pretty serious problem for some setups.

Most of the time, a refcount is about to be released by an RCU callback
that is still in flight, because rollback_registered_many() uses a
synchronize_rcu() call instead of rcu_barrier(). The problem is visible
if the number of online cpus is one, because synchronize_rcu() is then a
no-op.

Time to remove 50 ipip tunnels on a UP machine:

before patch : real 11.910s
after patch  : real 1.250s

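
As a rough cross-check of these figures, the sketch below does the
back-of-the-envelope arithmetic. It assumes each device costs exactly one
250 ms sleep in netdev_wait_allrefs() and nothing else; the helper names are
hypothetical and this is only an estimate, not a measurement.

```c
/* Back-of-the-envelope estimate only; helper names are hypothetical. */
#include <assert.h>

/* Time spent sleeping if every device needs polls_per_dev sleeps of
 * poll_ms milliseconds each before its refcount is seen as zero. */
static unsigned int est_teardown_ms(unsigned int ndevs,
				    unsigned int polls_per_dev,
				    unsigned int poll_ms)
{
	return ndevs * polls_per_dev * poll_ms;
}

/* Upper bound on devices dismantled per second when each one costs
 * at least one sleep of poll_ms milliseconds. */
static unsigned int max_devs_per_second(unsigned int poll_ms)
{
	assert(poll_ms > 0);
	return 1000 / poll_ms;
}
```

Under that assumption, est_teardown_ms(50, 1, 250) gives 12500 ms, in the same
range as the "before patch" timings reported in this thread, and
max_devs_per_second(250) gives the 4 devices per second mentioned above.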
Reported-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
Reported-by: Octavian Purdila <opurdila@...acom.com>
Reported-by: Benjamin LaHaise <bcrl@...ck.org>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc2dc93..6de5a82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4867,7 +4867,7 @@ static void rollback_registered_many(struct list_head *head)
 	dev = list_first_entry(head, struct net_device, unreg_list);
 	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
-	synchronize_net();
+	rcu_barrier();
 
 	list_for_each_entry(dev, head, unreg_list)
 		dev_put(dev);