Message-ID: <4CA21BC6.5070300@6wind.com>
Date: Tue, 28 Sep 2010 18:45:58 +0200
From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: netdev <netdev@...r.kernel.org>,
Octavian Purdila <opurdila@...acom.com>
Subject: Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
Eric Dumazet wrote:
> On Tuesday, 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
>> Hi,
>>
>> I face a problem when I try to remove an interface:
>> netdev_wait_allrefs() complains about the refcount.
>>
>> Here is a trivial scenario to reproduce the problem:
>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>> # ./a.out tunl1
>> # ip tunnel del tunl1
>>
>> Note: the a.out binary creates an IPv4 raw socket, binds it to tunl1
>> (SO_BINDTODEVICE), sets the multicast loopback option (IP_MULTICAST_LOOP),
>> sets the multicast interface to tunl1 (IP_MULTICAST_IF), builds the IP header
>> itself (IP_HDRINCL) and then sends a single packet (192.168.6.1 -> 224.0.0.18).
>>
>> Note 2: when a.out is executed, tunl1 has no IP address and is down.
>>
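For readers who want to reproduce this, here is a rough sketch of what such an a.out could look like. It is not the original program: the function names, the protocol number (112), the TTL and the value passed to IP_MULTICAST_LOOP are all assumptions, since the note above only lists the socket options and the addresses. It needs CAP_NET_RAW (typically root) to run.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <net/if.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill a bare 20-byte IPv4 header for 192.168.6.1 -> 224.0.0.18.
 * Protocol 112 and TTL 1 are assumptions; the mail gives only the addresses. */
void build_ip_header(struct iphdr *iph)
{
    assert(iph != NULL);
    memset(iph, 0, sizeof(*iph));
    iph->version = 4;
    iph->ihl = 5;                       /* 5 * 4 = 20 bytes, no options */
    iph->ttl = 1;
    iph->protocol = 112;
    iph->tot_len = htons(sizeof(*iph));
    iph->saddr = inet_addr("192.168.6.1");
    iph->daddr = inet_addr("224.0.0.18");
    /* Checksum left at 0: Linux fills it in for IP_HDRINCL sockets. */
}

/* Send one header-only packet through ifname.
 * Returns 0 on success, -1 on any error. */
int send_test_packet(const char *ifname)
{
    int on = 1, ret = -1;
    struct ip_mreqn mreq;
    struct sockaddr_in dst;
    struct iphdr iph;
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    if (fd < 0)
        return -1;

    /* Select the multicast interface by index: the device has no address. */
    memset(&mreq, 0, sizeof(mreq));
    mreq.imr_ifindex = (int)if_nametoindex(ifname);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = inet_addr("224.0.0.18");

    build_ip_header(&iph);

    /* Same order as in the note: bind, loop, multicast if, hdrincl, send. */
    if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname) + 1) == 0 &&
        setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &on, sizeof(on)) == 0 &&
        setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &mreq, sizeof(mreq)) == 0 &&
        setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) == 0 &&
        sendto(fd, &iph, sizeof(iph), 0, (struct sockaddr *)&dst,
               sizeof(dst)) == (ssize_t)sizeof(iph))
        ret = 0;

    close(fd);
    return ret;
}
```

A small main() that calls send_test_packet(argv[1]) would give the "./a.out tunl1" invocation used in the scenario above.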
>
> CC Octavian Purdila, the patch author.
>
> I am just wondering why this route is created in the first place.
At first, I asked myself the same question, but it seems that sending a
packet through this kind of socket is allowed even if the interface is
down. The packet will be dropped by the noop qdisc.
But I agree that it is strange to perform a route lookup and everything
else only to drop the packet at the end ...
Maybe raw_sendmsg() could drop it directly ;-) ... or maybe
ip_route_output_flow().
Any suggestions are welcome.
Regards,
Nicolas
>
> Maybe a fix would be to forbid this ?
>
> Some machines have a giant route cache, so it's very important to avoid
> expensive scans.
>
>> Then, I got a series of "kernel:[1206699.728010] unregister_netdevice:
>> waiting for tunl1 to become free. Usage count = 3" messages and, after
>> some time, the interface is removed.
>>
>> The problem is that route cache entries are only invalidated on the
>> UNREGISTER event, not removed (behavior introduced by commit
>> e2ce146848c81af2f6d42e67990191c284bf0c33). We must wait for
>> rt_check_expire() to remove the remaining route cache entries.
>>
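To make the difference between invalidating and removing concrete, here is a toy model in plain C. It is not kernel code and every name in it is made up; it only assumes, as the patch suggests, that a negative delay means "invalidate only" and that each cached route holds a reference on its device.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Hypothetical stand-ins for a net device and a route cache. */
struct toy_dev {
    int refcnt;
};

struct toy_entry {
    struct toy_dev *dev;   /* a cached route pins its device */
    int genid;             /* generation the entry was created in */
    int in_use;
};

struct toy_cache {
    int genid;
    struct toy_entry slot[TOY_SLOTS];
};

/* Insert an entry for dev; returns the slot index, or -1 if full. */
int toy_add(struct toy_cache *c, struct toy_dev *dev)
{
    assert(c != NULL && dev != NULL);
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (!c->slot[i].in_use) {
            c->slot[i].dev = dev;
            c->slot[i].genid = c->genid;
            c->slot[i].in_use = 1;
            dev->refcnt++;
            return i;
        }
    }
    return -1;
}

/* Invalidate: entries become stale but keep their device reference. */
void toy_invalidate(struct toy_cache *c)
{
    c->genid++;
}

/* Remove: every entry is freed and its device reference dropped. */
void toy_flush(struct toy_cache *c)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (c->slot[i].in_use) {
            c->slot[i].dev->refcnt--;
            c->slot[i].in_use = 0;
        }
    }
}

/* Assumed convention: delay < 0 invalidates only, delay >= 0 also removes. */
void toy_cache_flush(struct toy_cache *c, int delay)
{
    toy_invalidate(c);
    if (delay >= 0)
        toy_flush(c);
}
```

In this model, calling toy_cache_flush(&cache, -1) leaves the device refcount untouched, which corresponds to the "waiting for tunl1 to become free" messages; only a call with delay 0 releases the references right away.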
>> To fix the problem, I propose to revert part of that commit.
>>
>> Regards,
>> Nicolas
>> Attachment: differences between files
>> (0001-ipv4-remove-all-rt-cache-entries-on-UNREGISTER-even.patch)
>> From 3344e2e0431fe803c4dac8757a8746908357d780 Mon Sep 17 00:00:00 2001
>> From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
>> Date: Tue, 28 Sep 2010 16:38:19 +0200
>> Subject: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
>>
>> Commit e2ce146848c81af2f6d42e67990191c284bf0c33 (ipv4: factorize cache clearing
>> for batched unregister operations) added a new parameter to fib_disable_ip() to
>> only invalidate route cache entries on the unregister event.
>> This is wrong: we should ensure that all cache entries are removed on the
>> unregister event, otherwise netdev_wait_allrefs() may complain. A cache entry
>> can be created between the DOWN and UNREGISTER events.
>>
>> So, I revert part of that commit.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
>> ---
>> net/ipv4/fib_frontend.c | 10 +++++-----
>> 1 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 7d02a9f..377e815 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -917,11 +917,11 @@ static void nl_fib_lookup_exit(struct net *net)
>> net->ipv4.fibnl = NULL;
>> }
>>
>> -static void fib_disable_ip(struct net_device *dev, int force, int delay)
>> +static void fib_disable_ip(struct net_device *dev, int force)
>> {
>> if (fib_sync_down_dev(dev, force))
>> fib_flush(dev_net(dev));
>> - rt_cache_flush(dev_net(dev), delay);
>> + rt_cache_flush(dev_net(dev), 0);
>> arp_ifdown(dev);
>> }
>>
>> @@ -944,7 +944,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
>> /* Last address was deleted from this interface.
>> Disable IP.
>> */
>> - fib_disable_ip(dev, 1, 0);
>> + fib_disable_ip(dev, 1);
>> } else {
>> rt_cache_flush(dev_net(dev), -1);
>> }
>> @@ -959,7 +959,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>> struct in_device *in_dev = __in_dev_get_rtnl(dev);
>>
>> if (event == NETDEV_UNREGISTER) {
>> - fib_disable_ip(dev, 2, -1);
>> + fib_disable_ip(dev, 2);
>> return NOTIFY_DONE;
>> }
>>
>> @@ -977,7 +977,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>> rt_cache_flush(dev_net(dev), -1);
>> break;
>> case NETDEV_DOWN:
>> - fib_disable_ip(dev, 0, 0);
>> + fib_disable_ip(dev, 0);
>> break;
>> case NETDEV_CHANGEMTU:
>> case NETDEV_CHANGE:
>
>