netdev - Re: [PATCH 3/3] IPVS: init and cleanup restructuring.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 20 Apr 2011 12:41:49 +0200
From:	Hans Schillstrom <hans@...illstrom.com>
To:	Julian Anastasov <ja@....bg>
Cc:	horms@...ge.net.au, ebiederm@...ssion.com,
	lvs-devel@...r.kernel.org, netdev@...r.kernel.org,
	netfilter-devel@...r.kernel.org, hans.schillstrom@...csson.com
Subject: Re: [PATCH 3/3] IPVS: init and cleanup restructuring.

On Wednesday, April 20, 2011 01:19:38 Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 19 Apr 2011, Hans Schillstrom wrote:
> 
> > This patch tries to restore the initial init and cleanup
> > sequences that was before name space patch.
> > 
> > The number of calls to register_pernet_device have been
> > reduced to one for the ip_vs.ko
> > Schedulers still have their own calls.
> > 
> > This patch adds a function __ip_vs_service_cleanup()
> > and a throttle or actually on/off switch for
> > the netfilter hooks.
> > 
> > The nf hooks will be enabled when the first service is loaded
> > and disabled when the last service is removed or when a
> > name space exit starts.
> 
> 	For me using _net suffix is more clear compared
> to __ prefix for the pernet methods.
> 
Sure I'll do that.

> 	For ip_vs_in: may be we can move the check here:
> 
> +	net = skb_net(skb);
> +	if (net->ipvs->throttle)
> +		return NF_ACCEPT;
>  	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>  
>  	/* Bad... Do not break raw sockets */

Done

> 
> 	It will save such checks later in ICMP funcs. But this
> throttle idea looks dangerous for cleanup. It does not use RCU.
> The readers can cache the 0 in throttle for long time.
> May be by using register_pernet_device we are in list with other
> devices and it is still possible some device used by
> our dst_cache to be unregistered before IPVS or we to be
> unregistered before such device and some race with throttle
> to happen. throttle is good when enabling traffic with
> the first virtual service, later it can slowly stop the traffic
> but we can not rely on it during netns cleanup.

OK , I will revert back to register_pernet_subsys(),
and use one single register_pernet_device()  exit hook for :
 - the throttle to disable traffic
 - stop the kernel threads.

> 
> 	So, there are 2 problems with the devices:
> 
> - if we use _device pernet registration we can see packets
> in our netns during cleanup. I assume this is possible
> when IPVS is unregistered before such devices.
> 
> - dests can cache dst and to hold the device after it is
> unregistered in netns, obviously for very short time until
> IPVS is later unregistered from netns. And for long time
> if device is unregistered but netns remains.
> 
> 	Also, in most of the cases svc->refcnt is above 0 because
> dests can be in trash list. You should be lucky to delete the
> service without any connections, only then ip_vs_svc_unhash can
> see refcnt == 0.
> 
> 	So, may be we have to use register_pernet_subsys (not
> _device). We need just to register notifier with
> register_netdevice_notifier and to catch NETDEV_UNREGISTER,
> so that if any dest uses this device we have to release the dst:
> 
I made a quick test of the concept and it seems to work, 
(A mix of new and old connections at 1GB/s into 4 namespaces when killing them all)

> 	- lock mutex
> 	- for every dest (also in trash):
> 	spin_lock_bh(&dest->dst_lock);
> 	if (dest->dst_cache && dest->dst_cache->dev == unregistered_dev)
> 		ip_vs_dst_reset(dest);
> 	spin_unlock_bh(&dest->dst_lock);


Here is a what i did

static inline void
__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
{
	spin_lock_bh(&dest->dst_lock);
	if (dest->dst_cache && dest->dst_cache->dev == dev)
		ip_vs_dst_reset(dest);
	spin_unlock_bh(&dest->dst_lock);
}

static void ip_vs_svc_reset(struct ip_vs_service *svc, struct net_device *dev)
{
	struct ip_vs_dest *dest;

	list_for_each_entry(dest, &svc->destinations, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
}

static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
			    void *ptr)
{
	struct net_device *dev = ptr;
	struct net *net = dev_net(dev);
	struct ip_vs_service *svc;
	struct ip_vs_dest *dest;
	unsigned int idx;

	if (event != NETDEV_UNREGISTER)
		return NOTIFY_DONE;

	EnterFunction(2);
	mutex_lock(&__ip_vs_mutex);
	for (idx = 0; idx<IP_VS_SVC_TAB_SIZE; idx++) {
		write_lock_bh(&__ip_vs_svc_lock);
		list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
			if (net_eq(svc->net, net))
				ip_vs_svc_reset(svc, dev);
		}
		write_unlock_bh(&__ip_vs_svc_lock);
	}

	list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
		__ip_vs_dev_reset(dest, dev);
	}
	mutex_unlock(&__ip_vs_mutex);
	LeaveFunction(2);
	return NOTIFY_DONE;
}

static struct notifier_block ip_vs_dst_notifier = {
	.notifier_call = ip_vs_dst_event,
};


int __init ip_vs_control_init(void)
...
	at the end
	ret = register_netdevice_notifier(&ip_vs_dst_notifier);
...

and an unregister_ of course.


> 
> 	There are many examples for this, eg. net/core/fib_rules.c
> Then we are sure on cleanup we can not see traffic for our net
> because all devices are unregistered before us. We don't have
> to rely on throttle to stop the traffic during cleanup. And
> we do not hold devices after NETDEV_UNREGISTER.
> 
> 	I can prepare such patch but in next days. We need such
> code anyways because the dests can hold such dsts when no
> traffic is present and we can see again this "waiting for %s" ...
> message.
> 
> 	throttle still can be used but now it can not stop
> the traffic if connections exist.
> 
> 	For __ip_vs_service_cleanup: it still has to use mutex.

OK I will try to use unlock methods, if it doesn't work use the mutex.

> Or we can avoid it by introducing ip_vs_unlink_service_nolock:
> ip_vs_flush will look like your __ip_vs_service_cleanup and
> will call ip_vs_unlink_service_nolock. ip_vs_unlink_service_nolock
> will be called by ip_vs_flush and by ip_vs_unlink_service.
> You can try such changes, if not, I'll prepare some patches
> after 2-3 days.
> 

Regards
Hans
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html