netdev - Re: [PATCH] igmp: make /proc/net/{igmp,mcfilter} per netns


Message-ID: <48C9522B.8080002@fr.ibm.com>
Date:	Thu, 11 Sep 2008 19:15:23 +0200
From:	Daniel Lezcano <dlezcano@...ibm.com>
To:	David Stevens <dlstevens@...ibm.com>
CC:	Alexey Dobriyan <adobriyan@...il.com>,
	containers@...ts.linux-foundation.org, davem@...emloft.net,
	netdev@...r.kernel.org
Subject: Re: [PATCH] igmp: make /proc/net/{igmp,mcfilter} per netns

David Stevens wrote:
> As I've said before, I really don't like the model you're
> using for multicasting here (if I understand correctly, and
> I shamelessly admit I haven't looked at this code in detail).

Hi David,

Sorry for the delay.

> As I understand it, you're modelling the multiple virtual interfaces
> as different pieces of hardware on the same physical network.

Exactly. The network namespace acts at the layer 2 level. The network
resources are isolated and accessed relative to the namespace instead
of through a global static variable. For example, the network device
list is per namespace, as is the loopback device.
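
As a rough illustration of "per namespace instead of a global static
variable", here is a minimal sketch. The class and method names
(NetNamespace, add_device, lookup) are hypothetical and are not kernel
APIs; the sketch only models the idea that each namespace owns its own
device list and its own loopback.

```python
class NetNamespace:
    """Toy model of a network namespace (hypothetical, not kernel code)."""

    def __init__(self, name):
        self.name = name
        # Each namespace gets its own device table, including a private
        # loopback, rather than sharing one global list.
        self.devices = {"lo": {"name": "lo", "namespace": name}}

    def add_device(self, dev_name):
        """Register a device in this namespace only."""
        self.devices[dev_name] = {"name": dev_name, "namespace": self.name}

    def lookup(self, dev_name):
        """Return the device if it lives in this namespace, else None."""
        return self.devices.get(dev_name)


ns_a = NetNamespace("container-a")
ns_b = NetNamespace("container-b")
ns_a.add_device("eth0")

# eth0 is visible only in the namespace it belongs to.
print(ns_a.lookup("eth0") is not None)   # True
print(ns_b.lookup("eth0") is None)       # True
# Both namespaces have a loopback, but they are distinct objects.
print(ns_a.lookup("lo") is not ns_b.lookup("lo"))  # True
```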

The network devices belong to a specific namespace and can be neither
used nor seen from another namespace. How the network namespace
communicates with the outside world depends on the inter-container
network configuration: a physical device can be assigned to a network
namespace; or a bridge can connect a physical network device to one
side of a pair device (the other side being in the namespace); or a
macvlan can be assigned to a namespace; or 'nat' can be used with a
pair device; or a tunnel; etc.

> The implication is that apps joining the same group in multiple
> containers will result in multiple advertisements for the same
> group, from each of the multiple instances of IGMP & MLD.
> In IPv4, that's just inefficient.

I agree.

> In IPv6, the question is: do you have
> multiple link-local addresses-- one for each virtual device?
> If not, then MLD will be sending multiple copies of everything in
> violation of the spec (since they'll be from the same source, too).

Yes, each virtual device has its own set of network resources, so when
it is activated in the namespace, the link-local address is computed,
DAD is invoked, and the IP address is set on the device.
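
To make "the link-local address is computed" concrete, here is a
minimal sketch. It assumes the address is derived from the device's
MAC address with the modified EUI-64 scheme (flip the universal/local
bit, insert ff:fe in the middle, prepend fe80::/64); this message does
not say which scheme is used, and the function name and MAC addresses
below are made up for illustration.

```python
import ipaddress


def link_local_from_mac(mac):
    """Derive an IPv6 link-local address from a MAC (modified EUI-64).

    Assumption: the interface identifier comes from the MAC address.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix followed by the 64-bit interface identifier
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return ipaddress.IPv6Address(packed)


# Two virtual devices with different (example) MACs get different
# link-local addresses, so their MLD reports have different sources.
addr_a = link_local_from_mac("00:11:22:33:44:55")
addr_b = link_local_from_mac("00:11:22:33:44:66")
print(addr_a)  # fe80::211:22ff:fe33:4455
print(addr_b)  # fe80::211:22ff:fe33:4466
```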

> I think IGMP and MLD both belong with the physical interface, since
> they pretty much do exactly what you want already: glom all the
> different filters and group memberships together into exactly the
> minimal set of group memberships needed for everyone to hear
> just the pieces they've requested.
> If you do that at the interface, then you won't have any duplicated
> traffic on the physical net and you can separate copies as needed
> for the different virtual nets on the host. Perfect, and indistinguishable
> externally from a non-container machine (and the code to do it is
> already in IGMP and MLD).
> 
> If you treat them as separate physical devices all the way to the
> wire, then you're just needlessly increasing the host processing
> you need to do, as well as loading the multicast routers and network
> that are unfortunate enough to be on the same network as you are.

That makes sense, but the containers can be configured to have a
network inside the host, which acts like a router: a kind of internal
cluster in the host. In that case I want each container to send an
mcast report, to reproduce the real behaviour of a physical network.

> I haven't been paying attention, so I'll be happy if you tell me you've
> already addressed this. :-) Otherwise, I think it'd be wise to do so
> before it's released into the wild and can't be easily changed.

No, you are right, I didn't address that. I thought we had agreed that
it was an optimization which can be done later.

I don't think having N reports for N containers joining / leaving a
group is critical at this point. IMHO we can live with that for now.
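
To put rough numbers on the difference, here is a minimal sketch that
counts join reports in the two models discussed in this thread: one
report per namespace per group (the current behaviour) versus
memberships merged at the physical interface (David's suggestion). The
function names, container names and group addresses are made up for
illustration, and the sketch ignores source filters and timers.

```python
def reports_per_namespace(memberships):
    """Each namespace advertises every group it joined."""
    return sum(len(groups) for groups in memberships.values())


def reports_aggregated(memberships):
    """Memberships are merged first; each distinct group is
    advertised once."""
    merged = set().union(*memberships.values())
    return len(merged)


# Example: three containers, all joining 239.1.1.1, one also
# joining 239.2.2.2.
memberships = {
    "c1": {"239.1.1.1"},
    "c2": {"239.1.1.1", "239.2.2.2"},
    "c3": {"239.1.1.1"},
}

print(reports_per_namespace(memberships))  # 4
print(reports_aggregated(memberships))     # 2
```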

The critical point is: the protocol must not be violated, and AFAICS
it is not, right?

Your points are totally valid and I agree 100% with you. But as you can
see, this optimization is not trivial to implement, because we have to
take into account the different use cases of network namespaces and
make the network stack behave cleverly, depending on whether the report
is to be sent internally in the host each time or externally only once.

I will add this optimization to my huge TODO list :) The only question
is where I should put it: at the beginning or at the end of the list?
If you think I missed something and there is something wrong with the
current approach (except that it could be more efficient) and it is
critical for the kernel / the protocol, just let me know and I will go
with your suggestion.

Thanks for your feedback.

   -- Daniel
