Message-ID: <48C9522B.8080002@fr.ibm.com>
Date: Thu, 11 Sep 2008 19:15:23 +0200
From: Daniel Lezcano <dlezcano@...ibm.com>
To: David Stevens <dlstevens@...ibm.com>
CC: Alexey Dobriyan <adobriyan@...il.com>,
containers@...ts.linux-foundation.org, davem@...emloft.net,
netdev@...r.kernel.org
Subject: Re: [PATCH] igmp: make /proc/net/{igmp,mcfilter} per netns
David Stevens wrote:
> As I've said before, I really don't like the model you're
> using for multicasting here (if I understand correctly, and
> I shamelessly admit I haven't looked at this code in detail).
Hi David,
Sorry for the delay.
> As I understand it, you're modelling the multiple virtual interfaces
> as different pieces of hardware on the same physical network.
Exactly. The network namespace acts at the layer 2 level. The network
resources are isolated and accessed relative to the namespace instead
of through a global static variable. For example, the network device
list is per namespace, as is the loopback device.
The network devices belong to a specific namespace and can be neither
used nor seen from another namespace. How the network namespace
communicates with the outside world depends on the inter-container
network configuration: a physical device can be assigned to a network
namespace, or a system with a bridge + a physical network device + one
side of a pair device (with the other side in the namespace), or a
macvlan assigned to a namespace, or 'nat' with a pair device, or a
tunnel, etc.
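To make the "bridge + physical device + pair device" case concrete, here is a rough Python sketch that only builds the command lines and does not run them. It assumes present-day iproute2 syntax (`ip netns`, `type veth`, `type bridge`), which is not necessarily what was available when this mail was written, and the names `ns1`, `br0`, `eth0`, `veth0` and `veth1` are made up for the example.

```python
def bridge_pair_commands(netns, bridge, phys_dev, host_end, ns_end):
    """Return iproute2 command lines (as argv lists) for a container that
    reaches the outside world through a bridge and a veth pair.

    Nothing is executed here; all device and namespace names are
    hypothetical and supplied by the caller.
    """
    return [
        # Create the network namespace.
        ["ip", "netns", "add", netns],
        # Create the pair device: one end stays on the host side.
        ["ip", "link", "add", "name", host_end,
         "type", "veth", "peer", "name", ns_end],
        # Move the other end into the namespace.
        ["ip", "link", "set", "dev", ns_end, "netns", netns],
        # Create the bridge and enslave the physical device and host end.
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "set", "dev", phys_dev, "master", bridge],
        ["ip", "link", "set", "dev", host_end, "master", bridge],
    ]


if __name__ == "__main__":
    for argv in bridge_pair_commands("ns1", "br0", "eth0", "veth0", "veth1"):
        print(" ".join(argv))
```

Once `veth1` is moved, it is visible only inside `ns1`, which is the isolation described above.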
> The implication is that apps joining the same group in multiple
> containers will result in multiple advertisements for the same
> group, from each of the multiple instances of IGMP & MLD.
> In IPv4, that's just inefficient.
I agree.
> In IPv6, the question is: do you have
> multiple link-local addresses-- one for each virtual device?
> If not, then MLD will be sending multiple copies of everything in
> violation of the spec (since they'll be from the same source, too).
Yes, each virtual device has its own set of network resources, so when
it is activated in the namespace, the link-local address is computed,
DAD is invoked and the IP address is set on the device.
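As an illustration of why each virtual device ends up with its own link-local address, here is a minimal Python sketch. It assumes the classic modified EUI-64 derivation from the device MAC address (RFC 4291, Appendix A); the mail itself does not say which method is used, and the function name and MAC addresses are invented for the example.

```python
import ipaddress


def link_local_from_mac(mac):
    """Derive an fe80::/64 address from a 48-bit MAC address using the
    modified EUI-64 scheme (an assumption, see the text above)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to form the 64-bit interface identifier.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix: fe80 followed by 48 zero bits.
    prefix = bytes.fromhex("fe80") + bytes(6)
    return ipaddress.IPv6Address(prefix + iid)


if __name__ == "__main__":
    print(link_local_from_mac("00:11:22:33:44:55"))  # fe80::211:22ff:fe33:4455
```

Under this assumption, two virtual devices with different MAC addresses get different link-local addresses, so their MLD reports do not come from the same source.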
> I think IGMP and MLD both belong with the physical interface, since
> they pretty much do exactly what you want already: glom all the
> different filters and group memberships together into exactly the
> minimal set of group memberships needed for everyone to hear
> just the pieces they've requested.
> If you do that at the interface, then you won't have any duplicated
> traffic on the physical net and you can separate copies as needed
> for the different virtual nets on the host. Perfect, and indistinguishable
> externally from a non-container machine (and the code to do it is
> already in IGMP and MLD).
>
> If you treat them as separate physical devices all the way to the
> wire, then you're just needlessly increasing the host processing
> you need to do, as well as loading the multicast routers and network
> that are unfortunate enough to be on the same network as you are.
That makes sense, but the containers can be configured to have a network
inside the host which acts like a router, a kind of internal cluster
in the host. I want each container to send an mcast report, to
reproduce the real behaviour of a physical network.
> I haven't been paying attention, so I'll be happy if you tell me you've
> already addressed this. :-) Otherwise, I think it'd be wise to do so
> before it's released into the wild and can't be easily changed.
No, you are right, I didn't address that. I thought we had agreed that
this was an optimization which could be done later.
I don't think having N reports for N containers when joining / leaving
a group is critical at this point; IMHO we can live with that for now.
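To put a number on that cost, here is a small Python sketch comparing one report per namespace per group with the merged set David describes for the physical interface. It is a toy model only: it ignores source filters and timers, and the namespace names and group addresses are invented.

```python
def report_counts(memberships):
    """Return (per_namespace, merged) membership report counts.

    `memberships` maps a namespace name to the set of multicast groups
    joined inside it. `per_namespace` counts one report per namespace per
    group; `merged` counts one report per distinct group.
    """
    per_namespace = sum(len(groups) for groups in memberships.values())
    merged = len(set().union(*memberships.values()))
    return per_namespace, merged


if __name__ == "__main__":
    memberships = {
        "ns1": {"239.1.1.1", "239.1.1.2"},
        "ns2": {"239.1.1.1"},
        "ns3": {"239.1.1.1"},
    }
    print(report_counts(memberships))  # (4, 2)
```

The gap between the two numbers grows with the number of containers joining the same group, which is the inefficiency under discussion.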
The critical point is: the protocol must not be violated, and AFAICS it
is not violated, right?
Your points are totally valid and I agree 100% with you. But as you can
see, this optimization is not trivial to implement, because we have to
take into account the different use cases of the network namespaces and
make the network stack behave cleverly depending on whether the report
has to be sent internally in the host each time or externally only once.
I will add this optimization to my huge TODO list :) The only question
is where I should put it: at the beginning or at the end of the list?
If you think I missed something and there is something wrong with the
current approach (except that it could be more efficient) and it is
critical for the kernel / the protocol, just let me know and I will go
with your suggestion.
Thanks for your feedback.
-- Daniel