Message-ID: <20130531102233.13e51cff@nehalam.linuxnetplumber.net>
Date: Fri, 31 May 2013 10:22:33 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: David Stevens <dlstevens@...ibm.com>
Cc: netdev@...r.kernel.org, netdev-owner@...r.kernel.org
Subject: Re: RFC - VXLAN port range facility
On Fri, 31 May 2013 13:08:56 -0400
David Stevens <dlstevens@...ibm.com> wrote:
> Stephen Hemminger <stephen@...workplumber.org> wrote on 05/31/2013 12:13:38 PM:
>
> >
> > RFC text:
> > Outer UDP Header: This is the outer UDP header with a source
> > port provided by the VTEP and the destination port being a well
> > known UDP port to be obtained by IANA assignment. It is recommended
> > that the source port be a hash of the inner Ethernet frame's headers
> > to obtain a level of entropy for ECMP/load balancing of the VM to VM
> > traffic across the VXLAN overlay.
> >
> >
> > You can restrict to a smaller range if that is a requirement of your
> > infrastructure.
>
> I'm suggesting the smaller range, because the fix for the part
> that is broken would become a resource issue for the current, larger
> default range.
> [and a "recommended" in a draft doesn't trump 35 years of UDP
> usage, even if it did say not to bind the ports...]
>
> > Normal UDP applications assign their source port from the ephemeral
> > port range, so that is what VXLAN does.
>
> Normal UDP applications bind to the source port. If they are
> unbound, they bind just for the send and then unbind after. They
> cannot use a port already bound _because_the_bind_prohibits_it.
> That is, in fact, the entire issue I'm raising. (!) If I have
> a UDP application that binds to port 35000, no other UDP application
> will ever use that port until I release it, and any ICMP errors delivered
> to my socket are triggered by my application.
> That stopped being true with the addition of VXLAN port ranges,
> because VXLAN does not use UDP bind, or any of the UDP code, to enforce
> this. It simply generates a random number in the range, which _can_be_
> 35000 or any other bound port, and then sends its own, constructed UDP
> header using that port.
>
> The proper way to fix this would be to actually bind to a port in
> the range, and retry another port if the binding fails, until the binding
> succeeds. But as VXLAN picks a randomized source port _for_each_packet_,
> I'm not suggesting we do that.
> I'm suggesting, instead, that we bind on all the source ports we
> will use at start-up, which then reserves those ports for VXLAN and
> prevents anyone else from binding on them.
> That solves the issue of binding and unbinding on each packet,
> but I am not then suggesting that VXLAN should bind on 30,000 ports on
> start-up. That would be silly, especially on a system whose primary
> function is not VXLAN.
> So, the logical next question is: does VXLAN really need a range
> of 30,000 ports as the "normal" circumstance? I think the answer to that
> is definitely "no." In fact, just one port would work fine a lot of the
> time, and when multiple ports are needed, the capability is still there.
> That suggests changing the *default* range (I suggest 1 port).
The range could be smaller, yes, but that means you are restricting
the entropy available to the hash.
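To make that trade-off concrete, here is a minimal user-space sketch (not the kernel's vxlan code) of mapping a hash of the inner Ethernet header onto a source-port range. FNV-1a is swapped in as the hash purely for illustration, and `fnv1a32`, `src_port_from_hash` and `vxlan_src_port_sketch` are hypothetical names. The multiply-shift step scales a 32-bit hash into the range without a division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a byte buffer: an illustrative stand-in for a flow hash,
 * not the hash the kernel actually uses. */
uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Scale a 32-bit hash onto [port_min, port_max]:
 * (hash * range) >> 32 always lies in [0, range - 1]. */
uint16_t src_port_from_hash(uint32_t hash, uint16_t port_min, uint16_t port_max)
{
    assert(port_min <= port_max);
    uint32_t range = (uint32_t)port_max - port_min + 1;
    return (uint16_t)(port_min + (((uint64_t)hash * range) >> 32));
}

/* Hash the 14-byte inner Ethernet header (dst MAC, src MAC, EtherType)
 * and pick the outer UDP source port from it. Hypothetical helper. */
uint16_t vxlan_src_port_sketch(const uint8_t inner_eth_hdr[14],
                               uint16_t port_min, uint16_t port_max)
{
    return src_port_from_hash(fnv1a32(inner_eth_hdr, 14), port_min, port_max);
}
```

With a one-port range such as [40000, 40000], every inner frame maps to 40000, so ECMP sees all VM-to-VM traffic as a single flow; the wider the range, the more distinct source ports the hash can spread flows across.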
> My conclusions from that reasoning:
>
> 1) VXLAN use of UDP source ports is broken; it must not use ports that are
> already bound, and right now it does.
> 2) while a bind/unbind would work, doing that on every packet is slow
The problem is that bind/unbind is a flow state operation, and
keeping flow state wouldn't scale.
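The bind-and-retry approach (bind a port in the range, try another if the bind fails) might look roughly like the following in user space. This is a sketch that assumes plain POSIX sockets rather than the in-kernel UDP code; `candidate_port` and `bind_in_range` are hypothetical helpers. Every call creates and binds a socket, which is exactly the per-packet bind/unbind cost under discussion.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Port tried on a given attempt: start at the hashed port, then walk the
 * range with wrap-around so every port is tried at most once. */
uint16_t candidate_port(uint16_t port_min, uint16_t port_max,
                        uint16_t start, uint32_t attempt)
{
    assert(port_min <= start && start <= port_max);
    uint32_t range = (uint32_t)port_max - port_min + 1;
    return (uint16_t)(port_min + ((uint32_t)(start - port_min) + attempt) % range);
}

/* Bind a UDP socket to the first free port in [port_min, port_max],
 * beginning at 'start'. Returns the fd, or -1 if every port is taken or
 * an unexpected error occurs. A sender would send one packet on the fd
 * and then close() it, releasing the port again. */
int bind_in_range(uint16_t port_min, uint16_t port_max, uint16_t start)
{
    uint32_t range = (uint32_t)port_max - port_min + 1;
    for (uint32_t attempt = 0; attempt < range; attempt++) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        sa.sin_port = htons(candidate_port(port_min, port_max, start, attempt));
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
            return fd;
        int err = errno;
        close(fd);
        if (err != EADDRINUSE)
            return -1; /* only "port busy" is worth retrying */
    }
    return -1;
}
```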
>
> so,
>
> 3) the default port range should be much smaller and VXLAN should bind
> in advance to the set of ports it wants to use.
It probably should not overlap the ephemeral port range used by applications.
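Pre-binding a small, dedicated range at start-up, checked against the ephemeral range, could be sketched as below. This assumes a Linux host (the ephemeral range is read from `/proc/sys/net/ipv4/ip_local_port_range`) and uses user-space sockets as a stand-in for whatever the kernel would do internally; `ranges_overlap`, `read_ephemeral_range` and `reserve_ports` are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* True if [a_lo, a_hi] and [b_lo, b_hi] share at least one port. */
int ranges_overlap(int a_lo, int a_hi, int b_lo, int b_hi)
{
    return a_lo <= b_hi && b_lo <= a_hi;
}

/* Read the ephemeral port range (Linux-specific procfs file).
 * Returns 0 on success, -1 on failure. */
int read_ephemeral_range(int *lo, int *hi)
{
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%d %d", lo, hi);
    fclose(f);
    return n == 2 ? 0 : -1;
}

/* Bind one UDP socket per port in [lo, hi] and keep them open, so the
 * ports stay reserved. fds[] must hold hi - lo + 1 entries. Returns 0,
 * or closes everything already bound and returns -1. */
int reserve_ports(uint16_t lo, uint16_t hi, int *fds)
{
    assert(lo <= hi);
    for (uint32_t p = lo; p <= hi; p++) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        sa.sin_port = htons((uint16_t)p);
        if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            if (fd >= 0)
                close(fd);
            while (p-- > lo) /* undo the ports bound so far */
                close(fds[p - lo]);
            return -1;
        }
        fds[p - lo] = fd;
    }
    return 0;
}
```

A caller would read the ephemeral range, reject a configured VXLAN range that overlaps it, then call `reserve_ports` and keep the descriptors open for as long as the ports should stay reserved. Because none of these sockets sets `SO_REUSEADDR` or `SO_REUSEPORT`, another application's `bind()` to one of those ports fails with `EADDRINUSE`.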
>
> Now, maybe it wouldn't kill performance, and so doing a bind/unbind per
> packet is still an option, but that would definitely hurt performance
> for people who don't actually care about port entropy.
What about a peek operation that just avoids ports already in use?
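A "peek" that skips ports already bound could look like the sketch below. The bitmap is a hypothetical stand-in for a lookup in the host's table of bound UDP sockets, and `port_bitmap`, `port_mark`, `port_in_use` and `peek_free_port` are illustrative names. Unlike pre-binding, a peek reserves nothing, so its answer is only a snapshot: another application could bind the chosen port right after the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the set of bound UDP ports: one bit per port. */
typedef struct {
    uint64_t bits[65536 / 64];
} port_bitmap;

void port_mark(port_bitmap *m, uint16_t port)
{
    m->bits[port / 64] |= (uint64_t)1 << (port % 64);
}

bool port_in_use(const port_bitmap *m, uint16_t port)
{
    return (m->bits[port / 64] >> (port % 64)) & 1;
}

/* Start at the hashed port and skip any port already bound, wrapping
 * within [port_min, port_max]. Returns 0 (never a valid source port)
 * if every port in the range is taken. */
uint16_t peek_free_port(const port_bitmap *m, uint16_t port_min,
                        uint16_t port_max, uint32_t hash)
{
    assert(port_min <= port_max);
    uint32_t range = (uint32_t)port_max - port_min + 1;
    uint32_t start = (uint32_t)(((uint64_t)hash * range) >> 32);
    for (uint32_t i = 0; i < range; i++) {
        uint16_t port = (uint16_t)(port_min + (start + i) % range);
        if (!port_in_use(m, port))
            return port;
    }
    return 0;
}
```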
> Whether solved by a bind/unbind, pre-binding to a smaller default port
> range, or a switch between the two, I think VXLAN *must* follow the
> rules in its use of UDP and ensure that it doesn't send using source
> ports in use by something else. It can't just generate a random one
> and use it without checking it, as it does now.
>
> +-DLS
>