Message-ID: <OF19ABC0E6.77100D83-ON85257B7C.005AFB01-85257B7C.005E33DC@us.ibm.com>
Date: Fri, 31 May 2013 13:08:56 -0400
From: David Stevens <dlstevens@...ibm.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: netdev@...r.kernel.org, netdev-owner@...r.kernel.org
Subject: Re: RFC - VXLAN port range facility
Stephen Hemminger <stephen@...workplumber.org> wrote on 05/31/2013 12:13:38 PM:
>
> RFC text:
> Outer UDP Header: This is the outer UDP header with a source
> port provided by the VTEP and the destination port being a well
> known UDP port to be obtained by IANA assignment. It is recommended
> that the source port be a hash of the inner Ethernet frame's headers
> to obtain a level of entropy for ECMP/load balancing of the VM to VM
> traffic across the VXLAN overlay.
>
>
> You can restrict to a smaller range if that is a requirement of your
> infrastructure.
I'm suggesting the smaller range, because the fix for the part
that is broken would become a resource issue for the current, larger
default range.
[and a "recommended" in a draft doesn't trump 35 years of UDP
usage, even if it did say not to bind the ports...]
> Normal UDP applications assign their source port from the ephemeral
> port range, so that is what VXLAN does.
Normal UDP applications bind to the source port. If they are
unbound, they bind just for the send and then unbind after. They
cannot use a port already bound _because_the_bind_prohibits_it.
That is, in fact, the entire issue I'm raising. (!) If I have
a UDP application that binds to port 35000, no other UDP application
will ever use that port until I release it, and any ICMP errors delivered
to my socket are triggered by my application.
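
The exclusivity described above can be seen from userspace. This is a minimal sketch, assuming plain UDP sockets on the loopback address with no SO_REUSEADDR or SO_REUSEPORT set; the function name `try_double_bind` is made up for illustration.

```python
import errno
import socket


def try_double_bind(host="127.0.0.1"):
    """Bind one UDP socket, then try to bind a second one to the same port.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))            # port 0: let the OS pick a free port
        port = first.getsockname()[1]
        try:
            second.bind((host, port))    # same address, same port
        except OSError as exc:
            return exc.errno             # the bind is refused
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(try_double_bind() == errno.EADDRINUSE)
```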
That became no longer true with the addition of VXLAN port ranges,
because VXLAN does not use UDP bind, or any of the UDP code, to enforce
this. It simply generates a random number in the range, which _can_be_
35000 or any other bound port, and then sends its own, constructed UDP
header using that port.
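
A userspace sketch of that failure mode follows. It is not the kernel code: it assumes the port is derived from a hash of the inner headers, as the quoted draft recommends, and uses `zlib.crc32` as a stand-in hash. The names `unchecked_src_port` and `collides_with_bound_port` are hypothetical.

```python
import socket
import zlib


def unchecked_src_port(inner_headers, port_min, port_max):
    """Map a hash of the inner headers into [port_min, port_max].

    Nothing here asks the UDP layer whether the port is already bound.
    """
    span = port_max - port_min + 1
    return port_min + zlib.crc32(inner_headers) % span


def collides_with_bound_port():
    """Return True if the unchecked choice lands on a port another socket owns."""
    owner = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        owner.bind(("127.0.0.1", 0))
        bound = owner.getsockname()[1]
        # A one-port range forces the collision that is merely possible
        # with a wide range.
        chosen = unchecked_src_port(b"\x00" * 14, bound, bound)
        return chosen == bound
    finally:
        owner.close()
```

The selection function never fails, which is the point: only a bind would report that the port belongs to someone else.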
The proper way to fix this would be to actually bind to a port in
the range, and retry another port if the binding fails, until the binding
succeeds. But as VXLAN picks a randomized source port _for_each_packet_,
I'm not suggesting we do that.
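
For reference, the bind-and-retry approach looks roughly like this in userspace. It is a sketch under assumptions: `bind_in_range` is a hypothetical helper, candidates are tried in random order, and any bind error is treated as "port in use".

```python
import random
import socket


def bind_in_range(port_min, port_max, host="127.0.0.1", rng=random):
    """Return a UDP socket bound to some free port in [port_min, port_max].

    Ports are tried in random order; raises OSError if none can be bound.
    """
    candidates = list(range(port_min, port_max + 1))
    rng.shuffle(candidates)
    for port in candidates:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()      # bind refused; try the next candidate
    raise OSError("no free port in range")
```

Each call costs at least one socket creation and one bind, which is why doing it per packet is unattractive.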
I'm suggesting, instead, that we bind on all the source ports we
will use at start-up, which then reserves those ports for VXLAN and
prevents anyone else from binding on them.
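
A minimal sketch of the pre-binding idea, again in userspace. Assumptions: the class `ReservedPorts` and its methods are invented for illustration, the OS picks the ports (port 0) instead of a configured range so the example runs anywhere, and `zlib.crc32` stands in for the flow hash.

```python
import socket
import zlib


class ReservedPorts:
    """Hold a small set of UDP ports bound for the lifetime of the object."""

    def __init__(self, count, host="127.0.0.1"):
        self.socks = []
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.bind((host, 0))   # port 0: the OS assigns a free port
            self.socks.append(sock)
        self.ports = [s.getsockname()[1] for s in self.socks]

    def port_for(self, inner_headers):
        """Pick one reserved port per flow; no bind happens per packet."""
        return self.ports[zlib.crc32(inner_headers) % len(self.ports)]

    def close(self):
        for sock in self.socks:
            sock.close()
```

The per-packet cost is a hash and an index, and the held sockets keep any other application from binding the same ports.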
That solves the issue of binding and unbinding on each packet,
but I am not then suggesting that VXLAN should bind on 30,000 ports on
start-up. That would be silly, especially on a system whose primary
function is not VXLAN.
So, the logical next question is: does VXLAN really need a range
of 30,000 ports as the "normal" circumstance? I think the answer to that
is definitely "no." In fact, just one port would work fine a lot of the
time, and when multiple ports are needed, the capability is still there.
That suggests changing the *default* range (I suggest to 1 port).
My conclusions from that reasoning:
1) VXLAN use of UDP source ports is broken; it must not use ports that are
already bound, and right now it does
2) while a bind/unbind would work, doing that on every packet is slow
so,
3) the default port range should be much smaller and VXLAN should bind
in advance to the set of ports it wants to use.
Now, maybe it wouldn't kill performance, and so doing a bind/unbind per
packet is still an option, but that would definitely hurt performance
for people who don't actually care about port entropy.
Whether solved by a bind/unbind, pre-binding to a smaller default port
range, or a switch between the two, I think VXLAN *must* follow the
rules in its use of UDP and ensure that it doesn't send using source
ports in use by something else. It can't just generate a random one
and use it without checking it, as it does now.
+-DLS