Date: Tue, 30 Mar 2010 08:06:08 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: "Templin, Fred L" <Fred.L.Templin@...ing.com>,
Rick Jones <rick.jones2@...com>,
"Edgar E. Iglesias" <edgar.iglesias@...il.com>,
Glen Turner <gdt@....id.au>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: UDP path MTU discovery
On Tuesday 30 March 2010 at 07:20 +0200, Andi Kleen wrote:
> On Mon, Mar 29, 2010 at 04:38:49PM -0700, Templin, Fred L wrote:
> > > 1) 4096 bytes UDP messages... well...
> > > 2) Using regular TCP for DNS servers... well...
> > >
> > > I believe some guys were pushing TCPCT (Cookie Transactions) for this
> > > case ( http://tools.ietf.org/html/draft-simpson-tcpct-00.html )
> > >
> > > (That is, using an enhanced TCP for long DNS queries... but not only for
> > > DNS...)
> >
> > IPv4 gets around this by setting DF=0 in the IP header and
> > letting the network fragment the packet if necessary. IPv6 can
> > similarly get around this by having the sending host fragment
> > the large UDP packet into IPv6 fragments no longer than
> > 1280 bytes each.
>
> That's true -- in theory a UDP app unwilling or unable to do proper PMTU discovery
> could set the path MTU to 1280 + header, still keep path MTU discovery off,
> and then just fragment.
>
> The drawback, of course, would be suboptimal network use, with too-small MTUs
> in the common case.
>
> Right now there is no socket option to set the path MTU. We
> have an IP_MTU option, but it only works for getting the MTU.
> That's because the PMTU is in the routing cache entry and shared
> by multiple sockets. Presumably one could add a special case
> with an MTU in the socket overriding the one in the destination entry.
We already have the IP_MTU_DISCOVER option, with four existing values:
/* IP_MTU_DISCOVER values */
#define IP_PMTUDISC_DONT 0 /* Never send DF frames */
#define IP_PMTUDISC_WANT 1 /* Use per route hints */
#define IP_PMTUDISC_DO 2 /* Always DF */
#define IP_PMTUDISC_PROBE 3 /* Ignore dst pmtu */
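We might add a fifth value (or open up the full range, treating any value above IP_PMTUDISC_PROBE as an explicit per-socket MTU) and change ip_skb_dst_mtu() as follows: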
static inline int ip_skb_dst_mtu(struct sk_buff *skb)
{
	struct inet_sock *inet = skb->sk ? inet_sk(skb->sk) : NULL;

	return (inet && inet->pmtudisc == IP_PMTUDISC_PROBE) ?
	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
}
->
static inline int ip_skb_dst_mtu(struct sk_buff *skb)
{
	if (skb->sk) {
		struct inet_sock *inet = inet_sk(skb->sk);

		if (inet->pmtudisc > IP_PMTUDISC_PROBE)
			return inet->pmtudisc;
		if (inet->pmtudisc == IP_PMTUDISC_PROBE)
			return skb_dst(skb)->dev->mtu;
	}
	return dst_mtu(skb_dst(skb));
}