Message-ID: <CAKgNAkiSrGdXd2e1dpcC0fPmARRwkMrUvB8vCJ7eQ8SYsCuuUw@mail.gmail.com>
Date: Thu, 22 Sep 2011 06:15:53 +0200
From: Michael Kerrisk <mtk.manpages@...il.com>
To: Benjamin Poirier <benjamin.poirier@...il.com>
Cc: Neil Horman <nhorman@...driver.com>, linux-man@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: discrepancy in ip(7) wrt. IP DF flag for UDP sockets
Ben, Neil,
On Tue, Sep 20, 2011 at 4:31 PM, Neil Horman <nhorman@...driver.com> wrote:
> On Tue, Sep 20, 2011 at 10:12:34AM -0400, Benjamin Poirier wrote:
>> On 11-09-20 09:38, Neil Horman wrote:
>> > On Tue, Sep 20, 2011 at 09:29:54AM -0400, Benjamin Poirier wrote:
>> > > On 11-09-20 08:14, Michael Kerrisk wrote:
>> > > > Hello Benjamin, Neil,
>> > > >
>> [snip]
>> > > >
>> > > > Could you describe the required change in terms of how the man page
>> > > > text should look--i.e., rewrite the passage as you think it should
>> > > > look?
>> > >
>> > > How about changing it to:
>> > > IP_MTU_DISCOVER (since Linux 2.2)
>> > > Set or receive the Path MTU Discovery setting for a socket. When
>> > > enabled, the don't-fragment flag is set on all outgoing packets.
>> > > Linux will perform Path MTU Discovery as defined in RFC 1191 on
>> > > SOCK_STREAM sockets. For non-SOCK_STREAM sockets, it is the
>> > > user's responsibility to packetize the data in MTU sized chunks
>> > > and to do the retransmits if necessary. The kernel will reject
>> > > (with EMSGSIZE) datagrams that are bigger than the known path
>> > > MTU. The system-wide default is controlled by the
>> > > /proc/sys/net/ipv4/ip_no_pmtu_disc file.
>> > >
>> > > Path MTU discovery flags Meaning
>> > > [...]
>> > >
>> > > There are some differences between _DO and _WANT that are glossed over
>> > > in this description, but I suppose there's only so much detail you can
>> > > put in...
>> > >
>> > > Thanks,
>> > > -Ben
>> > >
>> > Yeah, I think that's close, but it's only the user's responsibility to package
>> > datagrams in MTU-sized chunks if they force the don't-fragment bit on. If they
>> > go with the default, the stack will fragment a datagram as it sees fit according
>> > to the MTU of the path it traverses.
>>
>> Exactly. To get into this level of detail, I think we have to mention
>> the option value, not just enabled/disabled. Let's try like this:
>>
>> IP_MTU_DISCOVER (since Linux 2.2)
>> Set or receive the Path MTU Discovery setting for a socket. When
>> enabled, Linux will perform Path MTU Discovery as defined in RFC
>> 1191 on SOCK_STREAM sockets. For non-SOCK_STREAM sockets,
>> IP_PMTUDISC_DO forces the don't-fragment flag to be set on all
>> outgoing packets. It is the user's responsibility to packetize
>> the data in MTU sized chunks and to do the retransmits if
>> necessary. The kernel will reject (with EMSGSIZE) datagrams
>> that are bigger than the known path MTU. IP_PMTUDISC_WANT will
>> fragment a datagram if needed according to the path MTU or will
>> set the don't-fragment flag otherwise.
>>
>> The system-wide default can be toggled between IP_PMTUDISC_WANT
>> and IP_PMTUDISC_DONT by writing to the
>> /proc/sys/net/ipv4/ip_no_pmtu_disc file.
>>
> Yes, that sounds good to me. Thanks for doing this!
> Acked-by: Neil Horman <nhorman@...driver.com>
Ben, thanks for writing this, and Neil, thanks for reviewing it. I've
applied that change for man-pages-3.34.
Ben, I added one small piece to the description of
/proc/sys/net/ipv4/ip_no_pmtu_disc. For completeness, I've reproduced
the entire text below. Perhaps you could take a quick scan, to make
sure that the changed text is consistent with the whole piece.
Thanks,
Michael
IP_MTU_DISCOVER (since Linux 2.2)
Set or receive the Path MTU Discovery setting for a
socket. When enabled, Linux will perform Path MTU
Discovery as defined in RFC 1191 on SOCK_STREAM sockets.
For non-SOCK_STREAM sockets, IP_PMTUDISC_DO forces the
don't-fragment flag to be set on all outgoing packets.
It is the user's responsibility to packetize the data in
MTU-sized chunks and to do the retransmits if necessary.
The kernel will reject (with EMSGSIZE) datagrams that
are bigger than the known path MTU. IP_PMTUDISC_WANT
will fragment a datagram if needed according to the path
MTU, or will set the don't-fragment flag otherwise.
The system-wide default can be toggled between
IP_PMTUDISC_WANT and IP_PMTUDISC_DONT by writing
(respectively, zero and nonzero values) to the
/proc/sys/net/ipv4/ip_no_pmtu_disc file.
Path MTU discovery flags   Meaning
IP_PMTUDISC_WANT           Use per-route settings.
IP_PMTUDISC_DONT           Never do Path MTU Discovery.
IP_PMTUDISC_DO             Always do Path MTU Discovery.
IP_PMTUDISC_PROBE          Set DF but ignore Path MTU.
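As a rough illustration of how an application might select one of the
flags above on a per-socket basis, here is a minimal sketch. The helper
names set_pmtu_mode() and get_pmtu_mode() are made up for this example
and are not part of any library; only setsockopt(2), getsockopt(2), and
the IP_MTU_DISCOVER constants come from the text.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the Path MTU Discovery mode on an IPv4 socket.
 * mode is one of IP_PMTUDISC_WANT, IP_PMTUDISC_DONT, IP_PMTUDISC_DO, or
 * IP_PMTUDISC_PROBE. Returns 0 on success, -1 on error (errno is set). */
int set_pmtu_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Hypothetical helper: read back the current mode, or -1 on error. */
int get_pmtu_mode(int fd)
{
    int mode = -1;
    socklen_t len = sizeof(mode);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) == -1)
        return -1;
    return mode;
}
```

A UDP sender that wants the don't-fragment flag on every datagram would
call set_pmtu_mode(fd, IP_PMTUDISC_DO) once after socket(2) and then be
prepared to handle EMSGSIZE from its send calls.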
When PMTU discovery is enabled, the kernel automatically
keeps track of the path MTU per destination host. When
it is connected to a specific peer with connect(2), the
currently known path MTU can be retrieved conveniently
using the IP_MTU socket option (e.g., after an EMSGSIZE
error occurred). The path MTU may change over time.
For connectionless sockets with many destinations, the
new MTU for a given destination can also be accessed
using the error queue (see IP_RECVERR). A new error
will be queued for every incoming MTU update.
While MTU discovery is in progress, initial packets from
datagram sockets may be dropped. Applications using UDP
should be aware of this and not take it into account for
their packet retransmit strategy.
To bootstrap the path MTU discovery process on uncon-
nected sockets, it is possible to start with a big data-
gram size (up to 64K-headers bytes long) and let it
shrink by updates of the path MTU.
To get an initial estimate of the path MTU, connect a
datagram socket to the destination address using con-
nect(2) and retrieve the MTU by calling getsockopt(2)
with the IP_MTU option.
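A minimal sketch of that connect-then-query sequence might look as
follows. The function name initial_path_mtu() and the choice of port 9
are assumptions made for the example; connect(2) on a datagram socket
sends no packets, so the port only needs to be syntactically valid.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the kernel's current path MTU estimate
 * toward a dotted-quad IPv4 address, or -1 on any error. */
int initial_path_mtu(const char *ipv4_addr)
{
    struct sockaddr_in dst;
    int fd;
    int mtu = -1;
    socklen_t len = sizeof(mtu);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);      /* arbitrary port, nothing is sent */
    if (inet_pton(AF_INET, ipv4_addr, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    /* IP_MTU is only valid on a connected socket. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == -1 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == -1)
        mtu = -1;

    close(fd);
    return mtu;
}
```

The value returned is only the kernel's current estimate for that route
and may shrink once real traffic triggers path MTU updates.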
It is possible to implement RFC 4821 MTU probing with
SOCK_DGRAM or SOCK_RAW sockets by setting a value of
IP_PMTUDISC_PROBE (available since Linux 2.6.22). This
is also particularly useful for diagnostic tools such as
tracepath(8) that wish to deliberately send probe pack-
ets larger than the observed Path MTU.
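For illustration only, a probing tool might send a single probe along
these lines. This is a sketch under the assumption that the socket has
already been connected to the destination; send_mtu_probe() is a
hypothetical name, and a real RFC 4821 implementation would also need
its own acknowledgement and search logic, which is not shown.

```c
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch the socket to IP_PMTUDISC_PROBE and send
 * one zero-filled datagram of probe_len bytes. Returns the number of
 * bytes sent, or -1 on error (errno is set). */
ssize_t send_mtu_probe(int fd, size_t probe_len)
{
    int mode = IP_PMTUDISC_PROBE;
    void *buf;
    ssize_t n;

    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) == -1)
        return -1;

    buf = calloc(1, probe_len ? probe_len : 1);
    if (buf == NULL)
        return -1;

    n = send(fd, buf, probe_len, 0);
    free(buf);
    return n;
}
```

A probe that exceeds the MTU of the local outgoing interface can still
fail with EMSGSIZE, since the flag only tells the kernel to ignore the
path MTU it has learned, not the device limit.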
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/