[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKdL+dS_-UiHA5tr7zP-5LtwDd0aXpZq5tpMYf9kr7hu2SwdGQ@mail.gmail.com>
Date: Thu, 11 May 2017 21:10:11 +0200
From: Fredrik Markström <fredrik.markstrom@...il.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, bridge@...ts.linux-foundation.org
Subject: Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
> On Thu, 11 May 2017 15:46:27 +0200
> Fredrik Markstrom <fredrik.markstrom@...il.com> wrote:
>
>> From: Fredrik Markström <fredrik.markstrom@...il.com>
>>
>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>> the receiving interface. This is not consistent with most of the hardware
>> ethernet drivers that happily receives packets larger then MTU.
>
> Wrong.
What is "Wrong" ? I was initially skeptical to implement this patch,
since it feels odd to have different MTU:s set on the two sides of a
link. After consulting some IP people and the RFC:s I kind of changed
my mind and thought I'd give it a shot. In the RFCs I couldn't find
anything that defined when and when not a received packet should be
dropped.
>
> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> The actual limit is a function of the hardware. Some hardware can only limit by
> power of 2; some can only limit frames larger than 1500; some have no limiting at all.
Agreed. The purpose of these patches is to be able to configure an
veth interface to mimic these different behaviors. Non of the Ethernet
interfaces I have access to drops packets due to them being larger
then the configured MTU like veth does.
Being able to mimic real Ethernet hardware is useful when
consolidating hardware using containers/namespaces.
In a reply to a comment from David Miller in my previous version of
the patch I attached the example below to demonstrate the case in
detail.
This works with all ethernet hardware setups I have access to:
---- 8< ------
# Host A eth2 and Host B eth0 is on the same network.
# On HOST A
% ip address add 1.2.3.4/24 dev eth2
% ip link set eth2 mtu 300 up
% # HOST B
% ip address add 1.2.3.5/24 dev eth0
% ip link set eth0 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms
--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
---- 8< ------
But it doesn't work with veth:
---- 8< ------
# veth0 and veth1 is a veth pair and veth1 has ben moved to a separate
network namespace.
% # NS A
% ip address add 1.2.3.4/24 dev veth0
% ip link set veth0 mtu 300 up
% # NS B
% ip address add 1.2.3.5/24 dev veth1
% ip link set veth1 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
--- 1.2.3.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
---- 8< ------
--
/Fredrik
Powered by blists - more mailing lists