Message-ID: <87frr8wt03.fsf@toke.dk>
Date: Tue, 13 Aug 2024 13:40:28 +0200
From: Toke Høiland-Jørgensen <toke@...nel.org>
To: Duan Jiong <djduanjiong@...il.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v3] veth: Drop MTU check when forwarding packets
Duan Jiong <djduanjiong@...il.com> writes:
> Sorry for responding to the email so late.
>
>>
>> I do see your point that a virtual device doesn't really *have* to
>> respect MTU, though. So if we were implementing a new driver this
>> argument would be a lot easier to make. In fact, AFAICT the newer netkit
>> driver doesn't check the MTU setting before forwarding, so there's
>> already some inconsistency there.
>>
>> >> You still haven't answered what's keeping you from setting the MTU
>> >> correctly on the veth devices you're using?
>> >
>
> vm1(mtu 1600)---ovs---ipsec vpn1(mtu 1500)---ipsec vpn2(mtu
> 1500)---ovs---vm2(mtu 1600)
Where's the veth device in this setup?
> My scenario is two VMs communicating via an IPsec VPN gateway. The
> two VPN gateways are interconnected over the public network, and each
> gateway has only one NIC (single-arm mode). The VPN gateway MTU will
> generally be 1500, but the packets sent by the VMs to the VPN gateway
> may be larger than 1500, and in that case, with the existing veth
> driver, the packets sent by the VMs are discarded. If it were allowed
> to receive large packets, the VPN gateway could actually accept them,
> ESP-encapsulate them, and then fragment, so that in the end network
> connectivity is not affected.
I'm not sure I quite get the setup; it sounds like you want a subset of
the traffic to adhere to one MTU, and another subset to adhere to a
different MTU, on the same interface? Could you not divide the traffic
over two different interfaces (with different MTUs) instead?
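For concreteness, splitting the traffic over two interfaces could look
something like the sketch below (interface names are made up for
illustration; adapt to your OVS setup):

```shell
# Hypothetical sketch: one veth pair per MTU domain, so each traffic
# class gets an interface whose MTU actually matches it.
ip link add veth-vm type veth peer name veth-vm-br
ip link add veth-wan type veth peer name veth-wan-br

# VM-facing pair carries the large frames...
ip link set veth-vm mtu 1600
ip link set veth-vm-br mtu 1600

# ...while the public-network-facing pair keeps the standard MTU.
ip link set veth-wan mtu 1500
ip link set veth-wan-br mtu 1500
```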
>> > Agreed that it has a risk, so some justification is in order. Similar
>> > to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
>> > redirect to ingress") addressed a specific need.
>>
>> Exactly :)
>>
>> And cf the above, using netkit may be an alternative that doesn't carry
>> this risk (assuming that's compatible with the use case).
>>
>> -Toke
>
>
> I can see how there could be a potential risk here, can we consider
> adding a switchable option to control this behavior?
Hmm, a toggle has its own cost in terms of complexity and overhead. Plus
it's adding new UAPI. It may be that this is the least bad option in the
end, but before going that route we should be very sure that there's not
another way to solve your problem (cf the above).
This has been discussed before, BTW, most recently some five years ago:
https://patchwork.ozlabs.org/project/netdev/patch/CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com/
-Toke