Message-ID: <Y4nlslrmQBa6Lqf3@system76-pc.localdomain>
Date:   Fri, 02 Dec 2022 11:47:01 +0000
From:   Sahid Orentino Ferdjaoui 
        <sahid.ferdjaoui@...ustrialdiscipline.com>
To:     bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: [Q/A][ICMP Unreach need-to-frag] Forwarded packets and IP fragmentation

Hi,

I have a question regarding design. The goal is to handle ICMP Error
Unreach need-to-fragment.

Specifically, and to share a bit more context: the idea is to make
Cilium handle ICMP Error Unreach need-to-fragment for NodePort
services.

I understand that Cilium is not Linux, but in this particular case we
are somewhere between the two.

Initially, my question was: does Linux fragment packets on a host that
is forwarding traffic? That is, when the host has a route for that
traffic with a smaller MTU than that of the link the traffic arrives
on, will the traffic be fragmented on egress?

But then I decided to describe what I have observed and my ideas,
since I may be on the wrong track with the implementation.

I have also noticed the `ip_forward_use_pmtu` option, but it is
probably not meant for this case: I enabled it, with no luck.
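
For reference, a minimal sketch of the generic IPv4 forwarding rule
(RFC 791 / RFC 1191) that the question above touches on: a forwarder
fragments an oversized packet only when the DF (Don't Fragment) bit is
clear; with DF set it drops the packet and returns an ICMP
need-to-fragment error instead. The names `fwd_action` and
`fwd_decide` are hypothetical, and the sketch ignores GSO and any
Cilium-specific datapath behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of forwarding one IPv4 packet. */
enum fwd_action {
	FWD_SEND,             /* fits the egress MTU, forward as-is */
	FWD_FRAGMENT,         /* too big, DF clear: fragment on egress */
	FWD_ICMP_FRAG_NEEDED  /* too big, DF set: drop and send ICMP 3/4 */
};

/* Decide what a forwarder does with a packet of total_len bytes. */
static enum fwd_action fwd_decide(uint16_t total_len, bool df,
				  uint16_t egress_mtu)
{
	assert(egress_mtu >= 68); /* minimum IPv4 MTU per RFC 791 */

	if (total_len <= egress_mtu)
		return FWD_SEND;
	return df ? FWD_ICMP_FRAG_NEEDED : FWD_FRAGMENT;
}
```

With the route above (MTU 800), a 1500-byte TCP segment sent with DF
set, which is the usual case when path MTU discovery is enabled, would
fall into the `FWD_ICMP_FRAG_NEEDED` branch rather than being
fragmented.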


Pod-X      : 172.10.0.10
NodePort-X : 192.168.39.23
Router-X   : 192.168.39.1
Client-X   : 10.1.0.100


                                  +------------+
                                  | Pod-X      |
                                  +------+-----+
   Cilium Host-Y                         |
            ------+-------------+--------+-------------------
                  ||
                  || VxLan
                  ||
                  ||               +------------+
                  ||               | NodePort-X |  192.168.39.0/24 dev eth0
                  ||               +------+-----+
   Cilium Host-X  ||                      |
            ------++------------+---------+-------------------
                                |
   World                        |
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   |  ^
                   |  | ICMP Error Router-X to NodePort
                   |  |   192.168.39.1/192.168.39.23
                   |  |      with payload 192.168.39.23/10.1.0.100
            +-------------+
            | Router-X    |
            +-------------+
                   |
   ----------------+-------+--------
                           |
                         Client-X


Routes
------
10.1.0.100 via 192.168.39.1 MTU 800


Consider a Pod behind a NodePort service that delivers content
exceeding the MTU of one of the networking devices on the path between
the cluster and the client. In that situation the device returns an
ICMP Error Unreach need-to-fragment to the cluster (NodePort).

* Forwarding the packet to the Pod would not work since the Pod only
  has a view of the path between the node that is hosting the service
  NodePort and the backend node that is hosting the Pod, and we don’t
  want to reduce the MTU for that path.
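
To make the message in question concrete, here is a minimal sketch of
extracting the useful fields from an ICMP Destination Unreachable /
Fragmentation Needed message (type 3, code 4, layout per RFC 1191):
the next-hop MTU and the source/destination of the embedded original
packet. The struct and function names are hypothetical, and the buffer
is assumed to start at the ICMP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_UNREACH     3 /* Destination Unreachable */
#define ICMP_CODE_FRAG_NEEDED 4 /* Fragmentation needed and DF set */

/* Hypothetical result type; addresses are in host byte order. */
struct frag_needed {
	uint16_t next_hop_mtu;
	uint32_t orig_saddr;
	uint32_t orig_daddr;
};

/* Read a big-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the message is not a valid type 3/4. */
static int parse_frag_needed(const uint8_t *icmp, size_t len,
			     struct frag_needed *out)
{
	const uint8_t *inner;
	size_t ihl;

	assert(icmp != NULL && out != NULL);

	/* 8-byte ICMP header plus at least a minimal IPv4 header. */
	if (len < 8 + 20)
		return -1;
	if (icmp[0] != ICMP_TYPE_UNREACH || icmp[1] != ICMP_CODE_FRAG_NEEDED)
		return -1;

	inner = icmp + 8; /* embedded original IPv4 header */
	if ((inner[0] >> 4) != 4)
		return -1;
	ihl = (size_t)(inner[0] & 0x0f) * 4;
	if (ihl < 20 || len < 8 + ihl)
		return -1;

	/* Bytes 6-7 of the ICMP header carry the next-hop MTU. */
	out->next_hop_mtu = (uint16_t)((icmp[6] << 8) | icmp[7]);
	out->orig_saddr = rd32(inner + 12);
	out->orig_daddr = rd32(inner + 16);
	return 0;
}
```

In the topology above, the embedded header would carry 192.168.39.23
as source and 10.1.0.100 as destination, with a next-hop MTU of 800.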

With all that said, I’m struggling to find the right approach.

I have experimented with a few options:

1/ Having the host that runs the NodePort service handle the ICMP
   Error message itself (as opposed to forwarding it to the Pod); the
   message is currently dropped. The expected outcome would be that
   the routing table of the host running the NodePort service gets
   updated according to the ICMP Error. But that does not look
   possible in Linux at this point, since there are checks to validate
   that the ICMP Error was received in response to a packet the host
   itself emitted [0]. In the context of Cilium we bypass netfilter on
   egress, right?

	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
				       iph->daddr, th->dest, iph->saddr,
				       ntohs(th->source), inet_iif(skb), 0);
	if (!sk) {
		__ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
		return -ENOENT;
	}

[0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/ipv4/tcp_ipv4.c#n487


2/ Having the NodePort service itself update the routing table of the
   host, installing a route with the MTU taken from the ICMP Error
   Unreach need-to-frag. In that situation one might expect the
   packets to be fragmented by the host on egress. But based on my
   tests that does not seem to work. I'm not sure whether Linux
   handles this case of fragmenting forwarded traffic.

3/ Having the NodePort service handle the full implementation of ICMP
   Error Unreach need-to-fragment:
   - For each ICMP Error received, the service would maintain a map
     of routes and MTUs.
   - For each packet leaving, the service would fragment it if
     needed.
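
As a rough illustration of option 3, the following userspace sketch
keeps a small per-destination MTU table and computes how many
fragments an outgoing packet would need. All names (`pmtu_table`,
`pmtu_update`, `pmtu_lookup`, `frag_count`) are hypothetical; a real
datapath implementation would presumably use a BPF map instead of this
array, would need entry expiry, and would still have to honour the DF
bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMTU_SLOTS   64
#define IPV4_MIN_MTU 68

struct pmtu_entry { uint32_t daddr; uint16_t mtu; int used; };
struct pmtu_table { struct pmtu_entry slot[PMTU_SLOTS]; };

static size_t pmtu_hash(uint32_t daddr)
{
	return (size_t)((daddr * 2654435761u) % PMTU_SLOTS);
}

/* Record a learned MTU; only ever lowers an existing entry.
 * Returns 0 on success, -1 if the table is full. */
static int pmtu_update(struct pmtu_table *t, uint32_t daddr, uint16_t mtu)
{
	size_t h = pmtu_hash(daddr);

	assert(t != NULL);
	if (mtu < IPV4_MIN_MTU)
		mtu = IPV4_MIN_MTU;
	for (size_t i = 0; i < PMTU_SLOTS; i++) {
		struct pmtu_entry *e = &t->slot[(h + i) % PMTU_SLOTS];

		if (e->used && e->daddr != daddr)
			continue; /* linear probing */
		if (!e->used || mtu < e->mtu) {
			e->daddr = daddr;
			e->mtu = mtu;
			e->used = 1;
		}
		return 0;
	}
	return -1;
}

/* Return the learned MTU for daddr, or default_mtu if none is known. */
static uint16_t pmtu_lookup(const struct pmtu_table *t, uint32_t daddr,
			    uint16_t default_mtu)
{
	size_t h = pmtu_hash(daddr);

	assert(t != NULL);
	for (size_t i = 0; i < PMTU_SLOTS; i++) {
		const struct pmtu_entry *e = &t->slot[(h + i) % PMTU_SLOTS];

		if (!e->used)
			break;
		if (e->daddr == daddr)
			return e->mtu;
	}
	return default_mtu;
}

/* Number of fragments needed for a packet of total_len bytes with an
 * ihl-byte IPv4 header. Every fragment except the last must carry a
 * payload that is a multiple of 8 bytes. Returns 0 if mtu is too small. */
static unsigned frag_count(unsigned total_len, unsigned ihl, unsigned mtu)
{
	unsigned per_frag, payload;

	if (total_len <= mtu)
		return 1;
	if (mtu <= ihl)
		return 0;
	per_frag = (mtu - ihl) & ~7u;
	if (per_frag == 0)
		return 0;
	payload = total_len - ihl;
	return (payload + per_frag - 1) / per_frag;
}
```

With the route shown earlier (MTU 800), a 1500-byte packet with a
20-byte header gives 776 payload bytes per fragment, so 1480 payload
bytes would be split into two fragments.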


s.
