Message-ID: <07B7943445653648AD9B4DBB916BB48F1C265220@cnshjmbx01>
Date:   Wed, 25 Jan 2017 06:40:10 +0000
From:   YUAN Jia <Jia.Yuan@...atel-sbell.com.cn>
To:     "'linux-sctp@...r.kernel.org'" <linux-sctp@...r.kernel.org>,
        'network dev' <netdev@...r.kernel.org>
Subject: A PMTU auto-discovery error for large SCTP packets

Hi All,

Recently I ran into a problem where an SCTP association broke because of large SCTP packets; a capture is attached to this mail.
The first packet is 1626 bytes long, which exceeds the next hop’s MTU of 1500, so an ICMP type 3 (Destination Unreachable) message with code 4 (Fragmentation needed) comes back carrying the correct MTU value of 1500. However, the PMTU auto-adjustment mechanism does not work. After debugging the kernel, I found that the ICMP packet is dropped at a pre-routing netfilter hook called ‘nft_chain_nat_ipv4’, which is present because ‘CONFIG_NFT_CHAIN_NAT_IPV4’ is enabled. Below is the calling sequence:
PATH1:     NF_INET_PRE_ROUTING → nft_nat_ipv4_in → nf_nat_ipv4_in → nf_nat_ipv4_fn → nf_nat_icmp_reply_translation → nf_nat_ipv4_manip_pkt
PATH2:     NF_INET_PRE_ROUTING → nft_nat_ipv4_in → nf_nat_ipv4_in → nf_nat_ipv4_fn → nf_nat_packet → l3proto->manip_pkt(nf_nat_ipv4_manip_pkt)
COMMON:  nf_nat_ipv4_manip_pkt → l4proto->manip_pkt(sctp_manip_pkt) → skb_make_writable

By the time this call chain reaches the final function ‘skb_make_writable’, the ICMP packet is laid out as follows:
MAC(l2) + [VLAN(l2)] + IP(l3) + ICMP(l4) + { payload ⇒ IP + SCTP }
At this point the input parameter ‘hdroff’ equals the distance from ‘skb->data’ to the SCTP header inside the ICMP payload.
So the statement ‘skb_make_writable(skb, hdroff + sizeof(*hdr))’ assumes that the complete SCTP header is present. However, certain network elements (routers, gateways, and the like) may send an ICMP message that contains only the first 8 bytes (64 bits) following the IP header of the original packet. As the attachment shows, the ICMP message contained only the source port, destination port, and verification tag, i.e. the first 8 bytes of the SCTP header of the original SCTP packet. In that case ‘skb_make_writable’ returns false and the ICMP packet is dropped. As a result, the upper layer’s ‘err_handler’ is never triggered, so SCTP is never notified to update the PMTU.
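
The arithmetic behind this failure can be sketched in userspace C. This is only a model, not kernel code: it assumes 20-byte IPv4 headers without options, it assumes ‘skb->data’ points at the outer IP header, and ‘model_make_writable’ is a hypothetical stand-in that reproduces only the length check of ‘skb_make_writable’. The struct and helper names are mine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SCTP common header: ports, verification tag, checksum (12 bytes). */
struct sctp_common_hdr {
    uint16_t source;
    uint16_t dest;
    uint32_t vtag;
    uint32_t checksum;
};

/* Assumed sizes: IPv4 header without options, fixed ICMP header, and
 * the minimum 8 bytes of the original datagram quoted in an ICMP error. */
enum { IPV4_HDR_LEN = 20, ICMP_HDR_LEN = 8, ICMP_MIN_QUOTE = 8 };

/* Offset of the inner (quoted) L4 header, counted from the outer IP header. */
static size_t inner_l4_offset(void)
{
    return IPV4_HDR_LEN + ICMP_HDR_LEN + IPV4_HDR_LEN;
}

/* Total length of an ICMP error that quotes only 8 bytes of the L4 header. */
static size_t truncated_icmp_len(void)
{
    return inner_l4_offset() + ICMP_MIN_QUOTE;
}

/* Hypothetical stand-in: a request for more bytes than the packet
 * holds fails, which is the case described above. */
static bool model_make_writable(size_t pkt_len, size_t writable_len)
{
    return writable_len <= pkt_len;
}
```

Under these assumptions ‘hdroff’ is 48 and the ICMP message is 56 bytes long, so asking for 48 + 12 = 60 writable bytes fails, while asking for 48 + 8 = 56 succeeds.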

I compared this with how TCP is handled. In the file ‘net/netfilter/nf_nat_proto_tcp.c’ there is a similar function called ‘tcp_manip_pkt’, which contains the following code and comment:
    int hdrsize = 8; /* TCP connection tracking guarantees this much */

    /* this could be a inner header returned in icmp packet; in such
       cases we cannot update the checksum field since it is outside of
       the 8 bytes of transport layer headers we are guaranteed */
    if (skb->len >= hdroff + sizeof(struct tcphdr))
        hdrsize = sizeof(struct tcphdr);

    if (!skb_make_writable(skb, hdroff + hdrsize))
        return false;
……………………… and later …………………………
    if (hdrsize < sizeof(*hdr))
        return true;

I think ‘sctp_manip_pkt’ should behave the same way. Shouldn’t it?
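
To make the suggestion concrete, here is a userspace sketch of the same logic applied to SCTP. It is an illustration only, not a kernel patch: ‘model_sctp_manip’ and its parameters are hypothetical, it works on a plain byte buffer instead of an skb, and it only reports whether the checksum would need recomputing rather than computing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SCTP_GUARANTEED_LEN = 8,  /* ports + verification tag */
    SCTP_COMMON_HDR_LEN = 12  /* the above + checksum */
};

/* Rewrite the source port of the SCTP header at pkt[hdroff].
 * Returns false only if not even the first 8 header bytes are present.
 * *needs_csum is set to true when the full header is available, i.e.
 * when the checksum field lies inside the packet and would have to be
 * recomputed; for a truncated inner header it is set to false. */
static bool model_sctp_manip(uint8_t *pkt, size_t pkt_len, size_t hdroff,
                             uint16_t new_sport, bool *needs_csum)
{
    size_t hdrsize = SCTP_GUARANTEED_LEN;

    /* Same test as in the TCP code quoted above. */
    if (pkt_len >= hdroff + SCTP_COMMON_HDR_LEN)
        hdrsize = SCTP_COMMON_HDR_LEN;

    /* Models the skb_make_writable(skb, hdroff + hdrsize) length check. */
    if (hdroff + hdrsize > pkt_len)
        return false;

    /* The ports live in the first 4 bytes, so this is always safe here. */
    pkt[hdroff]     = (uint8_t)(new_sport >> 8);
    pkt[hdroff + 1] = (uint8_t)(new_sport & 0xff);

    /* Truncated inner header: leave the (absent) checksum alone. */
    *needs_csum = (hdrsize >= SCTP_COMMON_HDR_LEN);
    return true;
}
```

With a 56-byte ICMP error and ‘hdroff’ of 48 (20-byte IP headers assumed), this model rewrites the port and returns true instead of dropping the packet.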

Best regards,
Richard


Download attachment "icmp_pmtu.pcap" of type "application/octet-stream" (1902 bytes)

Download attachment "icmp_pmtu.rar" of type "application/octet-stream" (1098 bytes)
