Message-ID: <07B7943445653648AD9B4DBB916BB48F1C265220@cnshjmbx01>
Date: Wed, 25 Jan 2017 06:40:10 +0000
From: YUAN Jia <Jia.Yuan@...atel-sbell.com.cn>
To: "'linux-sctp@...r.kernel.org'" <linux-sctp@...r.kernel.org>,
'network dev' <netdev@...r.kernel.org>
Subject: A PMTU auto-discovery error for large SCTP packets
Hi All,
Recently I ran into a problem where an SCTP association breaks because of large SCTP packets; a capture is attached to this mail.
Because the first packet is 1626 bytes long, which exceeds the next hop’s MTU of 1500, an ICMP type 3, code 4 (Fragmentation Needed) message comes back carrying the correct MTU value of 1500. However, the PMTU auto-adjustment mechanism does not kick in. After debugging the kernel, I found that the ICMP packet is dropped in the pre-routing netfilter hook ‘nft_chain_nat_ipv4’, which is active because ‘CONFIG_NFT_CHAIN_NAT_IPV4’ is enabled. The call sequence is:
PATH1: NF_INET_PRE_ROUTING → nft_nat_ipv4_in → nf_nat_ipv4_in → nf_nat_ipv4_fn → nf_nat_icmp_reply_translation → nf_nat_ipv4_manip_pkt
PATH2: NF_INET_PRE_ROUTING → nft_nat_ipv4_in → nf_nat_ipv4_in → nf_nat_ipv4_fn → nf_nat_packet → l3proto->manip_pkt(nf_nat_ipv4_manip_pkt)
COMMON: nf_nat_ipv4_manip_pkt → l4proto->manip_pkt(sctp_manip_pkt) → skb_make_writable
By the time the last function in this chain, ‘skb_make_writable’, is reached, the ICMP packet and its headers are laid out as follows:
MAC(l2) + [VLAN(l2)] + IP(l3) + ICMP(l4) + { payload ⇒ IP + SCTP }
The input parameter ‘hdroff’ equals the distance from ‘skb->data’ to the SCTP header inside the ICMP payload.
So the statement ‘skb_make_writable(skb, hdroff + sizeof(*hdr))’ assumes that the complete SCTP header is present. However, some network elements (routers, gateways, and the like) send ICMP errors that contain only the first 8 bytes (64 bits) following the IP header of the original packet. As the attachment shows, the ICMP packet contained only the source port, destination port, and verification tag, i.e. the first 8 bytes of the SCTP header of the offending packet. In that case ‘skb_make_writable’ returns false and the ICMP packet is dropped. As a result, the upper layer’s ‘err_handler’ is never invoked, and SCTP is never told to update the PMTU.
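The length arithmetic behind this failure can be sketched in plain user-space C. The struct mirrors the 12-byte SCTP common header (ports, verification tag, checksum); the helper name `quoted_header_is_complete` and the example offsets are assumptions made for illustration only, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the SCTP common header (12 bytes on the wire). */
struct sctp_common_hdr {
    uint16_t source;   /* source port */
    uint16_t dest;     /* destination port */
    uint32_t vtag;     /* verification tag */
    uint32_t checksum; /* checksum over the whole SCTP packet */
};

static_assert(sizeof(struct sctp_common_hdr) == 12,
              "SCTP common header must be 12 bytes");

/*
 * Hypothetical helper: models the length condition that
 * skb_make_writable(skb, hdroff + sizeof(*hdr)) depends on.
 * pkt_len is the total packet length and hdroff the offset of the
 * quoted SCTP header inside the ICMP payload.
 */
static bool quoted_header_is_complete(size_t pkt_len, size_t hdroff)
{
    return pkt_len >= hdroff + sizeof(struct sctp_common_hdr);
}
```

Assuming offsets are counted from the outer IP header and neither IP header carries options, hdroff is 20 + 8 + 20 = 48. A router that quotes only 8 bytes produces a 56-byte packet, so `quoted_header_is_complete(56, 48)` is false, because 60 bytes would be needed.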
For comparison, I looked at how TCP is handled. The file ‘net/netfilter/nf_nat_proto_tcp.c’ has a similar function, ‘tcp_manip_pkt’, which contains the following code and comment:
    int hdrsize = 8; /* TCP connection tracking guarantees this much */

    /* this could be a inner header returned in icmp packet; in such
       cases we cannot update the checksum field since it is outside of
       the 8 bytes of transport layer headers we are guaranteed */
    if (skb->len >= hdroff + sizeof(struct tcphdr))
        hdrsize = sizeof(struct tcphdr);

    if (!skb_make_writable(skb, hdroff + hdrsize))
        return false;

    [... and later ...]

    if (hdrsize < sizeof(*hdr))
        return true;
I think ‘sctp_manip_pkt’ should behave the same way. Shouldn’t it?
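To make the proposal concrete, here is a minimal user-space sketch of the same guard applied to an SCTP header. It operates on a plain byte buffer rather than an skb; the function name `sctp_manip_sketch`, its parameters, and the two length macros are hypothetical, and the real ‘sctp_manip_pkt’ may need to differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCTP_HDR_LEN     12 /* ports + verification tag + checksum */
#define SCTP_MIN_HDR_LEN  8 /* bytes an ICMP error is guaranteed to quote */

static_assert(SCTP_MIN_HDR_LEN < SCTP_HDR_LEN,
              "the guaranteed part must be shorter than the full header");

/*
 * Hypothetical model of the proposed check; not kernel code.
 * Rewrites the source port of the SCTP header found at hdroff in pkt.
 * new_sport_be is expected in network byte order.
 * *csum_updatable reports whether the checksum field is present, so
 * that the caller knows whether it may recompute it.
 */
static bool sctp_manip_sketch(uint8_t *pkt, size_t pkt_len, size_t hdroff,
                              uint16_t new_sport_be, bool *csum_updatable)
{
    size_t hdrsize = SCTP_MIN_HDR_LEN;

    /* A full header is only assumed when the packet really holds one. */
    if (pkt_len >= hdroff + SCTP_HDR_LEN)
        hdrsize = SCTP_HDR_LEN;

    /* Stand-in for the skb_make_writable() length check. */
    if (pkt_len < hdroff + hdrsize)
        return false;

    /* The source port is the first field, so it is always reachable. */
    memcpy(pkt + hdroff, &new_sport_be, sizeof(new_sport_be));

    /* Truncated inner header: the checksum field must be left alone. */
    *csum_updatable = (hdrsize == SCTP_HDR_LEN);
    return true;
}
```

With this shape, a 56-byte ICMP error whose quoted SCTP header starts at offset 48 is translated and passed on instead of being dropped; only the checksum update is skipped.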
Best regards,
Richard
Download attachment "icmp_pmtu.pcap" of type "application/octet-stream" (1902 bytes)
Download attachment "icmp_pmtu.rar" of type "application/octet-stream" (1098 bytes)