lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0d7a77b-a33d-150c-65e2-6caebcec772f@nextfour.com>
Date:   Sat, 6 Mar 2021 16:49:19 +0200
From:   Mika Penttilä <mika.penttila@...tfour.com>
To:     Pablo Neira Ayuso <pablo@...filter.org>,
        netfilter-devel@...r.kernel.org
Cc:     davem@...emloft.net, netdev@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH net 3/9] netfilter: nf_nat: undo erroneous tcp edemux
 lookup



On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@...len.de>
>
> Under extremely rare conditions TCP early demux will retrieve the wrong
> socket.
>
> 1. local machine establishes a connection to a remote server, S, on port
>     p.
>
>     This gives:
>     laddr:lport -> S:p
>     ... both in tcp and conntrack.
>
> 2. local machine establishes a connection to host H, on port p2.
>     2a. TCP stack choses same laddr:lport, so we have
>     laddr:lport -> H:p2 from TCP point of view.
>     2b). There is a destination NAT rewrite in place, translating
>          H:p2 to S:p.  This results in following conntrack entries:
>
>     I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
>     II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
>
>     NAT engine has rewritten laddr:lport to laddr:lport2 to map
>     the reply packet to the correct origin.
Could you eloborate where and how linux nat engine is doing the

laddr:lport to laddr:lport2

rewrite? There's only DST nat and there should be conflict (for reply) 
in tuple establishment afaik....


>
>     When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
>     will undo-the SNAT transformation, rewriting IP header to
>     S:p -> laddr:lport
>
>     This causes TCP early demux to associate the skb with the TCP socket
>     of the first connection.
>
>     The INPUT hook will then reverse the DNAT transformation, rewriting
>     the IP header to H:p2 -> laddr:lport.
>
> Because packet ends up with the wrong socket, the new connection
> never completes: originator stays in SYN_SENT and conntrack entry
> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
> until it gives up.
>
> To resolve this, orphan the skb after the input rewrite:
> Because the source IP address changed, the socket must be incorrect.
> We can't move the DNAT undo to prerouting due to backwards
> compatibility, doing so will make iptables/nftables rules to no longer
> match the way they did.
>
> After orphan, the packet will be handed to the next protocol layer
> (tcp, udp, ...) and that will repeat the socket lookup just like as if
> early demux was disabled.
>
> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
> Signed-off-by: Florian Westphal <fw@...len.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@...filter.org>
> ---
>   net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
> index e87b6bd6b3cd..4731d21fc3ad 100644
> --- a/net/netfilter/nf_nat_proto.c
> +++ b/net/netfilter/nf_nat_proto.c
> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>   }
>   
>   static unsigned int
> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
> -	       const struct nf_hook_state *state)
> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
> +			const struct nf_hook_state *state)
>   {
>   	unsigned int ret;
>   	__be32 daddr = ip_hdr(skb)->daddr;
> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>   	return ret;
>   }
>   
> +static unsigned int
> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
> +		     const struct nf_hook_state *state)
> +{
> +	__be32 saddr = ip_hdr(skb)->saddr;
> +	struct sock *sk = skb->sk;
> +	unsigned int ret;
> +
> +	ret = nf_nat_ipv4_fn(priv, skb, state);
> +
> +	if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
> +	    !inet_sk_transparent(sk))
> +		skb_orphan(skb); /* TCP edemux obtained wrong socket */
> +
> +	return ret;
> +}
> +
>   static unsigned int
>   nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
>   		const struct nf_hook_state *state)
> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
>   static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>   	/* Before packet filtering, change destination */
>   	{
> -		.hook		= nf_nat_ipv4_in,
> +		.hook		= nf_nat_ipv4_pre_routing,
>   		.pf		= NFPROTO_IPV4,
>   		.hooknum	= NF_INET_PRE_ROUTING,
>   		.priority	= NF_IP_PRI_NAT_DST,
> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>   	},
>   	/* After packet filtering, change source */
>   	{
> -		.hook		= nf_nat_ipv4_fn,
> +		.hook		= nf_nat_ipv4_local_in,
>   		.pf		= NFPROTO_IPV4,
>   		.hooknum	= NF_INET_LOCAL_IN,
>   		.priority	= NF_IP_PRI_NAT_SRC,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ