[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0d7a77b-a33d-150c-65e2-6caebcec772f@nextfour.com>
Date: Sat, 6 Mar 2021 16:49:19 +0200
From: Mika Penttilä <mika.penttila@...tfour.com>
To: Pablo Neira Ayuso <pablo@...filter.org>,
netfilter-devel@...r.kernel.org
Cc: davem@...emloft.net, netdev@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH net 3/9] netfilter: nf_nat: undo erroneous tcp edemux
lookup
On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@...len.de>
>
> Under extremely rare conditions TCP early demux will retrieve the wrong
> socket.
>
> 1. local machine establishes a connection to a remote server, S, on port
> p.
>
> This gives:
> laddr:lport -> S:p
> ... both in tcp and conntrack.
>
> 2. local machine establishes a connection to host H, on port p2.
> 2a. TCP stack choses same laddr:lport, so we have
> laddr:lport -> H:p2 from TCP point of view.
> 2b). There is a destination NAT rewrite in place, translating
> H:p2 to S:p. This results in following conntrack entries:
>
> I) laddr:lport -> S:p (origin) S:p -> laddr:lport (reply)
> II) laddr:lport -> H:p2 (origin) S:p -> laddr:lport2 (reply)
>
> NAT engine has rewritten laddr:lport to laddr:lport2 to map
> the reply packet to the correct origin.
Could you eloborate where and how linux nat engine is doing the
laddr:lport to laddr:lport2
rewrite? There's only DST nat and there should be conflict (for reply)
in tuple establishment afaik....
>
> When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
> will undo-the SNAT transformation, rewriting IP header to
> S:p -> laddr:lport
>
> This causes TCP early demux to associate the skb with the TCP socket
> of the first connection.
>
> The INPUT hook will then reverse the DNAT transformation, rewriting
> the IP header to H:p2 -> laddr:lport.
>
> Because packet ends up with the wrong socket, the new connection
> never completes: originator stays in SYN_SENT and conntrack entry
> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
> until it gives up.
>
> To resolve this, orphan the skb after the input rewrite:
> Because the source IP address changed, the socket must be incorrect.
> We can't move the DNAT undo to prerouting due to backwards
> compatibility, doing so will make iptables/nftables rules to no longer
> match the way they did.
>
> After orphan, the packet will be handed to the next protocol layer
> (tcp, udp, ...) and that will repeat the socket lookup just like as if
> early demux was disabled.
>
> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
> Signed-off-by: Florian Westphal <fw@...len.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@...filter.org>
> ---
> net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
> index e87b6bd6b3cd..4731d21fc3ad 100644
> --- a/net/netfilter/nf_nat_proto.c
> +++ b/net/netfilter/nf_nat_proto.c
> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
> }
>
> static unsigned int
> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
> - const struct nf_hook_state *state)
> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
> + const struct nf_hook_state *state)
> {
> unsigned int ret;
> __be32 daddr = ip_hdr(skb)->daddr;
> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
> return ret;
> }
>
> +static unsigned int
> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
> + const struct nf_hook_state *state)
> +{
> + __be32 saddr = ip_hdr(skb)->saddr;
> + struct sock *sk = skb->sk;
> + unsigned int ret;
> +
> + ret = nf_nat_ipv4_fn(priv, skb, state);
> +
> + if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
> + !inet_sk_transparent(sk))
> + skb_orphan(skb); /* TCP edemux obtained wrong socket */
> +
> + return ret;
> +}
> +
> static unsigned int
> nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
> const struct nf_hook_state *state)
> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
> static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
> /* Before packet filtering, change destination */
> {
> - .hook = nf_nat_ipv4_in,
> + .hook = nf_nat_ipv4_pre_routing,
> .pf = NFPROTO_IPV4,
> .hooknum = NF_INET_PRE_ROUTING,
> .priority = NF_IP_PRI_NAT_DST,
> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
> },
> /* After packet filtering, change source */
> {
> - .hook = nf_nat_ipv4_fn,
> + .hook = nf_nat_ipv4_local_in,
> .pf = NFPROTO_IPV4,
> .hooknum = NF_INET_LOCAL_IN,
> .priority = NF_IP_PRI_NAT_SRC,
Powered by blists - more mailing lists