[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f0e7e495-522f-9916-9771-feac28baecdc@nextfour.com>
Date: Sat, 6 Mar 2021 18:10:18 +0200
From: Mika Penttilä <mika.penttila@...tfour.com>
To: Pablo Neira Ayuso <pablo@...filter.org>,
netfilter-devel@...r.kernel.org
Cc: davem@...emloft.net, netdev@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH net 3/9] netfilter: nf_nat: undo erroneous tcp edemux
lookup
On 6.3.2021 16.49, Mika Penttilä wrote:
>
>
> On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
>> From: Florian Westphal <fw@...len.de>
>>
>> Under extremely rare conditions TCP early demux will retrieve the wrong
>> socket.
>>
>> 1. local machine establishes a connection to a remote server, S, on port
>> p.
>>
>> This gives:
>> laddr:lport -> S:p
>> ... both in tcp and conntrack.
>>
>> 2. local machine establishes a connection to host H, on port p2.
>> 2a. TCP stack choses same laddr:lport, so we have
>> laddr:lport -> H:p2 from TCP point of view.
>> 2b). There is a destination NAT rewrite in place, translating
>> H:p2 to S:p. This results in following conntrack entries:
>>
>> I) laddr:lport -> S:p (origin) S:p -> laddr:lport (reply)
>> II) laddr:lport -> H:p2 (origin) S:p -> laddr:lport2 (reply)
>>
>> NAT engine has rewritten laddr:lport to laddr:lport2 to map
>> the reply packet to the correct origin.
> Could you eloborate where and how linux nat engine is doing the
>
> laddr:lport to laddr:lport2
>
> rewrite? There's only DST nat and there should be conflict (for reply)
> in tuple establishment afaik....
Ah I see it is the nat null binding for src to make it unique
>
>
>>
>> When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
>> will undo-the SNAT transformation, rewriting IP header to
>> S:p -> laddr:lport
>>
>> This causes TCP early demux to associate the skb with the TCP socket
>> of the first connection.
>>
>> The INPUT hook will then reverse the DNAT transformation, rewriting
>> the IP header to H:p2 -> laddr:lport.
>>
>> Because packet ends up with the wrong socket, the new connection
>> never completes: originator stays in SYN_SENT and conntrack entry
>> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
>> until it gives up.
>>
>> To resolve this, orphan the skb after the input rewrite:
>> Because the source IP address changed, the socket must be incorrect.
>> We can't move the DNAT undo to prerouting due to backwards
>> compatibility, doing so will make iptables/nftables rules to no longer
>> match the way they did.
>>
>> After orphan, the packet will be handed to the next protocol layer
>> (tcp, udp, ...) and that will repeat the socket lookup just like as if
>> early demux was disabled.
>>
>> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
>> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
>> Signed-off-by: Florian Westphal <fw@...len.de>
>> Signed-off-by: Pablo Neira Ayuso <pablo@...filter.org>
>> ---
>> net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
>> index e87b6bd6b3cd..4731d21fc3ad 100644
>> --- a/net/netfilter/nf_nat_proto.c
>> +++ b/net/netfilter/nf_nat_proto.c
>> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>> }
>> static unsigned int
>> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>> - const struct nf_hook_state *state)
>> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
>> + const struct nf_hook_state *state)
>> {
>> unsigned int ret;
>> __be32 daddr = ip_hdr(skb)->daddr;
>> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>> return ret;
>> }
>> +static unsigned int
>> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
>> + const struct nf_hook_state *state)
>> +{
>> + __be32 saddr = ip_hdr(skb)->saddr;
>> + struct sock *sk = skb->sk;
>> + unsigned int ret;
>> +
>> + ret = nf_nat_ipv4_fn(priv, skb, state);
>> +
>> + if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
>> + !inet_sk_transparent(sk))
>> + skb_orphan(skb); /* TCP edemux obtained wrong socket */
>> +
>> + return ret;
>> +}
>> +
>> static unsigned int
>> nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
>> const struct nf_hook_state *state)
>> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff
>> *skb,
>> static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>> /* Before packet filtering, change destination */
>> {
>> - .hook = nf_nat_ipv4_in,
>> + .hook = nf_nat_ipv4_pre_routing,
>> .pf = NFPROTO_IPV4,
>> .hooknum = NF_INET_PRE_ROUTING,
>> .priority = NF_IP_PRI_NAT_DST,
>> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[]
>> = {
>> },
>> /* After packet filtering, change source */
>> {
>> - .hook = nf_nat_ipv4_fn,
>> + .hook = nf_nat_ipv4_local_in,
>> .pf = NFPROTO_IPV4,
>> .hooknum = NF_INET_LOCAL_IN,
>> .priority = NF_IP_PRI_NAT_SRC,
>
Powered by blists - more mailing lists